How to Use SmolLM: A Guide for Developers

If you’re looking to explore state-of-the-art natural language processing (NLP) without the heft of a giant model, SmolLM might just be the perfect choice for you! This series of small language models delivers impressive performance for its modest parameter counts. In this guide, we will walk you through how to set it up, use it effectively, and troubleshoot common issues.

Model Summary

Imagine SmolLM as your trustworthy assistant who carries a library of knowledge in small, manageable books. Each book represents a size of SmolLM—135M, 360M, or 1.7B parameters. This assistant has been trained on a fine selection of high-quality texts, including synthetic textbooks, educational Python samples, and various web samples. For comparison, think of the three models as restaurants of different sizes: each has its own flavor, but all serve the same educational fare to satisfy your coding cravings.

The models excel in common sense reasoning and world knowledge, making them reliable for various tasks. If you need more detailed information on their performance and benchmarks, be sure to check our full blog post on SmolLM.
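On the Hugging Face Hub, the three sizes correspond to three checkpoints. The two smaller names below are inferred from the same naming pattern as the 1.7B checkpoint used throughout this guide, so double-check them on the Hub before relying on them:

# the three SmolLM base checkpoints, from lightest to heaviest
checkpoints = [
    "HuggingFaceTB/SmolLM-135M",
    "HuggingFaceTB/SmolLM-360M",
    "HuggingFaceTB/SmolLM-1.7B",
]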

Setting Up SmolLM

Before you dive in, let’s cover how to start using SmolLM. You’ll need to install the `transformers` library, which allows you to easily obtain the SmolLM models.

Install Transformers


pip install transformers
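A quick way to confirm the install worked is to print the library version from Python:

import transformers
print(transformers.__version__)  # if this runs, the library is ready to use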

Running SmolLM on Different Devices

Imagine you’re packing for a journey. Depending on the trip—whether it’s a quick local tour (CPU), a rugged mountain trek (GPU), or a full-blown expedition with a team (multiple GPUs)—you’ll pack differently. SmolLM can adapt to your device’s capabilities to maximize performance:

#### Using Full Precision (CPU/GPU):


from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-1.7B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)

print(tokenizer.decode(outputs[0]))
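By default, generate() returns only a short continuation. Reusing the model, tokenizer, and inputs from the snippet above, here is a sketch with a few commonly used generation parameters (the exact values are illustrative, not tuned for SmolLM):

outputs = model.generate(
    inputs,
    max_new_tokens=100,  # allow a longer completion than the default
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.6,     # lower values give more deterministic output
    top_p=0.9,           # nucleus sampling cutoff
)
print(tokenizer.decode(outputs[0]))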

#### Using `torch.bfloat16`:

This is akin to packing light—the model can effectively operate while using less memory:


import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "HuggingFaceTB/SmolLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
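# device_map="auto" needs the accelerate package (pip install accelerate)
# and will place the model on the available GPU(s) automatically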
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)

print(tokenizer.decode(outputs[0]))
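To see how much packing light actually saves, you can ask the loaded model for its memory footprint (roughly 2 bytes per parameter in bfloat16 versus 4 in full precision):

# continuing from the snippet above
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")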

#### Using Quantized Versions (8-bit Precision):

Just like taking a smaller bag on a trip, you can opt for a quantized version to save resources:
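Note that 8-bit loading relies on the bitsandbytes and accelerate libraries, which the transformers install above does not pull in, so add them first:

pip install bitsandbytes accelerate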


from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
checkpoint = "HuggingFaceTB/SmolLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)

print(tokenizer.decode(outputs[0]))
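If 8-bit is still too large for your hardware, bitsandbytes also supports 4-bit loading. Here is a minimal sketch of a 4-bit config you could pass to from_pretrained in place of the 8-bit one above (the rest of the snippet stays the same):

import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bfloat16 for better quality
)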

Limitations

While SmolLM models are remarkable, they are not infallible. Here are some things to keep in mind:

1. Language Proficiency: They mainly operate in English. If you’re looking for multilingual capabilities, you may need to explore other options.
2. Accuracy: Generated content might not always be factually accurate or logically sound. Treat the outputs like brainstormed ideas rather than confirmed facts.
3. Biases: The models inherit biases from their training data. It’s critical to verify any crucial information produced by them.

For more in-depth discussions about SmolLM’s capabilities and limitations, feel free to check our full blog post.

Troubleshooting

Sometimes, even the best journeys hit a bump in the road. Here are some troubleshooting tips if you encounter issues while using SmolLM:

– Installation Errors: Ensure that all libraries are correctly installed with the appropriate versions.
– Device Compatibility: Double-check that your device supports the required settings for GPU/CPU. If a snippet fails to run on your chosen device, try switching between CPU and GPU (see the device-check sketch after this list).
– Performance Variability: If you notice performance discrepancies between various checkpoints, consider redownloading the model, as there may have been updates or fixes.
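For the device-compatibility point above, a small sketch that picks whichever device is actually available can save some guesswork; pass the result to the .to(device) calls shown earlier:

import torch

# fall back to CPU automatically when no CUDA-capable GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")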

For more troubleshooting questions or issues, reach out to the fxis.ai team of data science experts.

Conclusion

By following this guide, you’re well equipped to embark on your journey with SmolLM. Embrace the power of this small language model and unlock new coding possibilities. Happy coding!
