Welcome to your go-to guide for harnessing the power of SmolLM, a state-of-the-art language model designed for those who wish to integrate machine learning into their projects. In this article, we’ll walk you through the essentials of setting up and running SmolLM, whilst ensuring you have the tools to troubleshoot any bumps along the way.
Table of Contents
1. [Model Summary](#model-summary)
2. [Limitations](#limitations)
3. [Training](#training)
4. [License](#license)
5. [Troubleshooting](#troubleshooting)
Model Summary
SmolLM is like a box of chocolates, offering three distinct sizes ranging from 135M to 1.7B parameters. Think of it as a choice of snacks while you code — whether you’re just snacking lightly (135M), indulging a bit more (360M), or in for the big feast (1.7B), there’s an option for every appetite.
These models are trained on the Cosmo-Corpus, a carefully curated dataset that combines Cosmopedia v2 (28B tokens of synthetic textbooks), educational Python samples, and a vast array of educational web samples. SmolLM models shine in benchmarks for common sense reasoning and world knowledge, making them competitive peers in the realm of small language models.
For a deeper dive into performance and benchmarks, feel free to check our blog post!
Get Started with SmolLM
To get started, simply run the following command in your terminal:
```bash
pip install transformers
```
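If you want to confirm that the installation succeeded, a quick version check from Python does the trick:

```python
# Sanity check: import the library and print the installed version.
import transformers

print(f"transformers version: {transformers.__version__}")
```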
Running the Model
Running the model on either CPU or GPU is as easy as pie! Here’s a sweet analogy: think of your model as a pastry chef that can whip up delicious responses. If you want your chef (model) to work faster, you might want to let them use an industrial oven (GPU).
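If you’d rather let the code pick the oven for you, a common pattern (a small sketch using PyTorch’s standard device check) is:

```python
import torch

# Fall back to the CPU automatically when no CUDA-capable GPU is visible.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```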
#### Example: Using the Model on GPU
Here’s the process of invoking our pastry chef to make a cake for us (generate text):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M"
device = "cuda"  # use "cpu" for CPU inference

# Load the ingredients: tokenizer and model weights.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Prepare the input and generate a completion.
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
In the code above, we first load the ingredients (model and tokenizer), then prepare the input (our cake recipe), and finally bake it (generate the text)!
The memory footprint of this operation can be monitored using:
```python
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
```
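If that footprint is too large for your hardware, one common trick (a sketch using the standard `torch_dtype` argument of `from_pretrained`) is to load the weights in bfloat16, which roughly halves memory versus the default float32:

```python
import torch
from transformers import AutoModelForCausalLM

checkpoint = "HuggingFaceTB/SmolLM-135M"
# bfloat16 weights use 2 bytes per parameter instead of float32's 4 bytes.
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
```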
Limitations
Just like every dessert has its secret ingredient that may not be perfect for everyone, SmolLM too has limitations. While it excels in generating content mainly in English and on various topics, it’s not immune to inaccuracies or biases that might creep in from the training data. Use it as a helpful assistant, but remember to verify important information independently.
Training
Let’s take a quick look at the training setup of SmolLM:
– Architecture: [More details available here](https://huggingface.co/blog/smollm).
– Pretraining Steps: 600k.
– Pretraining Tokens: 600B.
– Precision: bfloat16.
– Hardware: 64 GPUs (the industrial ovens of machine learning!).
– Software: Built on the Nanotron framework.
License
SmolLM comes wrapped in an [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), which is quite permissive and fosters collaboration.
Troubleshooting
If you encounter issues while working with SmolLM, here are some handy troubleshooting tips:
Common Issues
– Installation Errors: Ensure you have the latest version of `transformers` installed.
– Memory Errors: Check if you’re using the right precision and device (CPU/GPU) settings.
– Model Loading Issues: Make sure that the model checkpoint name is correct and that you have access to it.
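The checks above can be sketched as a quick diagnostic script (assuming the same checkpoint name used earlier in this guide):

```python
import torch
import transformers

# Installation check: transformers imports and reports a version.
print(f"transformers version: {transformers.__version__}")

# Device check: is a CUDA GPU actually visible to PyTorch?
print(f"CUDA available: {torch.cuda.is_available()}")

# Checkpoint check: the repo name must match exactly, including capitalization.
checkpoint = "HuggingFaceTB/SmolLM-135M"
print(f"Checkpoint to load: {checkpoint}")
```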
For more troubleshooting questions or issues, contact the fxis.ai data science expert team.
With this guide in hand, you should now feel empowered to explore the world of SmolLM and uncover its potential in your projects. Happy coding!

