How to Utilize SmolLM for Your AI Projects

Aug 3, 2024 | Educational

SmolLM is a family of state-of-the-art small language models designed to generate text efficiently, even on modest hardware. This blog will guide you through installing and running SmolLM, understanding its capabilities, and addressing common issues you may encounter along the way.

Model Summary

SmolLM offers a series of small language models available in three distinct sizes: 135M, 360M, and 1.7B parameters. These models are built on the meticulously curated Cosmo-Corpus, a training dataset that includes:

  • Cosmopedia v2: 28B tokens of synthetic textbooks and stories generated by Mixtral.
  • Python-Edu: 4B tokens of educational Python samples from The Stack.
  • FineWeb-Edu: 220B tokens of deduplicated educational web samples from FineWeb.

When evaluated on benchmarks for common-sense reasoning and world knowledge, SmolLM models perform strongly compared to other models in their size categories. For further details, refer to our full blog post.

How to Get Started with SmolLM

Let’s dive into the process of installing and running the SmolLM model on your local machine:

1. Installation

pip install transformers
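
If you plan to follow the GPU, bfloat16, or quantized examples below, you will likely also want torch, accelerate (needed for device_map="auto"), and bitsandbytes (needed for 8-bit loading) installed up front:

pip install torch accelerate bitsandbytes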

2. Running the Model

The following steps will guide you in running the SmolLM-360M model:

CPU/GPU Setup

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-360M"
device = "cuda"  # Use "cpu" for CPU-only machines

# Load the tokenizer and model, then move the model to the chosen device
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Tokenize a prompt, generate a completion, and decode it back to text
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
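
By default, generate() stops after only a handful of new tokens, so completions can look truncated. The snippet below is a minimal sketch of how to control output length and sampling with standard generate() arguments; the specific values are illustrative rather than tuned recommendations for SmolLM:

outputs = model.generate(
    inputs,
    max_new_tokens=100,  # generate up to 100 new tokens
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # lower values make output more deterministic
    top_p=0.9,           # nucleus sampling
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))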

Using Bfloat16 Precision

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "HuggingFaceTB/SmolLM-360M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# device_map="auto" places the model on the available GPU(s);
# bfloat16 halves memory use compared to float32
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
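
To confirm the savings from bfloat16 (or from the quantized versions below), you can check how much memory the loaded weights occupy; get_memory_footprint() is a standard Transformers model method that returns the size in bytes:

# Report the memory taken by the loaded model weights
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.1f} MB")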

Quantized Versions Using `bitsandbytes`

For runs that prioritize memory efficiency, you can use 8-bit precision:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the model weights in 8-bit precision to reduce memory usage
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
checkpoint = "HuggingFaceTB/SmolLM-360M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
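
bitsandbytes also supports 4-bit loading, which reduces memory further than 8-bit. The sketch below mirrors the example above; the NF4 quantization type and bfloat16 compute dtype are commonly used defaults, not SmolLM-specific recommendations:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (illustrative settings)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
checkpoint = "HuggingFaceTB/SmolLM-360M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))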

Limitations

While SmolLM has been trained on diverse datasets, it does come with a few limitations:

  • The models primarily understand and generate content in English.
  • Generated content may not always be factually accurate or logically consistent.
  • Be aware of biases present in the training data; use the models as assistive tools rather than definitive sources.

For a detailed discussion about the model’s capabilities, refer to our full blog post.

Training

SmolLM models were trained on the Cosmo-Corpus mixture described in the Model Summary above (Cosmopedia v2, Python-Edu, and FineWeb-Edu). For the full training setup, including hyperparameters and hardware, refer to our full blog post.

License

This work is licensed under the Apache 2.0 license.

Citation

@misc{allal2024SmolLM,
  title={SmolLM - blazingly fast and remarkably powerful},
  author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Leandro von Werra and Thomas Wolf},
  year={2024},
}

Troubleshooting

If you encounter issues while running SmolLM, consider the following troubleshooting steps:

  • Verify that you have the correct version of Python and all required libraries installed (a quick environment check is shown after this list).
  • Check for any typos in your code, especially when importing libraries or defining variables.
  • For memory issues while using larger models, consider switching to quantized versions or using multiple GPUs.
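
As a quick sanity check for the first point, the short snippet below prints the library versions and whether CUDA is available; the attributes used are the standard ones exposed by these libraries:

import sys
import torch
import transformers

# Quick environment check: Python, library versions, and GPU availability
print("Python:", sys.version.split()[0])
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())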

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
