Welcome to the world of Gemma—a cutting-edge, lightweight text generation model created by Google! Whether you’re developing chatbots, generating content, or exploring new research avenues, Gemma can be your go-to tool. In this guide, we’ll walk you through the steps to get started with Gemma on your local machine or your cloud setup. Let’s dive in!
Getting Started
Before you can use Gemma, you need to install a few libraries: transformers, accelerate, and bitsandbytes. Run the following command:
pip install -U transformers accelerate bitsandbytes
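Optionally, you can run a quick sanity check to confirm the install worked and that a GPU is visible to PyTorch (a convenience check, not a required step):
import torch
import transformers
print(transformers.__version__)      # prints the installed transformers version
print(torch.cuda.is_available())     # True if a CUDA-capable GPU is visible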
Running Gemma on a Single or Multiple GPUs
Imagine cooking a complex dish. You want all your ingredients to be accessible and your kitchen well-organized to create magic. Similarly, setting up Gemma requires a few steps to ensure everything is “cooked” perfectly:
- Ingredients (Packages):
  - transformers: for loading and running the Gemma model.
  - accelerate: for device placement, so the model can run on one or more GPUs.
  - bitsandbytes: for 8-bit and 4-bit quantization.
- Recipe (Code):
Let’s break down the code using the same analogy:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Think of tokenizer as your knife which chops the ingredients (text) into digestible pieces
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
# The model is your cooking apparatus that takes these chopped ingredients (tokens) and cooks (processes) them into a dish (output)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
input_text = "Write me a poem about Machine Learning."
# Your chopped (tokenized) input is now ready to go into the cooker (model)
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
# Your final dish is ready
print(tokenizer.decode(outputs[0]))
This script will load the model, tokenize the input, generate the text, and finally decode the generated tokens into a readable format.
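Since google/gemma-2-9b-it is the instruction-tuned variant, you will often get cleaner results by wrapping your prompt in Gemma’s chat template and giving generate an explicit token budget. The sketch below assumes the tokenizer and model loaded above; the max_new_tokens value of 256 is just an illustrative choice:
# Wrap the prompt in Gemma's chat format before generating
messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# max_new_tokens gives the model room to finish the poem (256 is an arbitrary budget)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))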
Running the Model on Different Precisions
Gemma can run on different types of “fuel” to optimize performance: bfloat16 (lighter and faster) or float32 (the default, full precision).
- Using bfloat16 precision: already shown in the example above (pass torch_dtype=torch.bfloat16).
- Using float32 precision: simply omit the torch_dtype argument and the model loads in float32.
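For reference, here is a minimal sketch of both loading styles side by side, using the same google/gemma-2-9b-it checkpoint (in practice you would load only one of these):
import torch
from transformers import AutoModelForCausalLM

# bfloat16: roughly half the memory of float32; well supported on recent GPUs
model_bf16 = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# float32: omit torch_dtype to fall back to full precision
model_fp32 = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto"
)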
Quantized Versions for Resource Optimization
When you need to conserve power and run the model efficiently on lower-end hardware, you can use quantization.
1. 8-bit Precision (int8):
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=quantization_config,
    device_map="auto"
)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
2. 4-bit Precision (int4):
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=quantization_config,
    device_map="auto"
)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
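If you want finer control over the 4-bit setup, BitsAndBytesConfig also accepts an NF4 quantization type and a separate compute dtype. The configuration below is one reasonable combination, not the only valid one:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# NF4 ("normal float 4") usually preserves quality better than plain int4,
# and bfloat16 compute keeps the matrix multiplications fast on modern GPUs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=quantization_config,
    device_map="auto"
)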
For troubleshooting questions or issues, contact our fxis.ai data science expert team.
Conclusion
Gemma offers a versatile and powerful tool for various text generation applications. By following the steps outlined above, you can run Gemma efficiently on your setup and create amazing text-based outputs.
Happy Coding!!

