How to Use QuantFactory with Transformers

Oct 28, 2024 | Educational

In the fast-paced realm of AI, the ability to put advanced models to work quickly can be a game changer. This guide will help you effectively use the quantized model mamba-2.8b-hf from the QuantFactory repository in your projects. From installation to fine-tuning the model, we will cover everything you need to get started!

What is QuantFactory?

QuantFactory is a repository housing mamba-2.8b-hf, a transformers-compatible version of the mamba-2.8b model. This quantized version is optimized for performance: the original checkpoints remain untouched, while the full config.json and tokenizer are published alongside them for seamless integration into your development workflow.

Installation Steps

To start off, install the required packages:

  • Open your command-line interface.
  • Install the transformers library from the main branch (required until version 4.39.0 is released):

    pip install git+https://github.com/huggingface/transformers@main

  • Install causal-conv1d and mamba-ssm, which provide the optimized CUDA kernels; see the verification sketch after this list. Quote the version specifier so your shell does not treat > as a redirect:

    pip install "causal-conv1d>=1.2.0"
    pip install mamba-ssm
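
To confirm that the optional kernel packages were picked up, a minimal check like the following can help. This is an illustrative sketch; causal_conv1d and mamba_ssm are the Python import names of the two packages installed above:

import importlib.util

# Both packages are needed for the fast CUDA path in the transformers
# Mamba implementation; if either is missing, generation falls back to
# the slower "eager" implementation.
for module in ("causal_conv1d", "mamba_ssm"):
    found = importlib.util.find_spec(module) is not None
    print(f"{module}: {'installed' if found else 'MISSING'}")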

Generating Text with the Model

Once the installation is complete, you can start generating text. The process can be compared to a chef preparing a signature dish, using specific ingredients (input) to create a delightful meal (output). Here’s how to do it:

  • Import the necessary classes:

    from transformers import MambaForCausalLM, AutoTokenizer

  • Load the tokenizer and model:

    tokenizer = AutoTokenizer.from_pretrained('state-spaces/mamba-2.8b-hf')
    model = MambaForCausalLM.from_pretrained('state-spaces/mamba-2.8b-hf')

  • Create input IDs from your prompt:

    input_ids = tokenizer("Hey how are you doing?", return_tensors='pt')['input_ids']

  • Generate a response:

    out = model.generate(input_ids, max_new_tokens=10)
    print(tokenizer.batch_decode(out))

Note that batch_decode returns the full sequence, so the output includes your prompt followed by a continuation similar to: “I’m doing great.”
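
If you have a CUDA-capable GPU, you can move the model and inputs onto it for much faster generation. Here is a minimal sketch under that assumption; loading in half precision (torch_dtype=torch.float16) is optional but roughly halves the memory footprint:

import torch
from transformers import MambaForCausalLM, AutoTokenizer

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained('state-spaces/mamba-2.8b-hf')
# Half precision is an optional memory saving; drop torch_dtype to stay in fp32.
model = MambaForCausalLM.from_pretrained(
    'state-spaces/mamba-2.8b-hf', torch_dtype=torch.float16
).to(device)

input_ids = tokenizer("Hey how are you doing?", return_tensors='pt')['input_ids'].to(device)
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))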

Fine-tuning with PEFT

If you’re looking to customize your model, fine-tuning it using the PEFT library is the way to go. Think of it as giving your dish a special secret ingredient to enhance the flavor. Use the following example to get started:

from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

# Base model, tokenizer, and a small text dataset to fine-tune on.
tokenizer = AutoTokenizer.from_pretrained('state-spaces/mamba-2.8b-hf')
model = AutoModelForCausalLM.from_pretrained('state-spaces/mamba-2.8b-hf')
dataset = load_dataset("Abirate/english_quotes", split='train')

training_args = TrainingArguments(
    output_dir='./results',            # where checkpoints are written
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir='./logs',
    logging_steps=10,
    learning_rate=2e-3
)

# LoRA trains small adapter matrices injected into the listed projection
# layers, so the 2.8B base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    target_modules=['x_proj', 'embeddings', 'in_proj', 'out_proj'],
    task_type='CAUSAL_LM',
    bias='none'
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset,
    dataset_text_field='quote',        # dataset column that holds the text
)

trainer.train()

Once your setup is complete, simply run the script and watch the magic unfold! After training finishes, you will likely want to save the adapter and reuse it later, as sketched below.
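
Continuing from the script above, here is a minimal sketch for saving the trained LoRA adapter and reattaching it to the base model later. The './mamba-lora' path is just an illustrative choice, and exactly what save_model writes (adapter only vs. full weights) can vary with your trl version:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Save the trained weights; with a PEFT-wrapped model this is typically
# just the small adapter, not the full 2.8B base model.
trainer.save_model('./mamba-lora')

# Later: reload the base model and attach the trained adapter.
base = AutoModelForCausalLM.from_pretrained('state-spaces/mamba-2.8b-hf')
model = PeftModel.from_pretrained(base, './mamba-lora')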

Troubleshooting

As with any technology, issues may arise during installation or usage. Here are some common troubleshooting tips:

  • Missing Installations: If you encounter errors about missing modules, make sure you have installed every package listed in the installation steps.
  • Model Not Responding: If the model fails to generate text, check that your input is correctly formatted and that the prompt stays within reasonable token limits; the sketch after this list shows how to inspect a prompt's token count.
  • CUDA Performance: To get the optimized GPU kernels, both causal-conv1d and mamba-ssm must be installed. Otherwise, the model falls back to the slower "eager" implementation.
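
To diagnose the second case, it can help to look at the exact tensor the tokenizer produces before calling generate. A minimal sketch:

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('state-spaces/mamba-2.8b-hf')

prompt = "Hey how are you doing?"
inputs = tokenizer(prompt, return_tensors='pt')

# The shape should be (batch_size, sequence_length); the second value
# is the number of tokens the prompt consumes.
print(inputs['input_ids'].shape)

# Confirm whether a CUDA device is visible for the optimized kernels.
print(f"CUDA available: {torch.cuda.is_available()}")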

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
