Getting Started with Guanaco Models Based on LLaMA

Welcome to the world of Guanaco models, an open-source advance in chatbot technology built on LLaMA. In this guide, we’ll walk you step-by-step through loading and using Guanaco models for research, share troubleshooting tips, and highlight what the models can do.

What is Guanaco?

The Guanaco models are chatbots obtained by finetuning LLaMA with 4-bit QLoRA on the OASST1 dataset. They come in four sizes (7B, 13B, 33B, and 65B parameters) and are competitive with commercial systems such as ChatGPT and Bard in benchmark evaluations.

Why Use Guanaco?

  • Cost-Effective Research: As an open-source model, it allows for local, inexpensive experimentation.
  • Competitive Performance: Achieves benchmark results comparable to commercial counterparts.
  • Extensive Use Cases: Designed to be flexible and extensible for a variety of applications.
  • Efficient Training: Uses a replicable 4-bit QLoRA training process that has been rigorously evaluated against traditional finetuning methods.

How to Load and Use Guanaco Models

To begin using the Guanaco models, you will need to follow these steps:

  • Ensure you have the required libraries installed: torch, peft, and transformers, plus bitsandbytes and accelerate for 4-bit loading (pip install torch peft transformers bitsandbytes accelerate).
  • Load the model as shown below:
python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = 'huggyllama/llama-7b'
adapters_name = 'timdettmers/guanaco-7b'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',  # spread layers across all visible GPUs
    max_memory={i: '24000MB' for i in range(torch.cuda.device_count())},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,  # quantize weights to 4 bits on load
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
        bnb_4bit_quant_type='nf4'  # NormalFloat4, the QLoRA data type
    ),
)

model = PeftModel.from_pretrained(model, adapters_name)  # attach the Guanaco LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(model_name)

Think of loading a Guanaco model like preparing a space rocket for launch. You first gather all the components (libraries and model specifications), then assemble them carefully (using the loading script) to ensure that everything is set for a successful mission (your research or application). Each parameter you adjust fine-tunes the performance of your “rocket” (the model), optimizing how it responds to queries.
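
If you want to confirm that the 4-bit load worked, you can inspect the model’s memory footprint and device placement. The figures in the comments below are rough expectations for the 7B model, not exact values:

python
# A 4-bit 7B model should occupy roughly 4-5 GB of GPU memory,
# versus roughly 14 GB in 16-bit precision (exact figures vary by version).
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
print(f"Device map: {model.hf_device_map}")  # which layers landed on which device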

Performing Inference

Once the model is loaded, you can carry out inference as follows:

python
prompt = "Introduce yourself"

# Guanaco expects the conversation template it was finetuned with
formatted_prompt = f"A chat between a curious human and an artificial intelligence assistant.\nThe assistant gives helpful, detailed, and polite answers to the user's questions.\n### Human: {prompt} ### Assistant:"

inputs = tokenizer(formatted_prompt, return_tensors='pt').to('cuda:0')
outputs = model.generate(inputs=inputs.input_ids, max_new_tokens=20)  # raise max_new_tokens for longer answers
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
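
The call above uses greedy decoding and stops after 20 new tokens, which keeps the example fast. For chat-style output you will usually want sampling and a larger token budget; the values below are illustrative defaults, not settings prescribed by the Guanaco authors:

python
# Sampling-based generation; tune temperature and top_p to taste.
outputs = model.generate(
    inputs=inputs.input_ids,
    max_new_tokens=256,   # room for a full-length answer
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.7,      # soften the token distribution
    top_p=0.9,            # nucleus sampling
)
# Slice off the prompt tokens so only the assistant's reply is printed
reply = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(reply)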

Current Inference Limitations

At present, 4-bit inference can be slower than desired, particularly for longer generations. If speed is crucial, load the model in 16-bit precision instead, as shown below:

python
model_name = 'huggyllama/llama-7b'
adapters_name = 'timdettmers/guanaco-7b'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # 16-bit weights, no quantization config
    device_map='auto',
    max_memory={i: '24000MB' for i in range(torch.cuda.device_count())},
)

model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
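
Because this 16-bit model is unquantized, you can optionally merge the LoRA adapter weights into the base weights with peft’s merge_and_unload, which removes the adapter overhead at inference time. A minimal sketch, assuming the 16-bit setup above:

python
# Fold the Guanaco adapter into the base model weights.
# Only do this on an unquantized model; afterwards the PeftModel
# wrapper is gone and you are left with a plain transformers model.
model = model.merge_and_unload()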

Troubleshooting

If you encounter issues while using Guanaco models, consider the following troubleshooting steps:

  • Check that all the required libraries are installed and up to date.
  • Ensure that your hardware meets the necessary requirements for running large models.
  • If you hit memory-related errors, adjust the model loading configuration, as sketched below.
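
As an example of the last point, one common adjustment is to lower the per-GPU memory cap and allow accelerate to offload the remainder to CPU RAM. The budgets below are hypothetical and should be adapted to your hardware; offloaded layers will run noticeably slower:

python
# Hypothetical tighter memory budget: cap each GPU at 12 GB and
# let layers that don't fit spill over to CPU RAM.
max_memory = {i: '12000MB' for i in range(torch.cuda.device_count())}
max_memory['cpu'] = '64GB'  # adjust to your available system RAM

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    max_memory=max_memory,
)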

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
