In this article, we’ll explore the amazing capabilities of Qwen1.5, the beta version of Qwen2, a transformer-based decoder-only language model. As a user-friendly introduction, I’ll guide you through the setup and provide troubleshooting tips along the way.
What is Qwen1.5?
Qwen1.5 shines as a robust language model series, with model sizes ranging from 0.5B to 72B parameters. It enhances performance across multiple languages and contexts, making it a game-changer for chat models. Here are some of the major improvements compared to its predecessor:
- 8 model sizes available, including both base and chat variants.
- Significant improvements in human preference for chat interaction.
- Full multilingual support for base and chat models.
- Stable support for 32K context length across all model sizes.
- No need to set trust_remote_code when loading the model.
For more in-depth information, you can visit our blog post and GitHub repository.
Getting the Model Ready: Requirements
Before you can start using Qwen1.5, you’ll need the Hugging Face transformers library installed, version 4.37.0 or newer. This is crucial to avoid errors like KeyError: 'qwen2'.
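As a quick sanity check, here is a minimal sketch that confirms your installed version before you load the model (it assumes transformers was installed via pip; the packaging module ships as one of its dependencies):
import transformers
from packaging import version

# Qwen1.5 model definitions ship with transformers 4.37.0 and later;
# older versions raise KeyError: 'qwen2' when loading the config.
assert version.parse(transformers.__version__) >= version.parse("4.37.0"), (
    f"transformers {transformers.__version__} is too old for Qwen1.5; "
    "upgrade with: pip install -U 'transformers>=4.37.0'"
)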
Quickstart Guide: Load the Model and Tokenizer
To make it simple, let’s consider the process of loading your model akin to preparing a gourmet dish—having all the right ingredients and tools is essential.
Here’s a Python code snippet that demonstrates how to load the model and tokenizer, much like blending our ingredients together:
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'  # the device to move the tokenized inputs onto

# Load the chat model: device_map='auto' places the weights on available GPUs,
# and torch_dtype='auto' picks the dtype recommended in the model's config.
model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-32B-Chat',
    torch_dtype='auto',
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-32B-Chat')

prompt = 'Give me a short introduction to large language models.'
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': prompt}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors='pt').to(device)

# Generate a response, then strip the prompt tokens from the output.
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
In this analogy, you’ve prepared a scrumptious dish to share with friends: loading the model and tokenizer is like preparing your ingredients, and generating a response is akin to serving the dish. Each step is vital to ensure a flavor that leaves a lasting impression.
Troubleshooting: Common Issues
While everything should ideally go smoothly, sometimes you might run into a few bumps along the road. Here are some common troubleshooting tips:
- If you encounter unexpected behaviors such as code switching between languages, consider adjusting the hyper-parameters in the generation_config.json file, as shown in the sketch after this list.
- Make sure your environment is set up with a GPU if you’re using the larger models, as they demand significant computational power.
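If you’d rather tweak those hyper-parameters in code instead of editing the file, here is a minimal sketch. It assumes the model and model_inputs from the quickstart above are already in memory, and the parameter values shown are purely illustrative, not official recommendations:
from transformers import GenerationConfig

# Start from the sampling settings shipped with the model...
gen_config = GenerationConfig.from_pretrained('Qwen/Qwen1.5-32B-Chat')

# ...then override individual hyper-parameters (illustrative values only).
gen_config.temperature = 0.7
gen_config.top_p = 0.8
gen_config.repetition_penalty = 1.05

generated_ids = model.generate(
    model_inputs.input_ids,
    generation_config=gen_config,
    max_new_tokens=512
)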
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.