Welcome to the future of language models with Qwen1.5-14B-Chat-AWQ! This transformer-based, decoder-only chat model is quantized with AWQ for efficient inference and designed to enhance your text generation workflows. In this guide, we’ll walk you through how to set up and use the Qwen1.5 model effectively and address common issues you might encounter along the way.
Introduction to Qwen1.5
Qwen1.5 is the beta version of Qwen2, boasting impressive features such as:
- Multiple model sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, and 72B dense models, along with an MoE model of 14B with 2.7B activated.
- Significantly improved chat-model performance in human preference evaluations.
- Support for multiple languages across both base and chat models.
- Stable context length of 32K for all model sizes.
- No need for trust_remote_code.
For more details, check out our blog post and GitHub repository.
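Because native support for the qwen2 architecture landed in transformers 4.37.0, a quick environment check can save you debugging time later. Here is a minimal sketch; note that since this checkpoint is AWQ-quantized, loading it through transformers typically also requires the autoawq package to be installed:

# Sanity check: Qwen1.5 is supported natively in transformers >= 4.37.0,
# so no trust_remote_code flag is needed when loading the model.
import transformers

print(transformers.__version__)  # expect 4.37.0 or higher

# The qwen2 architecture ships with the library itself:
from transformers import Qwen2Config  # ImportError here means your version is too old
print(Qwen2Config().model_type)  # prints 'qwen2'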
Model Details
The Qwen1.5 series comprises decoder-only language models of various sizes, built with techniques such as the SwiGLU activation and a mixture of sliding-window and full attention. It also features a tokenizer that adapts to multiple natural languages and to code. One limitation of this beta: grouped-query attention (GQA) has not yet been adopted for most model sizes, though the models remain robust in practice.
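To get a feel for the adaptive tokenizer, here is a small, purely illustrative sketch that tokenizes English, Chinese, and Python snippets with the tokenizer shipped alongside this model (the sample strings are our own):

# Illustrative: inspect how the tokenizer segments different languages and code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-14B-Chat-AWQ')

for sample in ['Hello, world!', '你好，世界！', 'def add(a, b): return a + b']:
    tokens = tokenizer.tokenize(sample)
    print(f'{len(tokens):>2} tokens: {tokens}')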
Training Details
These models were pre-trained on a large volume of data and post-trained with supervised fine-tuning and direct preference optimization (DPO), aligning them with human preferences for their intended chat tasks.
Quickstart Guide
Now that you’re familiar with the basics, let’s look at a quick code snippet to get you started with generating text using the model:
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'  # the device to load the model onto

# Load the AWQ-quantized model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-14B-Chat-AWQ',
    torch_dtype='auto',
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-14B-Chat-AWQ')

# Build a chat-formatted prompt with the model's chat template.
prompt = 'Give me a short introduction to large language model.'
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors='pt').to(device)

# Generate a response, then strip the prompt tokens from the output.
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Think of the above code as following a recipe in a cookbook: each step builds on the last. You start by gathering your ingredients (importing the libraries), preparing your cooking space (setting the device), and following the instructions (loading the model and tokenizer, then formatting the prompt). Finally, you serve up your output (the generated text) just as you would a tasty dish!
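If you’d rather watch tokens appear as they are generated instead of waiting for the full response, transformers provides a TextStreamer utility. A minimal sketch, reusing the model, tokenizer, and model_inputs from the quickstart above:

# Stream tokens to stdout as they are generated (reuses model, tokenizer,
# and model_inputs from the quickstart snippet).
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    streamer=streamer
)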
Troubleshooting
If you encounter the following error:
KeyError: 'qwen2'
it usually means your transformers library is outdated: the qwen2 architecture was added in version 4.37.0, so upgrade with pip install -U "transformers>=4.37.0". Additionally, if you’re seeing code-switching or other unexpected behavior in the generated text, use the recommended hyper-parameters from the model’s generation_config.json file.
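For reference, sampling parameters can also be set explicitly at generation time. The values below are illustrative placeholders, not the official ones; treat the model’s generation_config.json (which model.generate() reads automatically) as the source of truth. This sketch reuses model and model_inputs from the quickstart:

# Explicit sampling settings (example values only; defer to the model's
# generation_config.json for the recommended hyper-parameters).
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    do_sample=True,           # sample instead of greedy decoding
    temperature=0.7,          # example value
    top_p=0.8,                # example value
    repetition_penalty=1.05   # example value; discourages repeated text
)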
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
With these steps, you’re well-equipped to dive into the world of Qwen1.5-14B-Chat-AWQ! At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Happy coding!