Welcome to your journey of exploring the powerful Qwen1.5 language model! In this guide, we will walk you through the steps to set up and utilize Qwen1.5 for generating text. Let’s dive into the exciting world of AI language generation!
What is Qwen1.5?
Qwen1.5 is a beta version of the Qwen2 series, designed as a transformer-based decoder-only language model. It boasts significant enhancements over its predecessor, including:
- Multiple model sizes: Ranging from 0.5B to a whopping 72B parameters.
- Multilingual support for both base and chat models.
- Stable 32K context length across all model sizes.
- No requirement for `trust_remote_code`.
For a deeper insight, you can refer to our blog post and GitHub repo.
Requirements
Before you start, make sure you have the required packages installed. You need `transformers` version 4.37.0 or later; older versions fail with errors like `KeyError: 'qwen2'`.
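If you're not sure which version you have, a quick check like the one below works. This is a minimal sketch; it relies on the `packaging` library, which `transformers` itself depends on, so no extra install should be needed:

```python
# Sanity-check that the installed transformers version is new enough for Qwen1.5.
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.37.0"), \
    f"transformers {transformers.__version__} is too old; upgrade with: pip install -U transformers"
```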
Quickstart Guide
Let’s get moving with a practical code snippet that demonstrates loading the tokenizer and model, as well as generating content. Think of this process like setting up a stage for a fantastic play where the model is the main actor:
Imagine preparing a beautiful theater where everything must be just right before the performance:
- The stage is your execution environment (make sure you have CUDA enabled).
- Your actor is the Qwen1.5 model, which you will load onto the stage.
- The script is the prompt you give to the model, guiding what it should say.
Here’s how to set everything up:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the AWQ-quantized chat model; device_map="auto" places it on available GPUs
model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-14B-Chat-AWQ',
    torch_dtype='auto',
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-14B-Chat-AWQ')

prompt = "Give me a short introduction to large language models."
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': prompt}
]

# Render the chat messages into the model's expected prompt format
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors='pt').to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
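If you're curious what `apply_chat_template` actually produces, Qwen's chat models use the ChatML format, so printing `text` before tokenization should show something roughly like the following (illustrative of the template's general shape, not copied from this exact model's config):

```python
print(text)
# Expected to look roughly like (ChatML, used by Qwen chat models):
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Give me a short introduction to large language models.<|im_end|>
# <|im_start|>assistant
```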
Tips for Success
If you encounter any issues like code switching or unexpected outputs, don’t worry! Here are some tips to help you troubleshoot:
- Use the provided hyper-parameters in `generation_config.json` for better results.
- Double-check that you are using the specified `transformers` version.
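Note that `from_pretrained` already loads the repo's `generation_config.json` automatically, so overrides are only needed if you want to experiment. As a hedged sketch, you can pass sampling parameters directly to `generate()`; the values below are illustrative, not necessarily the exact ones shipped with this model:

```python
# Illustrative sampling overrides (assumed values for demonstration only)
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    repetition_penalty=1.05
)
```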
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Happy coding and exploring with Qwen1.5!