Welcome to the exciting world of AI language models! Today, we’re diving into the workings of Qwen1.5-7B-Chat-GPTQ-Int8, a transformer-based decoder-only model designed for text generation. This is the Int8 GPTQ-quantized version of the 7B chat model, which lowers memory requirements while keeping the series’ improved chat performance and multilingual support. Let’s explore how you can quickly get started with it.
Introduction
Qwen1.5 marks an evolutionary leap from its predecessor Qwen, showcasing:
- Multiple model sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B.
- Enhanced performance in chat model interactions.
- Rich multilingual capabilities in both base and chat models.
- Stable 32K context length across model sizes.
- No need for `trust_remote_code` when loading the models.
For more information, please explore our blog post and check the GitHub repo.
Setting Up Qwen1.5
Model Details
Qwen1.5 represents a series of decoder language models of varying sizes. Each size has a base model and an aligned chat model utilizing:
- Transformer architecture with SwiGLU activation.
- Attention mechanisms including QKV bias and group query attention.
- An improved tokenizer adaptive to multiple natural languages and code.
For this beta version, GQA and the mixture of sliding-window attention and full attention have been temporarily excluded; you can confirm what a given checkpoint actually ships by inspecting its configuration, as sketched below.
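As a minimal sketch of such an inspection, assuming the Hugging Face transformers library: the field names below follow the `Qwen2Config` class used by Qwen1.5.

```python
from transformers import AutoConfig

# Fetch only the model's config.json (a small file), not the weights.
config = AutoConfig.from_pretrained('Qwen/Qwen1.5-7B-Chat-GPTQ-Int8')

print(config.hidden_act)               # activation, e.g. 'silu' (the SwiGLU gate)
print(config.num_attention_heads)      # number of attention heads
print(config.num_key_value_heads)      # equals num_attention_heads when GQA is off
print(config.max_position_embeddings)  # supported context length, e.g. 32768
```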
Training Insights
The training process involved pretraining on an extensive dataset, followed by supervised fine-tuning and direct preference optimization (DPO). DPO improves alignment with human preferences but may lead to poorer benchmark results; fixes for this are planned for a future release.
Quickstart Guide
Follow these steps to load the model and generate content:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'  # the device to load the model onto

# Load the quantized model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-7B-Chat-GPTQ-Int8',
    torch_dtype='auto',
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-7B-Chat-GPTQ-Int8')

# Build a chat-style prompt using the model's chat template.
prompt = "Give me a short introduction to large language models."
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors='pt').to(device)

# Generate a response, then strip the prompt tokens from the output.
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
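Once the script finishes, `response` holds the decoded text, and a simple `print(response)` displays it. If you want to experiment with decoding behavior, `generate` accepts the standard sampling parameters shown below; the specific values are illustrative assumptions, not the model’s recommended defaults (those ship in `generation_config.json`).

```python
# Show the generated answer.
print(response)

# Illustrative sampling settings; prefer the defaults in generation_config.json.
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8
)
```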
Code Explanation: The Gardener’s Analogy
Think of working with the Qwen1.5 model as tending to a garden:
- Loading the Model: This is akin to deciding to plant seeds. You choose the right type of seed (your model size, e.g., 7B) and prepare your garden (set up the environment with transformers).
- Creating Your Prompt: Just like watering plants with a specific nutrient mix, you give the model a prompt to nurture the response it will generate.
- Generating Outputs: Finally, when you harvest the fruits of your labor, you extract the generated text from the model, just as you would pick ripe tomatoes from your flourishing garden.
Troubleshooting Tips
While working with Qwen1.5, you might encounter a few common issues:
- If you see a `KeyError: 'qwen2'`, ensure you’ve installed `transformers>=4.37.0`, as this is a known requirement; a quick version check is sketched below.
- If you’re affected by code switching or other undesirable outputs, consider adjusting the hyper-parameters in `generation_config.json`.
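As an example, here is a quick way to confirm your installed `transformers` version meets the requirement. Note that running GPTQ checkpoints through `transformers` typically also needs the `optimum` and `auto-gptq` packages; the upgrade command in the comment reflects that assumption.

```python
import transformers

# Qwen2 support landed in transformers 4.37.0;
# older versions raise KeyError: 'qwen2' when loading this model.
print(transformers.__version__)

# If the version is too old, upgrade (shell command shown as a comment):
#   pip install -U "transformers>=4.37.0" optimum auto-gptq
```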
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
With Qwen1.5, you’re well-equipped to harness the power of large language models. Happy coding!