How to Get Started with Qwen1.5-110B-Chat Model

May 4, 2024 | Educational

Welcome to your guide to the cutting-edge Qwen1.5-110B-Chat model! This article walks you through everything you need to know, from installation to your first generated response, ensuring a smooth experience with this impressive language model. So, let’s dive in!

Introduction to Qwen1.5

Qwen1.5 is the beta release of Qwen2, a transformer-based, decoder-only language model pretrained on a vast dataset. With nine model sizes, significant performance enhancements, and multilingual support, it’s designed for versatile use. Key improvements include:

  • Nine model sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B dense models, plus an MoE variant.
  • Significant improvement in human preference for chat models.
  • Stable support for 32K context length across models.
  • No need for trust_remote_code.

For an in-depth understanding, feel free to check out the official blog post.

Model Details

Qwen1.5 is a series of decoder-only language models built on the Transformer architecture. Features include:

  • SwiGLU activation for better performance.
  • Attention QKV bias and grouped query attention (GQA) for efficient processing.
  • An improved tokenizer adaptive to multiple natural languages and code.
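
If you want to verify these architectural choices yourself, the model’s configuration exposes them without downloading any weights. Here’s a minimal sketch; the attribute names follow the Qwen2Config class in transformers:

```python
from transformers import AutoConfig

# Download only the config (no weights) to inspect the architecture.
config = AutoConfig.from_pretrained('Qwen/Qwen1.5-110B-Chat')

print(config.hidden_act)               # 'silu', the gated activation behind SwiGLU
print(config.num_attention_heads)      # number of query heads
print(config.num_key_value_heads)      # fewer KV heads than query heads => GQA
print(config.max_position_embeddings)  # maximum supported context length
print(config.vocab_size)               # tokenizer vocabulary size
```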

Training Details

The Qwen1.5 models are pretrained on extensive datasets and post-trained with supervised fine-tuning (SFT) and direct preference optimization (DPO) to enhance their capabilities.

Requirements

To work with Qwen1.5 models, ensure your Hugging Face transformers installation is version 4.37.0 or newer:

pip install 'transformers>=4.37.0'

With an older version, you might encounter errors such as KeyError: 'qwen2'.
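
Before pulling the (very large) 110B checkpoint, it can be worth confirming your environment programmatically. A small sketch using the packaging library (a transformers dependency):

```python
import transformers
from packaging import version

# Fail fast if the installed transformers cannot load Qwen2-family models.
if version.parse(transformers.__version__) < version.parse('4.37.0'):
    raise RuntimeError(
        f'transformers {transformers.__version__} is too old for Qwen1.5; '
        "upgrade with: pip install -U 'transformers>=4.37.0'"
    )
print(f'transformers {transformers.__version__} looks good')
```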

Quickstart Guide

Let’s demonstrate how to load the tokenizer and model and generate content with a simple code snippet.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'  # the device to move input tensors onto

# Load the model; device_map='auto' places the weights across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-110B-Chat',
    torch_dtype='auto',
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-110B-Chat')

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into the prompt format the model expects.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors='pt').to(device)
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

This snippet loads the model, formats the prompt with the model’s chat template, and generates a response. Think of the process as orchestrating a well-rehearsed theater troupe: each actor (a model component) follows the script (your code) to produce a fantastic show (the output).
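
For long generations, you may prefer to stream tokens to the console as they are produced rather than waiting for the full response. A minimal sketch using the TextStreamer utility from transformers, reusing the model, tokenizer, and model_inputs defined above:

```python
from transformers import TextStreamer

# Print decoded tokens as they are generated; skip the echoed prompt
# and any special tokens in the output.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    streamer=streamer,
)
```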

Troubleshooting

If you run into issues while using the Qwen1.5 model, consider the following:

  • Ensure you have the correct Hugging Face transformers library version.
  • If you experience issues such as code switching, use the hyper-parameters provided in the model’s generation_config.json (see the sketch after this list).
  • If the model isn’t generating expected responses, verify your prompt structure and input data.
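
On the second point: transformers picks up the repository’s generation_config.json automatically when the model is loaded, but you can also fetch it explicitly to inspect or tweak the recommended hyper-parameters. A sketch, noting that the exact fields depend on what the model repo ships:

```python
from transformers import GenerationConfig

# Fetch the sampling hyper-parameters shipped with the model repository.
gen_config = GenerationConfig.from_pretrained('Qwen/Qwen1.5-110B-Chat')
print(gen_config)  # fields such as top_p or repetition_penalty, if provided

# Pass it explicitly to generate() to be sure it is applied.
generated_ids = model.generate(
    model_inputs.input_ids,
    generation_config=gen_config,
    max_new_tokens=512,
)
```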

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
