Getting Started with Qwen1.5-110B-Chat-AWQ

May 4, 2024 | Educational

Welcome to the world of advanced language models! In this article, we will guide you through setting up and using the Qwen1.5-110B-Chat-AWQ model. Built on a transformer-based, decoder-only architecture, Qwen1.5 is designed for efficient text generation and strong performance in chat applications.

Introduction

Qwen1.5, the beta version of Qwen2 and a stepping stone toward it, has been pretrained on an extensive corpus of data. Here are some notable features of Qwen1.5:

  • Models available in sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B, plus an MoE model.
  • Enhanced human preference alignment for chat functionalities.
  • Multilingual support, catering to diverse languages.
  • Stable support for a 32K context length across all model sizes.
  • No reliance on trust_remote_code (a quick config check follows this list).
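
Because no custom code is needed, you can verify the advertised context length directly from the published config. Here is a minimal sketch; it assumes network access to the Hugging Face Hub, and the commented values are what the model card leads us to expect:

```python
from transformers import AutoConfig

# Fetch the published config; no trust_remote_code flag is required
config = AutoConfig.from_pretrained("Qwen/Qwen1.5-110B-Chat-AWQ")
print(config.model_type)               # expected: qwen2
print(config.max_position_embeddings)  # expected: 32768 (32K context)
```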

For additional details, please refer to our blog post and GitHub repo.

Model Details

The Qwen1.5 series comprises decoder-only language models at each of the sizes above. Each size includes both a base language model and an aligned chat model, built on the Transformer architecture with SwiGLU activation and optimized attention mechanisms. The series also ships an improved tokenizer that adapts well to multiple natural languages and code.
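
To make the SwiGLU mention concrete, here is an illustrative PyTorch sketch of a SwiGLU feed-forward block. This shows the general technique, not Qwen's exact implementation, and the dimensions are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x @ W_gate) * (x @ W_up), then W_down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The SiLU-gated branch modulates the linear "up" branch elementwise
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```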

Training Details

Qwen1.5 is pretrained on a large dataset and then aligned via supervised finetuning and direct preference optimization (DPO), making it well suited for chat out of the box.
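
For intuition, DPO trains the model to prefer the human-chosen response over the rejected one. Below is a sketch of the standard DPO loss on per-response log-probabilities; this is the general formulation, not Qwen's training code, and beta is a temperature-like hyperparameter:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of the trainable policy against a frozen reference model
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected log-ratios
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```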

Requirements

Before you start using Qwen1.5, make sure you have a recent Hugging Face transformers library installed; version 4.37.0 or later is required. If you attempt to run the code with an older version, you will encounter the following error:

  • KeyError: 'qwen2'
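
A quick sanity check before loading the model can save you a long download. This assumes a pip-managed environment and uses the packaging library, which ships with most installs:

```python
import transformers
from packaging import version

# The qwen2 architecture landed in transformers 4.37.0
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old; "
        "upgrade with: pip install -U 'transformers>=4.37.0'"
    )
```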

Quickstart Guide

Follow the steps below to load the tokenizer and model and to generate content:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the tokenized inputs onto

# Load the quantized model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-110B-Chat-AWQ",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-110B-Chat-AWQ")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Drop the prompt tokens so only newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

Think of this snippet as preparing a recipe: you gather the ingredients (load the model and tokenizer), mix them together (apply the chat template), let the dish cook (run generation), and finally serve it (decode the response) to satisfy your hunger for knowledge.
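
If you would rather watch the dish cook, transformers also provides a TextStreamer that prints tokens as they are generated. This snippet reuses the model, tokenizer, and model_inputs from the quickstart above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are produced, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(model_inputs.input_ids, max_new_tokens=512, streamer=streamer)
```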

Troubleshooting

If you encounter issues such as code switching or other unexpected behavior during text generation, try the generation hyperparameters provided in the repository's generation_config.json file, which can noticeably improve output quality.
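
You can also override sampling parameters per call while experimenting. The values below are illustrative placeholders, not Qwen's defaults; treat the repository's generation_config.json as the authoritative source:

```python
# Example overrides; the numbers here are placeholders for experimentation
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    repetition_penalty=1.05,
)
```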

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
