Are you fascinated by the advancements in language models? Want to explore the capabilities of Qwen1.5, the beta version of Qwen2? This article will guide you through the process of utilizing the Qwen1.5 language model, making it user-friendly for developers, researchers, and AI enthusiasts. Let’s dive in!
Introduction to Qwen1.5
Qwen1.5 is a state-of-the-art transformer-based language model boasting multiple enhancements compared to its predecessor. Here are some key features:
- Six model sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B.
- Significant performance improvements for chat models based on human preferences.
- Support for multiple languages in both base and chat versions.
- Stable 32K context length for models of all sizes.
- No need for `trust_remote_code` during usage.
For more details, please refer to our blog post and GitHub repo.
Model Details
The Qwen1.5 series comprises decoder-only language models in a range of sizes. Each size comes with both a base language model and a chat-aligned variant. These models are built on the Transformer architecture, featuring:
- SwiGLU activation
- Attention QKV bias
- Group query attention (see the sketch after this list)
- Mixture of sliding window attention and full attention
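Group query attention, for instance, lets several query heads share a single key/value head, which shrinks the key/value cache during inference. The snippet below is a minimal PyTorch sketch of the idea only; the head counts and dimensions are illustrative and do not reflect Qwen1.5's actual configuration, and the causal mask is omitted for brevity.

```python
import torch

def grouped_query_attention(q, k, v, num_groups):
    # q: (batch, num_q_heads, seq_len, head_dim)
    # k, v: (batch, num_kv_heads, seq_len, head_dim), with
    # num_q_heads = num_kv_heads * num_groups
    k = k.repeat_interleave(num_groups, dim=1)  # each KV head serves a group of query heads
    v = v.repeat_interleave(num_groups, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = torch.softmax(scores, dim=-1)  # causal mask omitted for brevity
    return weights @ v

# Illustrative sizes: 8 query heads sharing 2 key/value heads
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, num_groups=4)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```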
Our tokenizer has been optimized to handle a wide range of natural languages, enhancing your data processing capabilities.
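To see the tokenizer in action, the snippet below (a minimal sketch assuming the Qwen/Qwen1.5-7B-Chat checkpoint) encodes sentences in several languages and decodes them back; token counts will differ by language and text.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

samples = [
    "Large language models are transforming software.",
    "Les grands modèles de langue transforment le logiciel.",
    "大规模语言模型正在改变软件。",
]
for text in samples:
    token_ids = tokenizer(text)["input_ids"]
    # Print how many tokens each sentence uses and verify round-trip decoding
    print(len(token_ids), tokenizer.decode(token_ids))
```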
Training Details
The models underwent extensive pretraining on a vast dataset, followed by supervised fine-tuning and direct preference optimization (DPO). However, please note that while DPO improves performance on human preference evaluations, it can lead to a drop in benchmark scores. Rest assured, we are actively working to address this.
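For readers curious about what the DPO objective looks like, here is a simplified, self-contained sketch of the standard DPO loss computed from per-sequence log-probabilities under the policy and a frozen reference model. It illustrates the general technique only and is not Qwen's actual training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss from per-sequence log-probabilities (illustrative)."""
    # Log-ratio of policy vs. reference for the preferred and dispreferred responses
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Push the policy to widen the margin between preferred and dispreferred responses
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy example with made-up log-probabilities
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)
```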
Requirements
To ensure smooth functionality, make sure you have a recent version of Hugging Face `transformers` installed; we recommend `transformers>=4.37.0`. Otherwise, you may encounter the following error:
`KeyError: 'qwen2'`
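A quick way to confirm your environment meets this requirement before loading the model is a small sanity check like the one below (an illustrative snippet, not part of the official quickstart):

```python
import transformers
from packaging import version

# Qwen2 model code landed in transformers 4.37.0; older versions raise KeyError: 'qwen2'
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old; "
        "upgrade with: pip install -U 'transformers>=4.37.0'"
    )
print("transformers version OK:", transformers.__version__)
```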
Quickstart Guide
Ready to get your hands dirty? Below is a code snippet to help you load the tokenizer and model seamlessly to generate content.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the quantized chat model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat-GPTQ-Int8",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat-GPTQ-Int8")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Apply the chat template to turn the messages into the model's prompt text
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate a response
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
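Because `response` is a plain Python string, you can print it or feed it straight back into the conversation. The follow-up turn below is a small illustrative extension of the quickstart, not part of the official example:

```python
print(response)

# Continue the conversation by appending the assistant reply and a new user turn
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Can you make that even shorter?"})

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=256)
follow_up = tokenizer.batch_decode(
    [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
)[0]
print(follow_up)
```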
Understanding the Code: The Bridge Analogy
Imagine building a bridge to connect two sides of a river; in the same way, our code connects the model and the tokenizer. Each part of the code constructs a different section of this bridge:
- **Loading the Model**: Think of this as laying down the sturdy foundation of the bridge. We first declare where the model will reside, whether on a GPU (using CUDA) or on the CPU.
- **Tokenization**: Consider this the design plans that allow our bridge to hold weight. By processing input prompts, we convert human language into a format the model understands.
- **Model Generation**: This is like the final build of the bridge. We allow the model to generate responses based on the structure we built in the previous stages, connecting the user to knowledge.
Troubleshooting
If you encounter any issues, such as code-switched responses or unexpected outputs, we recommend checking the hyperparameters in the `generation_config.json` file provided with the model. Tuning these parameters can significantly improve output quality.
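If the defaults do not suit your use case, you can also override the sampling hyperparameters directly when calling `generate`; the values below are illustrative starting points rather than official recommendations:

```python
# Override sampling hyperparameters at generation time (illustrative values)
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    repetition_penalty=1.05,
)
```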
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
With this guide, you now have the knowledge and tools to embark on your journey with Qwen1.5. Enjoy exploring the incredible potential of language models!