How to Get Started with Qwen1.5-110B-Chat-GPTQ-Int4

May 3, 2024 | Educational

If you’ve recently discovered Qwen1.5, the latest advancement in transformer-based language models, you’re in for an exciting journey! In this guide, we will walk you through the process of setting up Qwen1.5, highlight its key features, and provide useful troubleshooting tips.

Introduction to Qwen1.5

Qwen1.5 is the beta version of Qwen2, a language model designed to understand and generate human-like text. It boasts several enhancements over its predecessor:

  • Variety of model sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B.
  • Improved human preference performance for chat models.
  • Multilingual capabilities ready for both base and chat models.
  • Stable 32K context length across all model sizes.
  • No reliance on remote code hosting.

For additional information, check out the blog post and the GitHub repository.

Understanding the Model Details

Think of Qwen1.5 as a library, where each model size is a different section stocked with books for various needs. The base models are set up for general reading, while the aligned chat models are like knowledgeable librarians ready to assist you in finding exactly what you need. Under the hood, Qwen1.5 employs mechanisms such as SwiGLU activation and grouped query attention for efficient processing.

Training Details

The models were pretrained using substantial datasets, followed by a methodical approach of supervised fine-tuning and preference optimization to enhance performance.

Requirements for Installation

To begin working with Qwen1.5, make sure you have Hugging Face Transformers version 4.37.0 or later installed; earlier releases do not include the Qwen2 architecture, so loading the model fails with an error such as:

KeyError: 'qwen2'
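Assuming a pip-based environment, the install might look like the following. The auto-gptq and optimum packages are listed here because this checkpoint is GPTQ-quantized; exact pins and extras may vary with your CUDA setup, so treat this as a sketch rather than a definitive recipe:

```shell
# Transformers 4.37.0+ is required for the qwen2 architecture.
pip install "transformers>=4.37.0" accelerate

# GPTQ-quantized checkpoints additionally need a GPTQ backend.
pip install auto-gptq optimum
```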

Quickstart Guide: Loading the Model

To swiftly set up Qwen1.5, follow this simple code snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'  # the device to load the model onto

# Load the Int4-quantized checkpoint; device_map='auto' spreads it across
# available GPUs, and torch_dtype='auto' picks the dtype from the config.
model = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen1.5-110B-Chat-GPTQ-Int4',
    torch_dtype='auto',
    device_map='auto'
)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-110B-Chat-GPTQ-Int4')

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into the model's expected prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors='pt').to(device)
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated reply remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

With this snippet, you load the quantized model, format a chat prompt, generate a reply, and decode only the newly produced tokens.

Tips for Optimizing Performance

  • If you encounter issues such as code switching (the model drifting between languages mid-response), use the generation hyper-parameters shipped in the model's generation_config.json rather than ad-hoc values.
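As an illustration of how such settings are applied, the dictionary below mirrors the shape of a typical generation_config.json; the specific names and values here are placeholders, and the authoritative values ship with the model checkpoint. Merged settings can be passed straight into model.generate as keyword arguments:

```python
# Illustrative sampling hyper-parameters; the real values come from the
# checkpoint's generation_config.json, not from this example.
generation_overrides = {
    "temperature": 0.7,
    "top_p": 0.8,
    "repetition_penalty": 1.05,  # helps suppress repeated or code-switched spans
    "max_new_tokens": 512,
}

def build_generate_kwargs(overrides, **extra):
    """Merge per-call extras over the base overrides, extras winning."""
    merged = dict(overrides)
    merged.update(extra)
    return merged

# Per-call tweak: shorter output, everything else from the base config.
kwargs = build_generate_kwargs(generation_overrides, max_new_tokens=256)
# model.generate(model_inputs.input_ids, **kwargs)
```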

Troubleshooting Common Issues

While working with Qwen1.5, you might face some common hurdles. Here are some troubleshooting strategies to assist you:

  • KeyError: Ensure you’ve installed the correct version of transformers as mentioned earlier.
  • Device errors: Double-check that you have a proper CUDA setup if you’re attempting to load the model on a GPU.
  • Input/Output format errors: Verify that your input data conforms to the expected format, especially when using the tokenizer.
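For the device errors above, it pays to run a quick diagnostic before attempting to load a 110B checkpoint. This small helper is a sketch, assuming PyTorch as the backend; it reports whether torch is installed and whether a CUDA device is actually visible:

```python
import importlib.util

def cuda_status():
    """Return a short diagnostic string about GPU availability."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "torch installed, but no CUDA device is visible"
    return f"CUDA OK: {torch.cuda.get_device_name(0)}"

print(cuda_status())
```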

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
