The Qwen2 language model represents a significant leap forward in text generation and understanding. Available in sizes ranging from a compact 0.5-billion-parameter instruction-tuned variant up to 72 billion parameters, Qwen2 sets a new benchmark among open-source language models. In this article, we'll guide you through installing and running the Qwen2 model so you can leverage its capabilities in your projects.
Getting Started with Qwen2
Before diving into the usage of Qwen2, ensure that you have the necessary dependencies installed. This article assumes you are running commands within the llama.cpp repository.
Installation Requirements
- Clone the llama.cpp repository by following its official guide.
- Install the Hugging Face CLI using the command:
pip install huggingface_hub
How to Download the Qwen2 Model
Instead of cloning the entire repository, you can download specific GGUF files directly. To do so, use the following command:
huggingface-cli download Qwen/Qwen2-0.5B-Instruct-GGUF qwen2-0_5b-instruct-q5_k_m.gguf --local-dir . --local-dir-use-symlinks False
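If you prefer to stay in Python, the same file can be fetched programmatically with the `hf_hub_download` function from `huggingface_hub`; the repo and file names below mirror the CLI command above, and the helper function name is just illustrative.

```python
from huggingface_hub import hf_hub_download

def fetch_qwen2_gguf(local_dir: str = ".") -> str:
    """Download the quantized Qwen2 GGUF file and return its local path."""
    return hf_hub_download(
        repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",
        filename="qwen2-0_5b-instruct-q5_k_m.gguf",
        local_dir=local_dir,
    )

# Example: path = fetch_qwen2_gguf()  # downloads a few hundred MB on first call
```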
Running the Qwen2 Model
Once downloaded, you can run the Qwen2 model using llama-server, which is both simple and compatible with the OpenAI API. Here’s how to do it:
llama-server -m qwen2-0_5b-instruct-q5_k_m.gguf -ngl 24 -fa
Note: The -ngl 24 option offloads 24 layers to the GPU, and -fa enables flash attention.
Accessing the Deployed Service
You can access the deployed service using the following Python code snippet:
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # your API server IP:port
    api_key="sk-no-key-required"
)
completion = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about Michael Jordan."}
    ]
)
print(completion.choices[0].message.content)
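Under the hood, the client simply POSTs a JSON body to the server's /v1/chat/completions endpoint. If you'd rather avoid the openai dependency, a standard-library sketch of the equivalent request (built but not yet sent, since it assumes a live server at localhost:8080) looks like this:

```python
import json
import urllib.request

# Build (but do not send yet) the same request the OpenAI client issues.
payload = {
    "model": "qwen",  # llama-server accepts any model name here
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about Michael Jordan."},
    ],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually call a running server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
print(req.full_url, req.get_method())
```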
Using llama-cli
If you prefer using llama-cli, adjust your command accordingly. Here's the equivalent command:
llama-cli -m qwen2-0_5b-instruct-q5_k_m.gguf -n 512 -co -i -if -f prompts/chat-with-qwen.txt --in-prefix "<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n" -ngl 24 -fa
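The --in-prefix and --in-suffix flags wrap each interactive turn in the ChatML template that Qwen2 was trained on. A minimal sketch of what one formatted turn looks like (the helper name is illustrative):

```python
def chatml_turn(user_msg: str) -> str:
    """Wrap one user message in the ChatML markers used by Qwen2."""
    return (
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_turn("Hello!"))
```

The prefix opens the user turn, and the suffix closes it and opens the assistant turn, which is exactly where the model begins generating.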
Model Evaluation
To evaluate model quality, we measure perplexity on the WikiText dataset (lower is better). The table below summarizes perplexity (PPL) for each model size across quantization levels:
Size      fp16    q8_0    q6_k    q5_k_m  q5_0    q4_k_m  q4_0    q3_k_m  q2_k    iq1_m
-----------------------------------------------------------------------------------------
0.5B      15.11   15.13   15.14   15.24   15.40   15.36   16.28   15.70   16.74   -
1.5B      10.43   10.43   10.45   10.50   10.56   10.61   10.79   11.08   13.04   -
7B        7.93    7.94    7.96    7.97    7.98    8.02    8.19    8.20    10.58   -
57B-A14B  6.81    6.81    6.83    6.84    6.89    6.99    7.02    7.43    -       -
72B       5.58    5.58    5.59    5.59    5.60    5.61    5.66    5.68    5.91    6.75
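For context, perplexity is the exponential of the average per-token negative log-likelihood, so lower values mean the model predicts the evaluation text better. A toy computation:

```python
import math

def perplexity(token_log_probs):
    """PPL = exp(-mean(log p)) over the evaluated tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Toy example: four tokens, each predicted with probability 0.25,
# which is equivalent to a uniform 4-way guess.
logps = [math.log(0.25)] * 4
print(round(perplexity(logps), 2))  # → 4.0
```

This is why the table rewards larger models and lighter quantization: both keep the per-token probabilities closer to the full-precision model's.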
Troubleshooting Tips
If you encounter any issues while using the Qwen2 model, consider the following troubleshooting ideas:
- Ensure that the dependencies are correctly installed and up to date.
- Check your command syntax for any typographical errors.
- Verify that your API server is running and accessible.
- If your system encounters memory issues, consider reducing the number of layers offloaded to GPUs.
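For the server-accessibility check above, a small standard-library probe can save some guesswork; the URL below assumes the default llama-server address used earlier in this article.

```python
import urllib.error
import urllib.request

def server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server responded, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...

# Example, assuming the default llama-server port:
# print(server_reachable("http://localhost:8080/health"))
```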
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
In closing, the Qwen2 model is a powerful tool for text generation and understanding. It combines advanced techniques and a robust architecture to deliver high-performance results in numerous applications. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

