How to Use Exl2 Quantized Version of MN-12B-Starcannon-v3

Aug 10, 2024 | Educational

In the world of artificial intelligence, optimizing model performance is paramount. The Exl2 quantized version of MN-12B-Starcannon-v3 stands out as a refined solution for efficient AI implementations. This blog post will walk you through everything you need to know about utilizing this model effectively, from download to runtime, accompanied by useful troubleshooting tips.

Understanding the Options

The Exl2 quantized version provides several branches you can leverage:

  • main: Contains measurement files.
  • 4bpw: 4 bits per weight.
  • 5bpw: 5 bits per weight.
  • 6bpw: 6 bits per weight (recommended for the best quality-to-VRAM-usage ratio).

Note that branches above 6bpw are not provided, as they offer little additional quality for the extra VRAM. If you require a higher bits-per-weight quantization, consider reaching out to the community or creating it yourself.
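
As a rough guide to which branch fits your GPU, you can estimate the weight footprint from the parameter count and the bits per weight. The sketch below is back-of-the-envelope only; it covers the weights alone and ignores the KV cache and runtime overhead:

# Weights ≈ parameters × bits-per-weight ÷ 8 bits per byte
echo "6bpw: ~$(( 12 * 6 / 8 )) GB of weights"   # ≈ 9 GB for a 12B model
echo "4bpw: ~$(( 12 * 4 / 8 )) GB of weights"   # ≈ 6 GB for a 12B model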

Downloading the Model

To get started with the Exl2 quantized model, download it using one of the two methods below:

Using Async Hugging Face Downloader

To use the lightweight and asynchronous downloader, execute the following command in your terminal:

./async-hf-downloader royallab/MN-12B-Starcannon-v3-exl2 -r 6bpw -p MN-12B-Starcannon-v3-exl2-6bpw
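
If the download succeeds, the target directory should contain the quantized weights and config files. A quick way to confirm (assuming the -p flag above sets the output directory, as the command suggests):

ls MN-12B-Starcannon-v3-exl2-6bpw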

Using Hugging Face Hub

If you prefer using the Hugging Face hub, make sure you have installed the required package:

pip install huggingface_hub

Then run this command:

huggingface-cli download royallab/MN-12B-Starcannon-v3-exl2 --revision 6bpw --local-dir MN-12B-Starcannon-v3-exl2-6bpw
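
The same command works for any of the branches listed earlier; for example, to fetch the lower-VRAM 4bpw branch instead, swap the revision and the local directory name:

huggingface-cli download royallab/MN-12B-Starcannon-v3-exl2 --revision 4bpw --local-dir MN-12B-Starcannon-v3-exl2-4bpw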

Setting Up TabbyAPI

To run the model, we will utilize TabbyAPI, a FastAPI server designed for efficient operation. Here’s how to get it up and running:

  1. Locate the config.yml file inside the TabbyAPI directory.
  2. Set the value of model_name to MN-12B-Starcannon-v3-exl2-6bpw (a sketch of this section follows the list).
  3. Alternatively, pass the model name during startup with --model_name MN-12B-Starcannon-v3-exl2-6bpw, or load it at runtime via the /v1/model/load API endpoint (see the example after this list).
  4. Finally, launch TabbyAPI within your Python environment by running ./start.bat (Windows) or ./start.sh (Linux/macOS).
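For reference, here is a minimal sketch of the relevant config.yml entries. The model: block and the model_dir key are assumptions based on common TabbyAPI layouts, so match them to the structure of your own file:

model:
  model_dir: models                            # assumption: folder holding your downloaded models
  model_name: MN-12B-Starcannon-v3-exl2-6bpw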
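
And here is a hedged sketch of loading the model at runtime through the /v1/model/load endpoint from step 3. The port, JSON field name, and admin-key header are assumptions; check the API docs served by your TabbyAPI instance:

# Hypothetical values: default port 5000, admin key taken from your TabbyAPI config
curl -X POST http://localhost:5000/v1/model/load \
  -H "Content-Type: application/json" \
  -H "x-admin-key: YOUR_ADMIN_KEY" \
  -d '{"name": "MN-12B-Starcannon-v3-exl2-6bpw"}'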

Troubleshooting Tips

If you encounter any issues while using the Exl2 quantized model, here are a few suggestions to help you troubleshoot:

  • Ensure that you have the required VRAM available, especially when using the 6bpw branch (a quick way to check is shown after this list).
  • Double-check the configuration in config.yml to confirm you’ve set model_name correctly.
  • If the model fails to load, validate that TabbyAPI and its dependencies are up to date.
  • Should problems persist, test with a different branch (4bpw or 5bpw) to determine whether the issue is specific to that quantization level.
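
On NVIDIA GPUs, nvidia-smi (which ships with the driver) gives a quick read on how much VRAM is free before you load the model:

# Compare free memory against the rough footprint estimated earlier
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv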

For creative insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Optimizing your AI models doesn’t have to be complex with the Exl2 quantized version of MN-12B-Starcannon-v3. Following this guide will empower you to utilize this model with ease.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
