How to Use the Exl2 Quantized Version of MN-12B-Celeste-V1.9

Aug 2, 2024 | Educational

Welcome to our comprehensive guide on using the Exl2 quantized version of the MN-12B-Celeste-V1.9 model. In this article, we’ll walk you through everything from setting up to troubleshooting potential issues, all while making it as user-friendly as possible.

Understanding the MN-12B-Celeste Model

The MN-12B-Celeste-V1.9 model has been quantized for better performance and efficiency. Quantization stores each weight at reduced precision (here, about 6 bits instead of the original 16-bit floats), which shrinks the model’s memory footprint. Think of it like condensing a large, complex book into a concise summary that still conveys the main ideas: far easier to store and process, without compromising too much on quality.

Getting Started: Downloading the Model

To get started, you will need to download the Exl2 quantized version of the model, using either the async-hf-downloader tool or the Hugging Face CLI, which pulls it straight from the Hugging Face Hub. Here’s how:

Method 1: Using async-hf-downloader

First, ensure you have the async-hf-downloader installed. Then, run the following command:

async-hf-downloader royallab/MN-12B-Celeste-V1.9-exl2 -r 6bpw -p MN-12B-Celeste-V1.9-exl2-6bpw
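
Here, -r selects the repository revision to pull (the 6bpw branch) and -p the local path to save it into, playing the same roles as the --revision and --local-dir flags in Method 2 below.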

Method 2: Using the Hugging Face CLI

Alternatively, open your terminal and run:

huggingface-cli download royallab/MN-12B-Celeste-V1.9-exl2 --revision 6bpw --local-dir MN-12B-Celeste-V1.9-exl2-6bpw
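
Whichever method you use, it’s worth confirming that the files actually arrived. Exl2 quants typically ship a config.json, tokenizer files, and one or more .safetensors shards, so a quick listing of the download folder is enough for a sanity check:

ls MN-12B-Celeste-V1.9-exl2-6bpw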

Setting Up TabbyAPI

Next, you’ll want to run the MN-12B-Celeste model using TabbyAPI, a FastAPI server tailored for ExLlamaV2. Here’s how to set it up:

  1. Open the config.yml file in your TabbyAPI directory.
  2. Set model_name to MN-12B-Celeste-V1.9-exl2-6bpw (a minimal sketch of this section follows the list).
  3. Alternatively, pass the argument --model_name MN-12B-Celeste-V1.9-exl2-6bpw when starting TabbyAPI.
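
For reference, here’s a minimal sketch of the relevant section of config.yml. The key names below follow TabbyAPI’s sample config, so double-check them against the config_sample.yml that ships with your copy:

model:
  # Directory TabbyAPI scans for models (defaults to "models")
  model_dir: models
  # Folder name of the quant you downloaded, placed inside model_dir
  model_name: MN-12B-Celeste-V1.9-exl2-6bpw

Note that the downloaded folder needs to sit inside model_dir for TabbyAPI to find it.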

Launch TabbyAPI inside your Python environment by executing either start.bat (on Windows) or ./start.sh (on Unix-based systems).

Troubleshooting

If you encounter issues while attempting to run this setup, consider the following troubleshooting tips:

  • Model Not Found: Double-check the model_name entered in config.yml; it must match the name of the downloaded folder exactly.
  • Download Errors: Verify your internet connection and re-run the download using the commands above.
  • Runtime Issues: Ensure that your Python environment has all the necessary dependencies installed. Running pip install -r requirements.txt in your TabbyAPI directory can help; a quick end-to-end check is shown below.
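
To confirm that the server is up and the model actually loaded, you can query the OpenAI-style model list endpoint. This is a rough sanity check assuming TabbyAPI’s defaults (host 127.0.0.1, port 5000) and the API key TabbyAPI writes to api_tokens.yml on first launch; substitute your own host, port, and key if your config differs, and note that YOUR_API_KEY below is a placeholder:

curl http://127.0.0.1:5000/v1/models -H "Authorization: Bearer YOUR_API_KEY"

A successful response lists the loaded model by name; a connection error means the server isn’t running on that host and port.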

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

Using the Exl2 quantized version of the MN-12B-Celeste-V1.9 model can significantly enhance your AI projects without demanding too much VRAM. The 6 bits per weight (6bpw) quant is recommended for the best quality; quants above 6bpw don’t yield noticeable improvements, so 6bpw is the sweet spot between quality and memory use.
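
As a rough, weights-only estimate: 12 billion parameters × 6 bits per weight ÷ 8 bits per byte ≈ 9 GB, versus roughly 24 GB for the same weights at 16-bit precision, with the context cache and runtime overhead coming on top of either figure.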

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
