In the fast-paced world of artificial intelligence, efficiency matters. Pruna AI offers GGUF versions of the microsoft/Phi-3-mini-128k-instruct model that make AI models cheaper, smaller, faster, and greener. This guide walks you through the essential steps for downloading and running these models, making AI more efficient for you!
Understanding the GGUF Model Analogy
Think of an AI model as a book in a library. The larger and heavier the book, the more physical space it occupies, making it cumbersome to transport. The GGUF format is like condensing a massive, detailed book into a pocket-sized version without losing significant content. It may not be as bulky, but you still get most of the valuable information you’ll need! This compression translates into lower costs and faster performance.
Downloading GGUF Files
To access the GGUF models, follow these simple steps:
- Option A – Through a Text Generation Web UI:
- Step 1: Enter the model repository, PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed, in the download model section.
- Step 2: Specify the filename you want to download, such as Phi-3-mini-128k-instruct.IQ3_M.gguf.
- Step 3: Click on Download.
- Option B – Command Line Method:
- Step 1: Install the huggingface-hub library using the command:
pip3 install huggingface-hub
- Step 2: Download the desired model file with:
huggingface-cli download PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed Phi-3-mini-128k-instruct.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
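If you prefer to script the download, the huggingface_hub library provides hf_hub_download for exactly this. Below is a minimal Python sketch, assuming the repository and filename shown above:

# Download the GGUF file programmatically with huggingface_hub.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed",
    filename="Phi-3-mini-128k-instruct.IQ3_M.gguf",
    local_dir=".",  # save into the current directory
)
print(model_path)  # path to the downloaded .gguf file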
Running the GGUF Model
Now that you have downloaded the model, let’s run it:
- Option A – Using llama.cpp:
- Make sure to use llama.cpp starting from commit d0cee0d or later.
./main -ngl 35 -m Phi-3-mini-128k-instruct.IQ3_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] prompt [/INST]"
- Adjust -ngl for the number of layers offloaded to the GPU and -c for the sequence length as needed.
- Option B – Using llama-cpp-python:
- First, install llama-cpp-python using:
pip install llama-cpp-python
from llama_cpp import Llama
# Load the quantized model with a 32k context window, 8 CPU threads,
# and 35 layers offloaded to the GPU.
llm = Llama(model_path="Phi-3-mini-128k-instruct.IQ3_M.gguf", n_ctx=32768, n_threads=8, n_gpu_layers=35)
# Generate up to 512 tokens; echo=True includes the prompt in the output.
output = llm("[INST] prompt [/INST]", max_tokens=512, stop=["</s>"], echo=True)
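The call returns an OpenAI-style completion dictionary. Here is a quick sketch of reading the generated text from it, along with llama-cpp-python's built-in chat interface, which applies the model's chat template for you (the llm object is the one created above):

# The generated text lives under choices[0]["text"].
print(output["choices"][0]["text"])

# Alternative: the higher-level chat interface.
chat = llm.create_chat_completion(
    messages=[{"role": "user", "content": "prompt"}],
    max_tokens=512,
)
print(chat["choices"][0]["message"]["content"])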
Troubleshooting Tips
If you encounter issues while downloading or running the models, here are some common troubleshooting ideas:
- Ensure that you have the correct permissions to access model files.
- Check your internet connection if downloads stop or stall frequently.
- If you face compatibility issues, confirm that you’re using the recommended versions of libraries (a quick version check sketch follows this list).
- Refer to the official documentation for more detailed instructions.
- For additional insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
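As a minimal sketch for the library version check mentioned above (assuming Python 3.8 or later, where importlib.metadata is in the standard library):

# Print the installed versions of the libraries used in this guide.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("huggingface-hub", "llama-cpp-python"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")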
Conclusion
By leveraging Pruna AI’s GGUF models, you can significantly improve the performance and efficiency of your AI applications, staying at the forefront of AI technology while remaining budget-conscious.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

