In the world of artificial intelligence and natural language processing, Anima emerges as a significant contribution – the first fully open-source Chinese Large Language Model (LLM) built on QLoRA. In this article, we will explore how to train, utilize, and troubleshoot the Anima model effectively.
🚀 Model Training
Anima is powered by QLoRA, a technique that makes it feasible to finetune a 33B model on Chinese conversational datasets with modest GPU memory. Let’s break down the training process in a user-friendly manner.
1. Backbone Model Selection
Think of Anima as a high-performance sports car: it’s built on a robust chassis (the 33B Guanaco model) and fine-tuned to perform well on a new road (the Chinese language). The model is trained for 10,000 steps on a high-end H100 GPU, with the goal of delivering not just raw endurance but the best ride for its drivers (users).
2. Training Data Choice
Anima utilizes the Chinese-Vicuna dataset, specifically the guanaco_belle_merge_v1.0 for finetuning. The approach taken is strategic—selecting datasets that balance quantity and quality to maximize the benefits of training within 10,000 steps.
3. Hyperparameter Selection
Choosing hyperparameters is like dialing in the settings on your coffee machine to brew the perfect cup. For Anima, the settings are kept simple and chosen after careful research to make training efficient:
- Batch size: 16
- Max steps: 10,000
- Learning rate: 1e-4
- LoRA r=64, alpha=16
- Source max length: 512
- Target max length: 512
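As an illustration only (not the repository’s actual configuration code), the hyperparameters above can be collected into a single training-config object. The class name and helper method here are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class AnimaTrainingConfig:
    """Hypothetical container mirroring the hyperparameters listed above."""
    batch_size: int = 16
    max_steps: int = 10_000
    learning_rate: float = 1e-4
    lora_r: int = 64
    lora_alpha: int = 16
    source_max_len: int = 512
    target_max_len: int = 512

    def examples_seen(self) -> int:
        # Total training examples processed over the full run.
        return self.batch_size * self.max_steps


config = AnimaTrainingConfig()
print(config.examples_seen())  # 16 * 10,000 = 160,000 examples
```

A config object like this makes it easy to see the training budget at a glance: at batch size 16 for 10,000 steps, the model sees 160,000 examples in total.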
How to Train the Model
Reproducing Anima’s Training Process
To recreate Anima’s training process, follow these steps:
- Install dependencies: `pip install -r requirements.txt`
- Navigate to the training folder: `cd training`
- Execute the training script: `./run_Amina_training.sh`
Fine-Tuning Other Models Based on Anima
To fine-tune other models using Anima, follow the same installation process and modify the training script as needed:
./run_finetune_training_based_on_Anima.sh
📊 Validation and Evaluation
The Anima model’s efficacy is assessed using the Elo rating system, commonly adopted in competitive settings. Think of it like a gaming leaderboard where players (language models) compete for the top position based on their abilities to respond to challenges (prompts).
- ChatGPT-3.5 turbo received a score of 1341.98
- Anima scored 1096.69
- Belle achieved 937.71
- Chinese-Vicuna rounded out the rankings with 623.62
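To make the leaderboard analogy concrete, here is a minimal sketch of the standard Elo update rule. The K-factor and starting ratings below are illustrative defaults, not the values used in Anima’s evaluation:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    # Probability that player A beats player B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one head-to-head match.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw.
    """
    ea = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b


# Two models start at 1000; the first wins one pairwise comparison.
a, b = update_elo(1000.0, 1000.0, 1.0)
print(round(a, 1), round(b, 1))  # 1016.0 984.0
```

In an LLM evaluation setting, each “match” is a pairwise comparison of two models’ responses to the same prompt, and ratings are updated after every judged comparison.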
🎉 How to Perform Inference
After the training phase, you can use the model for inference. Be sure to have all dependencies installed:
pip install -r https://github.com/lyogavin/Anima/blob/main/requirements.txt?raw=true
Refer to the inferrence.ipynb notebook in the repository for examples, or use the code snippet provided in the README.
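As a small sketch of the prompt-construction step, Guanaco-family models typically expect a `### Human:` / `### Assistant:` conversation template. The exact format is an assumption here; verify it against the snippet in the README before relying on it:

```python
def build_prompt(user_message: str) -> str:
    # Guanaco-style conversation template (assumed format); confirm the
    # exact template against Anima's README before use.
    return f"### Human: {user_message}\n### Assistant:"


print(build_prompt("你好，请介绍一下自己。"))
```

The resulting string is what gets tokenized and fed to the model; the model’s generated continuation after `### Assistant:` is the reply.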
Troubleshooting
Encountered an issue? Here are some common troubleshooting suggestions:
- Ensure all dependencies are correctly installed.
- Check if the training scripts have proper permissions to execute.
- Verify your GPU environment to make sure it meets the necessary requirements.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
Anima represents a pivotal step towards better Chinese language modeling, and its open-source nature encourages collaboration and innovation in AI. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

