How to Train Your Own Korean-Enhanced Llama-3 Model

Jun 13, 2024 | Educational

In this guide, we will walk you through training the Korean-enhanced version of the Llama-3 model, which has been continually pretrained on a large Korean and English corpus to improve its proficiency in Korean while retaining its capabilities in English. Ready to dive in? Let’s get started!

Model Details

This model is based on the Meta-Llama-3-8B architecture, and its continual pretraining uses a mix of Korean and English datasets.
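As a starting point, here is a minimal sketch of loading that base checkpoint with Hugging Face Transformers before continual pretraining. The dtype and gradient-checkpointing choices are illustrative assumptions, not settings confirmed by the original run.

```python
# Minimal sketch (assumption): load the Meta-Llama-3-8B base checkpoint before continual pretraining.
# Requires the `transformers` library and access to the gated meta-llama repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"  # base architecture named in this guide

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # bf16 is a common choice at 8B scale; an assumption, not a documented setting
)
model.gradient_checkpointing_enable()  # trades compute for memory during long pretraining runs
```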

Datasets Used

To give the model its breadth in both languages, about 16 billion tokens were sampled from three major sources:

Source         Tokens (Llama-3-8B tokenizer)
AI-Hub         9.2B
Modu Corpus    5.8B
Wikipedia      5.4B
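To illustrate how such a mixture can be assembled, the sketch below interleaves corpora with the `datasets` library. The dataset names, file paths, and sampling probabilities are placeholders: AI-Hub and Modu Corpus require separate access agreements and are not available on the Hugging Face Hub under these names.

```python
# Sketch only: interleave several corpora into one pretraining stream with the `datasets` library.
# AI-Hub and Modu Corpus require separate access agreements, so they appear as local-file placeholders.
from datasets import load_dataset, interleave_datasets

wiki_ko = load_dataset("wikimedia/wikipedia", "20231101.ko", split="train", streaming=True)
wiki_en = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
# Hypothetical local copies, e.g. exported to JSON Lines after download:
# aihub = load_dataset("json", data_files="data/aihub/*.jsonl", split="train", streaming=True)
# modu  = load_dataset("json", data_files="data/modu/*.jsonl", split="train", streaming=True)

# Illustrative sampling probabilities; the real mixture ratios are not published in this guide.
mixed = interleave_datasets([wiki_ko, wiki_en], probabilities=[0.6, 0.4], seed=42)
```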

Understanding the Hyperparameters

When training a model, think of hyperparameters as the ingredients in your recipe: each must be measured just right to get a good result. Here are the key hyperparameters used for this continual pretraining run:

Learning rate   Optimizer   Betas         Weight decay   Warm-up ratio
3e-5            AdamW       (0.9, 0.95)   0.1            0.05
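Translated into Hugging Face `Trainer` settings, those values might look like the sketch below. Everything other than the listed hyperparameters (batch size, scheduler, precision, output path) is an assumption you should adapt to your own setup.

```python
# Sketch: the hyperparameters from the table expressed as Hugging Face TrainingArguments.
# Batch size, scheduler, precision, and output_dir are illustrative assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-koen-cpt",        # placeholder path
    learning_rate=3e-5,                  # from the table
    weight_decay=0.1,                    # from the table
    warmup_ratio=0.05,                   # from the table
    adam_beta1=0.9,                      # AdamW betas from the table
    adam_beta2=0.95,
    optim="adamw_torch",
    lr_scheduler_type="cosine",          # assumption; the original schedule is not stated
    bf16=True,
    per_device_train_batch_size=1,       # adjust to your hardware
    gradient_accumulation_steps=16,
    logging_steps=10,
)
```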

Intended Use

This model is not yet fine-tuned, so you will need to train it on your specific dataset before it can serve your needs effectively. Consider it like a blank canvas that awaits your artistic touch!
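As a rough illustration of that fine-tuning step, the sketch below continues training a Korean-enhanced checkpoint on your own text with the standard `Trainer`. The model ID, data file, and sequence length are placeholders for your setup, not part of the original recipe.

```python
# Sketch: fine-tune a Korean-enhanced checkpoint on your own causal-LM data.
# The model ID, data file, and max length are placeholders for your setup.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "beomi/Llama-3-KoEn-8B"  # example checkpoint from the evaluation table below; swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

raw = load_dataset("json", data_files="my_korean_corpus.jsonl", split="train")  # hypothetical file with a "text" field

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM objective, no masking

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="my-finetuned-model", num_train_epochs=1, bf16=True),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```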

Evaluation Insights

To ensure that it meets your expectations, we’ve evaluated the model against both English and Korean benchmarks:

English benchmarks: MMLU (5-shot), HellaSwag (10-shot), GSM8K (8-shot, CoT), BBH (3-shot, CoT). Korean benchmarks: KMMLU (5-shot), HAE-RAE (5-shot), KoBEST (5-shot).

Model                            MMLU   HellaSwag   GSM8K   BBH    KMMLU   HAE-RAE   KoBEST
meta-llama/Meta-Llama-3-8B       65.1   82.1        52.0    61.9   40.2    61.1      69.2
saltlux/Ko-Llama3-Luxia-8B       57.1   77.1        32.3    51.8   39.4    69.2      71.9
beomi/Llama-3-Open-Ko-8B         56.2   77.4        31.5    46.8   40.3    68.1      72.1
beomi/Llama-3-KoEn-8B            52.5   77.7        21.2    43.2   40.8    71.3      73.8
tesser-ai/Tesser-Llama-3-Ko-8B   60.5   79.8        40.3    56.3   42.5    72.1      73.8
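If you want to reproduce this kind of comparison for your own checkpoint, EleutherAI's lm-evaluation-harness is one common option. The sketch below uses its Python entry point; the model ID, task names, and shot counts are assumptions, not the exact configuration behind the table above, and task names can differ between harness versions.

```python
# Sketch: score a checkpoint on some of the benchmarks above with EleutherAI's lm-evaluation-harness.
# Requires `pip install lm-eval`; model ID, tasks, and shot counts here are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=beomi/Llama-3-KoEn-8B,dtype=bfloat16",
    tasks=["kmmlu", "kobest", "haerae"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```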

Limitations

Due to resource constraints, this model was continually pretrained with a 4k-token context length, whereas the original Meta-Llama-3-8B supports 8k. For downstream tasks that require longer contexts, the original 8k-context model may therefore give better results.
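If you continue training this model, one practical consequence is that inputs should be packed into blocks no longer than the 4k window. Below is a minimal, library-free sketch of such packing; only the block length comes from this section, everything else is illustrative.

```python
# Sketch: pack tokenized text into fixed 4,096-token blocks to match this model's training context length.
CONTEXT_LENGTH = 4096  # the 4k context length mentioned above

def pack_into_blocks(token_ids, block_size=CONTEXT_LENGTH):
    """Split a flat list of token ids into full-size blocks; the trailing remainder is dropped."""
    n_full = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, n_full, block_size)]

# Example: 10,000 tokens yield two complete 4,096-token training blocks.
blocks = pack_into_blocks(list(range(10_000)))
print(len(blocks), len(blocks[0]))  # -> 2 4096
```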

Troubleshooting

If you encounter any issues during training, here are some troubleshooting tips:

  • Ensure your dataset is correctly formatted and compatible with the model.
  • Check the hyperparameters to ensure they align with recommended values.
  • Monitor system resources and lower the batch size if you hit out-of-memory errors (see the sketch after this list).
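For the last point, the usual remedy is to trade per-device batch size for gradient accumulation so that the effective batch size stays the same. The numbers in this sketch are illustrative assumptions, not recommended values.

```python
# Sketch: keep the effective batch size constant while lowering per-GPU memory use.
# All values are illustrative; adjust them to your hardware.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3-koen-cpt",
    per_device_train_batch_size=1,   # smaller micro-batch to avoid out-of-memory errors
    gradient_accumulation_steps=32,  # effective batch = 1 x 32 (x number of GPUs)
    gradient_checkpointing=True,     # recompute activations to save memory at some speed cost
    bf16=True,
)
```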

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
