In this guide, we will walk you through the training setup behind the Korean-enhanced version of the Llama-3 model, which has been continually pretrained on a large Korean and English corpus to improve its proficiency in Korean while retaining its English capabilities. Ready to dive in? Let’s get started!
Model Details
This model is based on the Meta-Llama-3-8B architecture. The continual pretraining utilizes a variety of Korean and English datasets.
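Before touching any training code, it helps to see how the base checkpoint is loaded. The snippet below is a minimal sketch assuming the Hugging Face transformers library (plus accelerate for device placement) and a bfloat16 setup; none of these loading choices come from the original model card.

```python
# Minimal sketch: load the base checkpoint that continual pretraining starts from.
# The dtype and device placement are assumptions, not from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"  # base architecture named above

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 keeps the 8B model manageable
    device_map="auto",           # requires the accelerate package
)

print(model.config.max_position_embeddings)  # 8192 for Meta-Llama-3-8B
```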
Datasets Used
To give the model broad coverage of both languages, we sampled roughly 16 billion training tokens from three major sources:
| Sources | Tokens (Llama-3 tokenizer) |
|---|---|
| AI-Hub | 9.2B |
| Modu Corpus | 5.8B |
| Wikipedia | 5.4B |
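AI-Hub and the Modu Corpus are distributed through application processes rather than the Hugging Face Hub, so the sketch below treats them as hypothetical local JSONL files with a `text` field; only the Korean Wikipedia load and the mixing utility are standard `datasets` calls, and the mixing probabilities are simply the token shares from the table above.

```python
# Illustrative data mixing only: AI-Hub and Modu Corpus require separate access
# applications, so they appear here as hypothetical local JSONL files with a "text" field.
from datasets import load_dataset, interleave_datasets

wiki = load_dataset("wikimedia/wikipedia", "20231101.ko", split="train").select_columns(["text"])
aihub = load_dataset("json", data_files="aihub_corpus.jsonl", split="train")  # hypothetical path
modu = load_dataset("json", data_files="modu_corpus.jsonl", split="train")    # hypothetical path

# Weight the sources roughly by their token counts from the table above.
mixed = interleave_datasets(
    [aihub, modu, wiki],
    probabilities=[0.45, 0.28, 0.27],
    seed=42,
)
```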
Understanding the Hyperparameters
When training a model, think of hyperparameters as the ingredients in your recipe. Each ingredient must be measured just right to create a delicious outcome. Here are the key hyperparameters used for this model’s continual pretraining:
| Learning rate | Optimizer | Betas | Weight decay | Warm-up ratio |
|---|---|---|---|---|
| 3e-5 | AdamW | (0.9, 0.95) | 0.1 | 0.05 |
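These values map directly onto Hugging Face `TrainingArguments`. The sketch below shows that mapping only; the scheduler type, precision, batch sizes, and output path are assumptions, since the model card does not specify them.

```python
# Illustrative mapping of the table above onto transformers TrainingArguments.
# Scheduler type, precision, batch sizes, and paths are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama3-ko-cpt",   # hypothetical output path
    learning_rate=3e-5,             # Learning rate
    optim="adamw_torch",            # AdamW optimizer
    adam_beta1=0.9,                 # Betas
    adam_beta2=0.95,
    weight_decay=0.1,               # Weight decay
    warmup_ratio=0.05,              # Warm-up ratio
    lr_scheduler_type="cosine",     # assumption: schedule not stated in the card
    bf16=True,                      # assumption
    per_device_train_batch_size=4,  # assumption: tune to your hardware
    gradient_accumulation_steps=8,  # assumption
)
```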
Intended Use
This model is not yet fine-tuned, so you will need to train it on your specific dataset before it can serve your needs effectively. Think of it as a blank canvas awaiting your artistic touch!
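As a starting point for that fine-tuning, here is a bare-bones causal-LM sketch with the transformers `Trainer`. The checkpoint ID is taken from the evaluation table below, and the dataset file, text column, and training arguments are placeholders to swap for your own.

```python
# Bare-bones fine-tuning sketch. The dataset file, column name, and training
# arguments are placeholders; replace them with your own data and settings.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "tesser-ai/Tesser-Llama-3-Ko-8B"  # checkpoint from the evaluation table below
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token    # Llama-3 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

raw = load_dataset("json", data_files="my_dataset.jsonl", split="train")  # hypothetical file

def tokenize(batch):
    # Stay within the 4k context this model was pretrained with (see Limitations).
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./finetuned", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1, bf16=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```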
Evaluation Insights
To ensure that it meets your expectations, we’ve evaluated the model against both English benchmarks (MMLU, HellaSwag, GSM8K, BBH) and Korean benchmarks (KMMLU, HAE-RAE, KoBEST):

| Model | MMLU (5-shot) | HellaSwag (10-shot) | GSM8K (8-shot, CoT) | BBH (3-shot, CoT) | KMMLU (5-shot) | HAE-RAE (5-shot) | KoBEST (5-shot) |
|---|---|---|---|---|---|---|---|
| meta-llama/Meta-Llama-3-8B | 65.1 | 82.1 | 52.0 | 61.9 | 40.2 | 61.1 | 69.2 |
| saltlux/Ko-Llama3-Luxia-8B | 57.1 | 77.1 | 32.3 | 51.8 | 39.4 | 69.2 | 71.9 |
| beomi/Llama-3-Open-Ko-8B | 56.2 | 77.4 | 31.5 | 46.8 | 40.3 | 68.1 | 72.1 |
| beomi/Llama-3-KoEn-8B | 52.5 | 77.7 | 21.2 | 43.2 | 40.8 | 71.3 | 73.8 |
| tesser-ai/Tesser-Llama-3-Ko-8B | 60.5 | 79.8 | 40.3 | 56.3 | 42.5 | 72.1 | 73.8 |
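If you want to reproduce numbers in this style, one option is EleutherAI’s lm-evaluation-harness. The sketch below uses its Python API with an assumed checkpoint ID and a single few-shot setting; exact task names and shot counts vary by harness version, so treat it as a template rather than the authors’ evaluation script.

```python
# Sketch of scoring a checkpoint with EleutherAI's lm-evaluation-harness.
# Task names and the single few-shot value are assumptions; the table above
# uses different shot counts per benchmark.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tesser-ai/Tesser-Llama-3-Ko-8B,dtype=bfloat16",
    tasks=["mmlu", "hellaswag", "kmmlu", "kobest"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```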
Limitations
Due to resource constraints, this model was continually pretrained with a context length of 4k, whereas the original Meta-Llama-3-8B supports an 8k context. For downstream tasks that depend on long inputs, the 8k-context original model may therefore give better results.
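In practice this mostly means capping inputs at 4,096 tokens when you tokenize; here is a minimal illustration (the checkpoint ID is assumed):

```python
# Keep inputs within the 4k context used during continual pretraining.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tesser-ai/Tesser-Llama-3-Ko-8B")  # assumed ID
long_document = "..."  # your (possibly very long) Korean or English text

inputs = tokenizer(
    long_document,
    truncation=True,
    max_length=4096,   # 4k training context; the base Llama-3 supports 8,192
    return_tensors="pt",
)
```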
Troubleshooting
If you encounter any issues during training, here are some troubleshooting tips:
- Ensure your dataset is correctly formatted and compatible with the model.
- Check the hyperparameters to ensure they align with recommended values.
- Monitor system resources and reduce the batch size (compensating with gradient accumulation) if you experience out-of-memory errors; see the sketch after this list.
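For the memory tip above, a common pattern is to trade per-device batch size for gradient accumulation steps so the effective batch size stays the same; the concrete numbers below are only an example.

```python
# Example of lowering the per-device batch size while raising gradient
# accumulation so the effective batch size (here, 32) is unchanged.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=1,    # lower this first when you hit out-of-memory errors
    gradient_accumulation_steps=32,   # raise this to compensate
    gradient_checkpointing=True,      # trades extra compute for a large memory saving
    bf16=True,
)
```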
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

