In this blog post, we will guide you through recreating the TinyStories (1M) model using the Llama architecture. The approach involves a few simple steps that will have you up and running in no time. Let’s dive in!
Prerequisites
- Python installed on your system
- Access to a machine with sufficient VRAM (preferably a 40GB A100)
- Jupyter Notebook to run train.ipynb
- The required datasets: TinyStoriesV2-GPT4-train.txt and TinyStoriesV2-GPT4-valid.txt
Steps to Follow
1. Set Up the Environment
Begin by ensuring that your environment is set up correctly. Download the TinyStoriesV2-GPT4-train.txt and TinyStoriesV2-GPT4-valid.txt files, placing them in the same folder as your training notebook (train.ipynb).
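Before launching the notebook, it can save a failed run to confirm both files are actually in place. This is a small hypothetical helper (not part of train.ipynb) that fails fast with a clear message if a dataset file is missing:

```python
from pathlib import Path

def check_datasets(folder="."):
    """Verify both TinyStories dataset files sit next to the notebook."""
    required = [
        "TinyStoriesV2-GPT4-train.txt",
        "TinyStoriesV2-GPT4-valid.txt",
    ]
    missing = [name for name in required if not (Path(folder) / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing dataset files: {missing}")
    return True
```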
2. Run the Training Notebook
Open train.ipynb in Jupyter Notebook and run the cells in order. Note that the actual validation content does not matter during training; you can put any text you like in the validation file.
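Since the validation content is arbitrary, you can write a small placeholder file just so the notebook finds the expected filename. A minimal sketch (the helper name is hypothetical):

```python
def write_placeholder_valid(path="TinyStoriesV2-GPT4-valid.txt"):
    """Write a stand-in validation file; its content is not used as ground truth."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("Once upon a time, there was a tiny story.\n")
    return path
```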
3. Backup Mechanism
Use the backup script do_backup to copy weights from your remote machine to your local system. This matters because checkpoints are written quickly, and the backup can fall out of sync with weight generation if it does not keep up.
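The post does not show do_backup itself. As a hedged sketch of the idea, the loop below watches a checkpoint folder and copies any file it has not seen yet; in practice the real script transfers weights to a local machine (e.g. via rsync or scp), and the local copy here stands in for that transfer. All names and the `.bin` extension are assumptions for illustration:

```python
import shutil
from pathlib import Path

def backup_new_checkpoints(src="checkpoints", dst="backup", seen=None):
    """Copy any checkpoint files not yet backed up; return the names copied."""
    seen = set() if seen is None else seen
    Path(dst).mkdir(exist_ok=True)
    copied = []
    for ckpt in sorted(Path(src).glob("*.bin")):
        if ckpt.name not in seen:
            shutil.copy2(ckpt, Path(dst) / ckpt.name)
            seen.add(ckpt.name)
            copied.append(ckpt.name)
    return copied
```

Calling this on a timer (or in a loop with a short sleep) keeps the backup close behind the training run even when checkpoints appear quickly.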
4. Train Your Model
Training takes approximately 9 hours, at roughly 3 hours per epoch. Note that stories longer than the context size are truncated; no sliding window technique is currently used.
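The truncation described above amounts to a one-liner: any tokenized story longer than the context window is simply cut off. The context size of 512 below is an assumption for illustration, not necessarily the model's actual setting:

```python
def truncate_to_context(token_ids, context_size=512):
    """Cut a tokenized story down to the context window (no sliding window)."""
    return token_ids[:context_size]
```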
5. Validation Script Usage
Once training is complete, execute the validation script by using the following command:
python valid.py path/to/TinyStoriesV2-GPT4-valid.txt [optional-model-id-or-path]
It is worth noting that I decided against splitting the validation data into chunks, as it appeared unnecessary.
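Chunk-free validation can be sketched as scoring each story whole (truncated to the context window) and averaging the per-token loss. The `score_fn` parameter below is a hypothetical stand-in for a model call such as `model(ids, labels=ids).loss.item()`; the real valid.py may differ:

```python
import math

def validate(stories, score_fn, context_size=512):
    """Average per-token loss over whole stories; returns (loss, perplexity)."""
    total_loss, total_tokens = 0.0, 0
    for ids in stories:
        ids = ids[:context_size]  # no chunking, just truncation
        total_loss += score_fn(ids) * len(ids)
        total_tokens += len(ids)
    mean_loss = total_loss / total_tokens
    return mean_loss, math.exp(mean_loss)
```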
Code Analogy: A Bakery’s Production Line
Think of the process we’ve outlined as a bakery producing unique pastries (stories). The train.ipynb serves as the bakery’s production line where you place the ingredients (text files). The weights are like freshly baked pastries; if they move too quickly from the oven to the display case (too fast to back them up), they could get out of sync, causing issues.
Validation is akin to tasting each pastry to ensure it is delicious, while the dumb caching mechanism shuffles stories the way a baker randomly picks pastries for different assortments without recycling any one too frequently. The ultimate goal is consistently delightful pastries (model performance) that impress our customers (users).
Troubleshooting
- Issue: If the model fails to load the tokenizer correctly, ensure you have the proper dependencies installed; see the related tokenizer issue on the project’s GitHub issue tracker.
- Issue: If you encounter problems with training taking too long, check whether enough resources are assigned to your machine.
- Tip: Make sure to monitor the cache size, especially when the dataset is small, to prevent inefficiencies.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

