In this blog post, we will guide you through recreating the TinyStories (1M) model using the Llama architecture. The approach involves a few simple steps that will have you up and running in no time. Let’s dive in!
Prerequisites
- Python installed on your system
- Access to a machine with sufficient VRAM (preferably a 40GB A100)
- Jupyter Notebook to run train.ipynb
- The required datasets: TinyStoriesV2-GPT4-train.txt and TinyStoriesV2-GPT4-valid.txt
Steps to Follow
1. Set Up the Environment
Begin by ensuring that your environment is set up correctly. Download the TinyStoriesV2-GPT4-train.txt and TinyStoriesV2-GPT4-valid.txt files, placing them in the same folder as your training notebook (train.ipynb).
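Before launching the notebook, it can save a failed run to confirm both files are actually in place. This is a small hypothetical helper (not part of train.ipynb) that fails fast with a clear message if a dataset file is missing:

```python
from pathlib import Path

def check_datasets(folder="."):
    """Verify both TinyStories dataset files sit next to the notebook."""
    required = [
        "TinyStoriesV2-GPT4-train.txt",
        "TinyStoriesV2-GPT4-valid.txt",
    ]
    missing = [name for name in required if not (Path(folder) / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing dataset files: {missing}")
    return True
```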
2. Run the Training Notebook
Open train.ipynb in Jupyter Notebook and run the cells in order. Note that the actual validation content does not matter during training; you can put any text you like in the validation file.
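Since the validation content is arbitrary, you can write a small placeholder file just so the notebook finds the expected filename. A minimal sketch (the helper name is hypothetical):

```python
def write_placeholder_valid(path="TinyStoriesV2-GPT4-valid.txt"):
    """Write a stand-in validation file; its content is not used as ground truth."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("Once upon a time, there was a tiny story.\n")
    return path
```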
3. Backup Mechanism
Use the backup script do_backup to copy weights from your remote machine to your local system. This matters because checkpoints are written quickly, and the backup can fall out of sync with weight generation if it does not keep up.
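The post does not show do_backup itself. As a hedged sketch of the idea, the loop below watches a checkpoint folder and copies any file it has not seen yet; in practice the real script transfers weights to a local machine (e.g. via rsync or scp), and the local copy here stands in for that transfer. All names and the `.bin` extension are assumptions for illustration:

```python
import shutil
from pathlib import Path

def backup_new_checkpoints(src="checkpoints", dst="backup", seen=None):
    """Copy any checkpoint files not yet backed up; return the names copied."""
    seen = set() if seen is None else seen
    Path(dst).mkdir(exist_ok=True)
    copied = []
    for ckpt in sorted(Path(src).glob("*.bin")):
        if ckpt.name not in seen:
            shutil.copy2(ckpt, Path(dst) / ckpt.name)
            seen.add(ckpt.name)
            copied.append(ckpt.name)
    return copied
```

Calling this on a timer (or in a loop with a short sleep) keeps the backup close behind the training run even when checkpoints appear quickly.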
4. Train Your Model
Training takes approximately 9 hours, at roughly 3 hours per epoch. Note that stories longer than the context size are truncated; no sliding window technique is currently used.
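The truncation described above amounts to a one-liner: any tokenized story longer than the context window is simply cut off. The context size of 512 below is an assumption for illustration, not necessarily the model's actual setting:

```python
def truncate_to_context(token_ids, context_size=512):
    """Cut a tokenized story down to the context window (no sliding window)."""
    return token_ids[:context_size]
```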
5. Validation Script Usage
Once training is complete, execute the validation script by using the following command:
python valid.py path/to/TinyStoriesV2-GPT4-valid.txt [optional-model-id-or-path]
It is worth noting that I decided against splitting the validation data into chunks, as it appeared unnecessary.
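Chunk-free validation can be sketched as scoring each story whole (truncated to the context window) and averaging the per-token loss. The `score_fn` parameter below is a hypothetical stand-in for a model call such as `model(ids, labels=ids).loss.item()`; the real valid.py may differ:

```python
import math

def validate(stories, score_fn, context_size=512):
    """Average per-token loss over whole stories; returns (loss, perplexity)."""
    total_loss, total_tokens = 0.0, 0
    for ids in stories:
        ids = ids[:context_size]  # no chunking, just truncation
        total_loss += score_fn(ids) * len(ids)
        total_tokens += len(ids)
    mean_loss = total_loss / total_tokens
    return mean_loss, math.exp(mean_loss)
```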
Code Analogy: A Bakery’s Production Line
Think of the process we’ve outlined as a bakery producing unique pastries (stories). The train.ipynb serves as the bakery’s production line where you place the ingredients (text files). The weights are like freshly baked pastries; if they move too quickly from the oven to the display case (too fast to back them up), they could get out of sync, causing issues.
Validation is akin to tasting each pastry to ensure it is delicious, while the dumb caching mechanism shuffles stories the way a baker randomly picks pastries for different assortments without recycling any one too frequently. The ultimate goal is consistently delightful pastries (model performance) that impress our customers (users).
Troubleshooting
- Issue: If the model fails to load the tokenizer correctly, ensure you have the proper dependencies installed; see the related tokenizer issue on the project’s GitHub issue tracker.
- Issue: If you encounter problems with training taking too long, check whether enough resources are assigned to your machine.
- Tip: Make sure to monitor the cache size, especially when the dataset is small, to prevent inefficiencies.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

