Welcome to an exciting exploration of enhancing language models with innovative long-term memory capabilities! This guide will walk you through the steps needed to implement the LongMem framework based on the paper Augmenting Language Models with Long-Term Memory. We will cover the environment setup, project structure, and how to evaluate the models, all while keeping things user-friendly. Let’s dive in!
Step 1: Environment Setup
The first stage in your journey involves setting up your environment. Here’s how to do it:
- Install PyTorch: Follow the official installation guide for torch. We recommend torch==1.8.0, choosing the build that matches your CUDA driver.
- Faiss-GPU: For NVIDIA V100 GPUs, install via pip or conda:
pip install faiss-gpu
conda install faiss-gpu cudatoolkit=11.0 -c pytorch
pip install --editable fairseq/
pip install -r requirements.txt
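Faiss is what LongMem uses to search its memory bank: given a query vector, it returns the k stored vectors with the highest similarity. As a quick mental model (not LongMem's actual code), here is the brute-force NumPy equivalent of what a faiss inner-product index computes:

```python
import numpy as np

# A bank of 1,000 key vectors -- at scale, this is what faiss would index.
rng = np.random.default_rng(0)
keys = rng.standard_normal((1000, 64)).astype("float32")

def topk_inner_product(query, keys, k=4):
    """Brute-force top-k by inner product (what faiss accelerates on GPU)."""
    scores = keys @ query          # similarity of the query to every key
    idx = np.argsort(-scores)[:k]  # indices of the k best-matching keys
    return idx, scores[idx]

query = keys[42]                   # querying with a stored key...
idx, _ = topk_inner_product(query, keys)
print(idx[0])                      # ...should return that key itself first
```

If this brute-force search works but your faiss install does not, the problem is almost certainly the CUDA toolkit mismatch discussed in the troubleshooting section below.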
Step 2: Understanding the Project Structure
Next, you’ll want to familiarize yourself with the structure of the LongMem project. Here’s what to expect:
- Pre-trained LLM Class: fairseq/fairseq/models/new_gpt.py
- Transformer Decoder with Side Network: fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py
- Transformer Language Model with Side Network Class: fairseq/fairseq/models/transformer_lm_sidenet.py
- Memory Bank and Retrieval: fairseq/fairseq/modules/dynamic_memory_with_chunk.py
- Joint Attention for Memory Fusion: fairseq/fairseq/modules/joint_multihead_attention_sum.py
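The memory bank stores past key-value states at chunk granularity and retrieves whole chunks per query. The toy sketch below illustrates that idea with hypothetical names; it is not the API of dynamic_memory_with_chunk.py:

```python
import numpy as np

class ChunkMemory:
    """Toy chunk-level memory bank (illustrative only; hypothetical names,
    not the API of LongMem's dynamic_memory_with_chunk.py)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = []  # list of (keys, values) pairs, one per chunk

    def add(self, keys, values):
        # Cache incoming key/value states at chunk granularity.
        for i in range(0, len(keys), self.chunk_size):
            self.chunks.append((keys[i:i + self.chunk_size],
                                values[i:i + self.chunk_size]))

    def retrieve(self, query, k=2):
        # Score each chunk by the inner product of the query with the
        # chunk's mean-pooled key, then return the top-k chunks whole.
        pooled = np.stack([ck.mean(axis=0) for ck, _ in self.chunks])
        best = np.argsort(-(pooled @ query))[:k]
        keys = np.concatenate([self.chunks[i][0] for i in best])
        values = np.concatenate([self.chunks[i][1] for i in best])
        return keys, values

rng = np.random.default_rng(0)
mem = ChunkMemory(chunk_size=4)
mem.add(rng.standard_normal((16, 8)), rng.standard_normal((16, 8)))
keys, values = mem.retrieve(rng.standard_normal(8), k=2)
print(keys.shape, values.shape)  # two chunks of four 8-dim vectors each
```

The retrieved key/value chunks are what the joint attention module then fuses with the model's local context.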
Step 3: Training the Model
Now that your environment is ready, it’s time for some action! Let’s embark on training your model:
- Data Collection & Preprocessing: Download the Pile dataset from its official release; it is distributed as JSON-lines shards. Use preprocess/filter_shard_tnlg.py to sample the training set and binarize the data.
- Memory-Augmented Adaptation Training: Run the training script with the following command:
bash train_scripts/train_longmem.sh
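To make the preprocessing step concrete: each JSON line in the Pile carries a "text" field, and training consumes fixed-length token spans cut from it. The stdlib-only sketch below shows that chunking idea, with a whitespace split standing in for the real BPE tokenizer (it is an illustration, not the repo's preprocessing code):

```python
import json

def iter_chunks(jsonl_lines, chunk_len=8):
    """Split each document's text into fixed-length token chunks,
    dropping the final partial chunk. Whitespace tokenization stands
    in for the real BPE vocabulary used during binarization."""
    for line in jsonl_lines:
        tokens = json.loads(line)["text"].split()
        for i in range(0, len(tokens) - chunk_len + 1, chunk_len):
            yield tokens[i:i + chunk_len]

doc = json.dumps({"text": " ".join(f"tok{i}" for i in range(20))})
chunks = list(iter_chunks([doc]))
print(len(chunks))  # 20 tokens -> 2 full chunks of 8
```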
Step 4: Evaluation
Once training is complete, it's time to evaluate the model. Here's the process:
- Download the checkpoints for the pre-trained GPT-2-medium and LongMem models.
- Evaluate GPT-2 Baseline:
python eval_scripts/eval_longmem_icl.py --path path/to/gpt2_pretrained_model
- Evaluate LongMem:
python eval_scripts/eval_longmem_icl.py --path path/to/longmem_model --pretrained-model-path path/to/gpt2_pretrained_model
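Exactly what each script reports depends on the task, but a standard metric for comparing language models is perplexity: the exponentiated average negative log-likelihood per token (lower is better). For reference, a minimal computation, independent of the scripts above:

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# If a model assigns probability 1/4 to each of 8 tokens,
# its perplexity is 4: on average it is as uncertain as a
# uniform choice among 4 candidates.
logprobs = [math.log(0.25)] * 8
print(perplexity(logprobs))
```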
Troubleshooting
In your journey through setting up and using LongMem, you may encounter some bumps along the way. Here are a few troubleshooting tips:
- If you face issues during installation, ensure that all versions match your hardware specifications, especially the CUDA version with PyTorch.
- Check for compatibility between your installed packages. Updating or downgrading may resolve conflicts.
- If the training process fails, ensure that the dataset is correctly formatted and accessible.
- For any unresolved errors related to faiss-gpu, check the faiss GitHub issues for guidance.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
With that, you are now equipped with a solid roadmap for implementing LongMem. Enjoy your exploration of augmented language models!

