A modular RL library to fine-tune language models to human preferences
We provide easily customizable building blocks for training language models, including implementations of on-policy algorithms, reward functions, metrics, datasets, and LM-based actor-critic policies.
Thoroughly tested and benchmarked with over 2000 experiments (GRUE benchmark) on a comprehensive set of:
- 7 different Natural Language Processing (NLP) tasks:
  - Summarization
  - Generative Commonsense Reasoning
  - IMDB Sentiment-based Text Continuation
  - Table-to-text Generation
  - Abstractive Question Answering
  - Machine Translation
  - Dialogue Generation
 
- 20+ Natural Language Generation (NLG) metrics that can be used as reward functions:
  - Lexical metrics (e.g., ROUGE, BLEU, SacreBLEU, METEOR)
  - Semantic metrics (e.g., BERTScore, BLEURT)
  - Task-specific metrics (e.g., PARENT, CIDEr, SPICE)
  - Scores from pre-trained classifiers (e.g., sentiment scores); a classifier-based reward sketch follows this feature list
 
- On-policy algorithms: PPO, A2C, TRPO, and the novel NLPO (Natural Language Policy Optimization)
- Actor-Critic Policies supporting causal LMs (e.g., GPT-2) and seq2seq LMs (e.g., T5, BART)
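As noted in the list above, scores from pre-trained classifiers can serve as reward functions. The sketch below illustrates that idea with a Hugging Face sentiment classifier; it is a minimal, standalone example and does not use RL4LMs' own reward-function interface or registry (consult the repository for the actual base class).

```python
# Illustrative only: a classifier score used as a reward signal.
# This does NOT implement RL4LMs' reward-function interface; it just shows the idea.
from transformers import pipeline

# Default English sentiment model; swap in any classifier relevant to your task.
sentiment_classifier = pipeline("sentiment-analysis")

def sentiment_reward(generated_texts):
    """Return one scalar reward per generated text: higher means more positive sentiment."""
    outputs = sentiment_classifier(generated_texts, truncation=True)
    # Map the classifier output to a reward in [0, 1].
    return [o["score"] if o["label"] == "POSITIVE" else 1.0 - o["score"] for o in outputs]

print(sentiment_reward(["This movie was an absolute delight!"]))
```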
How to Get Started with RL4LMs
Want to dive right in? You're in the right place! Here's how to quickly start using the RL4LMs library.
Local Installation
To set up RL4LMs on your machine, follow these steps:
```bash
git clone https://github.com/allenai/RL4LMs.git
cd RL4LMs
pip install -e .
```
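Optionally, you can verify the installation with a quick import check. This sketch assumes the project installs a top-level package named `rl4lms`:

```python
# Sanity check after `pip install -e .` (assumes the package name is `rl4lms`).
import rl4lms

print("RL4LMs imported from:", rl4lms.__file__)
```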
Using Docker
Prefer a containerized solution? No problem! You can use Docker as follows:
```bash
docker build . -t rl4lms
```
Quick Start – Training PPO/NLPO
Once installed, you can use the training API with pre-defined YAML configs. Here's how to train a model:
```bash
python scripts/training/train_text_generation.py --config_path scripts/training/task_configs/summarization/t5_ppo.yml
```
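If you want to see what a run is configured to do before launching it, you can inspect the YAML config directly. The snippet below is a generic sketch using PyYAML; it makes no assumptions about the config schema beyond it being valid YAML:

```python
# Peek at a training config before launching a run (requires PyYAML).
import yaml

config_path = "scripts/training/task_configs/summarization/t5_ppo.yml"
with open(config_path) as f:
    config = yaml.safe_load(f)

# List the top-level sections defined by this config file.
for section in config:
    print(section)
```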
Code Explained with an Analogy
Imagine you’re building a custom car (your language model) and you have a box of parts (available algorithms, metrics, and building blocks). Just like how a car needs wheels, an engine, and a frame to function well, your language model needs specific components to train effectively:
- Algorithms are like the engine; they drive your model’s learning.
- Metrics serve as the dashboard gauges, showing you how well your car is performing as it drives (i.e., the performance of your model).
- Datasets are the fuel, without which your car can’t run anywhere.
All these components are interchangeable, so you can mix and match them to build the fine-tuning setup your project needs; the toy sketch below makes this plug-and-play pattern concrete.
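To make the analogy concrete, here is a toy, self-contained sketch of that plug-and-play pattern. None of these names come from RL4LMs (the library wires its components together through YAML configs and registries); the point is only that the loop stays the same while the parts are swapped.

```python
# Purely illustrative plug-and-play pattern; none of these names are RL4LMs APIs.
from typing import Callable, Iterable


def train(dataset: Iterable[str],
          reward_fn: Callable[[str], float],
          metric_fn: Callable[[str], float],
          num_epochs: int = 1) -> None:
    """Toy loop: score each sample with whichever reward and metric are plugged in."""
    samples = list(dataset)                               # the "fuel"
    for epoch in range(num_epochs):
        rewards = [reward_fn(s) for s in samples]         # the "engine" signal
        metrics = [metric_fn(s) for s in samples]         # the "dashboard gauge"
        print(f"epoch {epoch}: mean reward={sum(rewards) / len(rewards):.3f}, "
              f"mean metric={sum(metrics) / len(metrics):.3f}")


# Swap any component without touching the loop.
train(
    dataset=["a short review", "another sample text"],
    reward_fn=lambda text: len(text) / 100.0,
    metric_fn=lambda text: float(text.count(" ") + 1),
)
```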
Troubleshooting
If you encounter issues during installation or model training, here are some common troubleshooting tips:
- Double-check your installation steps; make sure you are in the right directory.
- Ensure all dependencies are installed; use pip to check for missing packages (a quick import-check sketch follows this list).
- If you find any compatibility issues, consult the GitHub repository for updates and patches.
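As an example of the dependency check mentioned above, the following snippet reports which packages are importable. The package list is illustrative, not RL4LMs' authoritative requirements; adjust it to match the project's requirements file:

```python
# Quick check that a few commonly needed packages are importable.
# The list below is illustrative; consult RL4LMs' requirements for the real set.
import importlib.util

for pkg in ["torch", "transformers", "datasets", "rl4lms"]:
    status = "ok" if importlib.util.find_spec(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```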
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

