In the dynamic world of Reinforcement Learning (RL), mastering algorithms like Conservative Q-Learning (CQL) and Soft Actor-Critic (SAC) can greatly enhance your model’s performance. This blog will guide you through setting up and running these algorithms using PyTorch.
Installation Steps
To get started, you’ll first need to set up your environment. Follow these steps:
- Install and use the included Anaconda environment by running:
conda env create -f environment.yml
- Activate the environment with:
source activate SimpleSAC
- If you want to use MuJoCo, you’ll need to obtain your own MuJoCo key.
- Add the directory of this repo to your PYTHONPATH environment variable by running:
export PYTHONPATH=$PYTHONPATH:$(pwd)
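Once these steps are done, a quick Python sanity check like the one below can confirm that the environment is active and the repo directory is on your PYTHONPATH. This is only an illustrative snippet; the package names (torch, gym) are assumptions based on what the experiments in this post use.

import os

# Confirm the repo directory is on PYTHONPATH (set via the export above).
print("Repo on PYTHONPATH:", os.getcwd() in os.environ.get("PYTHONPATH", ""))

# Confirm the core dependencies import from the conda environment.
try:
    import torch
    import gym
    print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    print("Gym:", gym.__version__)
except ImportError as e:
    print("Missing dependency -- is the SimpleSAC environment active?", e)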
Running Experiments
Now that your environment is set up, you’re ready to run some experiments!
For SAC Experiments
Use the following command:
python -m SimpleSAC.sac_main --env HalfCheetah-v2 --logging.output_dir .experiment_output
All available command options can be found in SimpleSAC/sac_main.py and SimpleSAC/sac.py.
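If you are curious what the SAC trainer is optimizing, here is a minimal, illustrative PyTorch sketch of the soft actor-critic losses. This is not the repo’s actual code; the network and batch names are assumptions, and the policy is assumed to return a sampled action together with its log-probability.

import torch
import torch.nn.functional as F

# Illustrative SAC losses for one batch; q1/q2, the target networks, and the policy
# are assumed to be callables, and `batch` a tuple of tensors.
def sac_losses(q1, q2, target_q1, target_q2, policy, batch, alpha=0.2, discount=0.99):
    obs, actions, rewards, next_obs, dones = batch

    # Critic target: soft Bellman backup using the minimum of the two target Q networks.
    with torch.no_grad():
        next_actions, next_log_pi = policy(next_obs)
        target_q = torch.min(target_q1(next_obs, next_actions),
                             target_q2(next_obs, next_actions))
        target = rewards + discount * (1.0 - dones) * (target_q - alpha * next_log_pi)

    q_loss = F.mse_loss(q1(obs, actions), target) + F.mse_loss(q2(obs, actions), target)

    # Actor: maximize the entropy-regularized Q value of freshly sampled actions.
    new_actions, log_pi = policy(obs)
    policy_loss = (alpha * log_pi - torch.min(q1(obs, new_actions),
                                              q2(obs, new_actions))).mean()
    return q_loss, policy_loss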
For CQL Experiments
Use this command instead:
python -m SimpleSAC.conservative_sac_main --env halfcheetah-medium-v0 --logging.output_dir .experiment_output
If you want to run on CPU only, just add the --device=cpu option.
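CQL reuses the SAC machinery but adds a conservative regularizer to the critic loss, pushing Q values down on sampled (out-of-distribution) actions and up on dataset actions. The sketch below is illustrative only; the function and argument names are assumptions, and in the full algorithm the sampled actions also come from the current policy rather than only a uniform distribution.

import torch

# Illustrative CQL regularizer added on top of the standard critic loss.
def cql_penalty(q_net, obs, dataset_actions, num_random=10, min_q_weight=5.0):
    batch_size, action_dim = dataset_actions.shape

    # Q values of uniformly sampled actions approximate a logsumexp over the action space.
    random_actions = torch.empty(batch_size, num_random, action_dim).uniform_(-1.0, 1.0)
    obs_rep = obs.unsqueeze(1).expand(-1, num_random, -1)
    q_random = q_net(obs_rep.reshape(-1, obs.shape[-1]),
                     random_actions.reshape(-1, action_dim)).reshape(batch_size, num_random)

    # Push Q down on sampled actions and up on actions that appear in the dataset.
    q_data = q_net(obs, dataset_actions)
    return min_q_weight * (torch.logsumexp(q_random, dim=1) - q_data).mean()

The min_q_weight argument here plays the same role as the cql.cql_min_q_weight option swept in the results section further below.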
Visualizing Experiments
To visualize experiment metrics easily, you can use viskit.
python -m viskit .experiment_output
Simply navigate to http://localhost:5000 to see your results.
Weights and Biases Online Visualization Integration
This codebase can also log results to the Weights & Biases (W&B) online visualization platform. To enable it, follow these steps:
- Set your W&B API key environment variable by running:
export WANDB_API_KEY=YOUR_WB_API_KEY_HERE
- Run experiments with W&B logging turned on:
python -m SimpleSAC.conservative_sac_main --env halfcheetah-medium-v0 --logging.output_dir .experiment_output --device=cuda --logging.online
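The --logging.online flag is the repo’s own switch; under the hood, W&B logging typically amounts to calls like the following from the standard wandb Python API. The project name and metric keys here are placeholders, not the ones the codebase actually uses.

import wandb

# Minimal W&B logging sketch; project name and metric keys are placeholders.
wandb.init(project="simple-sac-experiments", config={"env": "halfcheetah-medium-v0"})
for step in range(1000):
    wandb.log({"average_return": 0.0, "q_loss": 0.0}, step=step)  # replace with real metrics
wandb.finish()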
Results of Running CQL on D4RL Environments
To save time and compute resources, I’ve conducted a sweep of CQL on specific D4RL environments with various minimum Q weight values. You can find the results here. Filter by environment to visualize the outcomes for each cql.cql_min_q_weight value, averaged across 3 random seeds.
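If you want to run a similar sweep yourself, a small driver script is one option. Note that the --cql.cql_min_q_weight flag syntax below is an assumption based on the config field named above; check SimpleSAC/conservative_sac_main.py for the actual option names.

import subprocess

# Hypothetical sweep over the minimum Q weight; the nested flag syntax is an assumption.
for weight in [1.0, 5.0, 10.0]:
    subprocess.run([
        "python", "-m", "SimpleSAC.conservative_sac_main",
        "--env", "halfcheetah-medium-v0",
        f"--cql.cql_min_q_weight={weight}",
        "--logging.output_dir", ".experiment_output",
    ], check=True)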
Troubleshooting
As you embark on implementing these algorithms, you may encounter issues. Here are some troubleshooting suggestions:
- If your experiments are not logging correctly, double-check your W&B API key environment variable.
- Ensure that your MuJoCo key is properly added if you plan to run using MuJoCo.
- Check that your PYTHONPATH has been set correctly to include the repo directory.
- If you experience errors related to your Anaconda environment, try recreating it following the setup instructions again.
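A couple of the items above can be checked directly from Python. The mujoco_py import below assumes MuJoCo is accessed through that package, which may differ depending on your gym version.

import os

# Check the W&B key discussed in the logging section.
print("WANDB_API_KEY:", "set" if os.environ.get("WANDB_API_KEY") else "MISSING")

# Check that the MuJoCo bindings load (requires a valid MuJoCo key/installation).
try:
    import mujoco_py  # assumption: MuJoCo is used via mujoco_py
    print("mujoco_py loaded from:", mujoco_py.__file__)
except Exception as e:
    print("MuJoCo check failed:", e)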
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Understanding the Code with an Analogy
Think of implementing CQL and SAC like preparing a complex dish in a kitchen:
- The environment setup is akin to gathering all your ingredients and tools before cooking.
- Running experiments is comparable to placing your dish in the oven; you’ve set the temperature (parameters) and now wait for it to cook (algorithm training).
- Visualizing results is like plating your dish—making it presentable and ready for others to enjoy and critique.
Cooking, like coding, brings both successes and failures, and troubleshooting is simply adjusting the recipe when things don’t turn out as expected!

