Model-based reinforcement learning (RL) is known for its excellent sample efficiency. However, it has historically lagged behind model-free algorithms in asymptotic performance, especially in complex environments that call for high-capacity deep networks. In this article, we’ll explore Probabilistic Ensembles with Trajectory Sampling (PETS), an approach that tackles this gap with uncertainty-aware dynamics models, and we’ll walk through how to run it in your own experiments.
Understanding the PETS Algorithm
Imagine you are a captain navigating a ship through unpredictable waters. You rely on various tools and techniques to minimize the risk of hitting a rock (the unknown dynamics). PETS works in a similar way: it uses a “map” (an ensemble of learned dynamics models) that accounts for uncertainty about the environment. Instead of committing to a single guess about what happens next, the algorithm samples many possible trajectories from these models and chooses actions that perform well across them.
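To make the idea of sampling trajectories through an ensemble concrete, here is a minimal NumPy sketch. It is not the repository's implementation: the 1-D toy system, the hand-written "ensemble" (make_toy_ensemble), and all function names are assumptions made for illustration. Each particle is tied to one ensemble member for the whole rollout, which mirrors the idea behind the TSinf option described later.

```python
import numpy as np

def make_toy_ensemble(num_models, rng):
    """Hypothetical stand-in for trained probabilistic networks.

    Each member models the 1-D system x' = a * x + u. Members disagree
    slightly on `a` (model uncertainty), and every prediction carries its
    own Gaussian noise (environment noise).
    """
    slopes = 0.9 + 0.05 * rng.standard_normal(num_models)

    def predict(members, states, action):
        mean = slopes[members] * states + action
        std = 0.02 * np.ones_like(states)
        return mean, std

    return predict

def sample_trajectories(predict, num_models, init_state, actions, num_particles, rng):
    """Roll particles forward, keeping one fixed ensemble member per particle."""
    members = rng.integers(0, num_models, size=num_particles)
    states = np.full(num_particles, init_state, dtype=float)
    trajectory = [states.copy()]
    for action in actions:
        mean, std = predict(members, states, action)
        # Sample the next state from each member's predicted Gaussian.
        states = mean + std * rng.standard_normal(num_particles)
        trajectory.append(states.copy())
    return np.stack(trajectory)  # shape: (horizon + 1, num_particles)

def expected_cost(trajectory, target=0.0):
    """Average squared distance to the target over particles and time steps."""
    return float(np.mean((trajectory - target) ** 2))

rng = np.random.default_rng(0)
predict = make_toy_ensemble(num_models=5, rng=rng)
actions = np.zeros(10)  # a candidate action sequence to evaluate
traj = sample_trajectories(predict, 5, init_state=1.0, actions=actions,
                           num_particles=20, rng=rng)
print(traj.shape, expected_cost(traj))
```

A planner would call expected_cost on many candidate action sequences and pick the one with the lowest value. The spread of the particles at each step gives a rough picture of how uncertain the models are about that part of the state space.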
Requirements
- Ensure you have MuJoCo 1.31 installed.
- Install the other dependencies with:
pip install -r requirements.txt
- Alternatively, use the provided Dockerfile. You can pull a prebuilt image using:
docker pull kchua/handful-of-trials
- Note: You will need a valid MuJoCo key, which should be mounted at /root/.mujoco/mjkey.txt.
- Launch the image with access to GPUs using nvidia-docker.
Running Experiments
To run experiments on a specific environment, use the following command structure:
python scripts/mbexp.py -env ENV
Replace ENV with the name of the environment, which can be one of: cartpole, reacher, pusher, halfcheetah.
You can also customize your run with optional arguments like:
- -ca CTRL_ARG: Arguments for the controller.
- -o OVERRIDE: Overrides to default parameters.
- -logdir LOGDIR: Directory for logging results (default is ., the current directory).
To reproduce the results using default arguments, execute:
python scripts/mbexp.py -env halfcheetah
Results will be saved in the specified log directory.
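If you plan to launch several runs, it can help to assemble the command line programmatically. The sketch below only uses the script path and the -env, -ca, and -logdir flags shown above; the helper name build_command and the idea of passing controller arguments as a dictionary are assumptions for illustration, not part of the repository.

```python
import shlex
import subprocess
import sys

VALID_ENVS = ("cartpole", "reacher", "pusher", "halfcheetah")

def build_command(env, ctrl_args=None, logdir=None):
    """Build the argument list for scripts/mbexp.py (hypothetical helper)."""
    if env not in VALID_ENVS:
        raise ValueError(f"Unknown environment {env!r}; expected one of {VALID_ENVS}")
    cmd = [sys.executable, "scripts/mbexp.py", "-env", env]
    # Each controller argument is passed as its own "-ca KEY VALUE" group.
    for key, value in (ctrl_args or {}).items():
        cmd += ["-ca", key, str(value)]
    if logdir is not None:
        cmd += ["-logdir", logdir]
    return cmd

def run(cmd):
    """Run the experiment and raise if the script exits with an error."""
    subprocess.run(cmd, check=True)

for env in ("cartpole", "halfcheetah"):
    cmd = build_command(env, ctrl_args={"opt-type": "CEM"}, logdir=f"logs/{env}")
    print(shlex.join(cmd))  # inspect the command; call run(cmd) to execute it
```

Passing a list to subprocess.run rather than a single string avoids shell quoting problems if a log directory contains spaces.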
Visualizing Results
You can visualize the rollouts of the trained model using:
python scripts/render.py -model_dir path/to/saved/model/files
Make sure the directory contains model.mat and model.nns.
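A quick check before rendering can save a confusing error message. The following is a small sketch that looks for the two files named above; the function name missing_model_files is hypothetical, and the temporary directory only stands in for a real model directory.

```python
import tempfile
from pathlib import Path

REQUIRED_MODEL_FILES = ("model.mat", "model.nns")

def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (model_dir / name).is_file()]

# Demonstration on a throwaway directory instead of a real model directory.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_model_files(tmp))        # both files are reported missing
    (Path(tmp) / "model.mat").touch()
    print(missing_model_files(tmp))        # only model.nns is still missing
```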
Controller Arguments
When using the controller arguments with -ca, you can select several options:
- model-type: Choose from deterministic (D) or probabilistic (P) networks, or their ensembled counterparts (DE, PE).
- prop-type: Determines how predictions are propagated through time: deterministic (E), distribution sampling (DS), or one of the trajectory sampling methods (TS*, for example TSinf).
- opt-type: Choose the optimizer for action sequences: Random (random shooting) or CEM (the cross-entropy method).
For instance, to run the half-cheetah task with a probabilistic ensemble (PE), trajectory sampling (TSinf), and the CEM optimizer, use:
python scripts/mbexp.py -env halfcheetah -ca model-type PE -ca prop-type TSinf -ca opt-type CEM
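To give a feel for the difference between the two opt-type choices, here is a minimal sketch of the cross-entropy method next to plain random shooting. It is an illustration under stated assumptions, not the repository's optimizer: the toy cost function, the function names, and every hyperparameter (population size, number of elites, iterations) are made up for the example.

```python
import numpy as np

def toy_cost(actions, init_state=1.0):
    """Cost of an action sequence on the toy system x' = 0.9 * x + u."""
    state, total = init_state, 0.0
    for a in actions:
        state = 0.9 * state + a
        total += state ** 2 + 0.01 * a ** 2
    return total

def random_plan(cost_fn, horizon, rng, pop_size=1000, init_std=1.0):
    """Random shooting: sample once and keep the best candidate."""
    candidates = init_std * rng.standard_normal((pop_size, horizon))
    costs = np.array([cost_fn(c) for c in candidates])
    return candidates[np.argmin(costs)]

def cem_plan(cost_fn, horizon, rng, pop_size=200, num_elites=20, iters=5, init_std=1.0):
    """Cross-entropy method: repeatedly refit a Gaussian to the best candidates."""
    mean = np.zeros(horizon)
    std = np.full(horizon, init_std)
    for _ in range(iters):
        candidates = mean + std * rng.standard_normal((pop_size, horizon))
        costs = np.array([cost_fn(c) for c in candidates])
        elites = candidates[np.argsort(costs)[:num_elites]]
        # Move the sampling distribution toward the lowest-cost samples.
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean

rng = np.random.default_rng(0)
cem_actions = cem_plan(toy_cost, horizon=5, rng=rng)
random_actions = random_plan(toy_cost, horizon=5, rng=rng)
print("CEM cost:   ", toy_cost(cem_actions))
print("Random cost:", toy_cost(random_actions))
```

In a full planner, the cost function would be the expected cost over trajectories sampled from the learned dynamics models rather than a known toy system, and only the first action of the optimized sequence would be executed before replanning.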
Troubleshooting
If you face issues during installation or execution, consider the following solutions:
- Ensure that the MuJoCo license key is correctly placed in the specified directory.
- For Docker image issues, ensure that you are using nvidia-docker and that the GPU is properly recognized.
- Check that all dependencies are installed correctly by reviewing requirements.txt.
- Run the scripts with --help for a detailed overview of the available commands and options.
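The first two checks in this list can be automated with a few lines of Python. This is a rough sketch: the function name preflight is hypothetical, the key path is passed in by the caller rather than assumed, and the check only confirms that the nvidia-docker command is on your PATH, not that a GPU is actually usable.

```python
import shutil
import tempfile
from pathlib import Path

def preflight(key_path, required_tools=("nvidia-docker",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not Path(key_path).is_file():
        problems.append(f"MuJoCo key not found at {key_path}")
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} is not on PATH")
    return problems

# Demonstration with a temporary file standing in for the license key.
with tempfile.NamedTemporaryFile(suffix=".txt") as fake_key:
    for problem in preflight(fake_key.name):
        print("Problem:", problem)
```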
Plotting Results
Example plotting code is provided in plotter.ipynb, which you can open and run in Jupyter Notebook.
