Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Mar 27, 2021 | Data Science

Model-based reinforcement learning (RL) is attractive for its sample efficiency. Its weakness has been asymptotic performance: it has typically lagged behind model-free algorithms, especially in complex environments that call for high-capacity deep network models. In this article, we’ll explore an approach called Probabilistic Ensembles with Trajectory Sampling (PETS), which addresses this gap by using uncertainty-aware dynamics models, and we’ll show how to run it for your own experiments.

Understanding the PETS Algorithm

Imagine you are a captain navigating a ship through unpredictable waters. Your charts are incomplete, so you plan each course with the gaps in mind to avoid hitting a rock (unknown dynamics). The PETS algorithm works in a similar way: its “map” is an ensemble of probabilistic dynamics models that represents how uncertain its own predictions are. Instead of committing to a single predicted outcome, the algorithm samples many possible trajectories through these models for each candidate action sequence and picks the actions that perform best on average.
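To make the idea of sampling trajectories through an ensemble concrete, here is a minimal NumPy sketch. It is not the PETS implementation: the toy linear models, the noise levels, the cost function, and every function name below are assumptions made for illustration. Each "model" returns a mean and standard deviation for the next state, and each particle sticks with one ensemble member for the whole horizon.

```python
import numpy as np

def make_toy_ensemble(num_models, rng):
    """Hypothetical probabilistic models: each returns (mean, std) of the next state."""
    models = []
    for _ in range(num_models):
        gain = 1.0 + 0.05 * rng.standard_normal()  # members disagree slightly

        def model(state, action, gain=gain):
            mean = gain * state + 0.1 * action
            std = 0.01 * np.ones_like(state)
            return mean, std

        models.append(model)
    return models

def sample_trajectories(models, init_state, actions, num_particles, rng):
    """Propagate particles through the ensemble; each particle keeps the
    same ensemble member for the whole horizon."""
    horizon = actions.shape[0]
    member = rng.integers(len(models), size=num_particles)
    states = np.tile(init_state, (num_particles, 1))
    trajectory = np.empty((horizon, num_particles, init_state.shape[0]))
    for t in range(horizon):
        for p in range(num_particles):
            mean, std = models[member[p]](states[p], actions[t])
            # Sample the next state instead of using the mean prediction.
            states[p] = mean + std * rng.standard_normal(mean.shape)
        trajectory[t] = states
    return trajectory

def expected_cost(trajectory, target):
    """Average squared distance to the target over time steps and particles."""
    return float(np.mean(np.sum((trajectory - target) ** 2, axis=-1)))

rng = np.random.default_rng(0)
models = make_toy_ensemble(num_models=5, rng=rng)
actions = np.zeros((10, 2))  # one candidate action sequence, horizon 10
traj = sample_trajectories(models, np.ones(2), actions, num_particles=20, rng=rng)
cost = expected_cost(traj, target=np.zeros(2))
```

The spread of the particles reflects both the noise each model predicts and the disagreement between ensemble members, which is why averaging the cost over particles gives an uncertainty-aware score for an action sequence.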

Requirements

  • Ensure you have MuJoCo 1.31 installed.
  • Install other dependencies with the command: pip install -r requirements.txt.
  • Alternatively, use the provided Dockerfile. You can pull a prebuilt image using: docker pull kchua/handful-of-trials.
  • Note: You will still need a valid MuJoCo key, which should be mounted in the container at /root/.mujoco/mjkey.txt.
  • Launch the image with access to GPUs using nvidia-docker.

Running Experiments

To run experiments on a specific environment, use the following command structure:

python scripts/mbexp.py -env ENV

Replace ENV with the name of the environment, which can be one of: cartpole, reacher, pusher, halfcheetah.

You can also customize your run with optional arguments like:

  • -ca CTRL_ARG: Arguments for the controller.
  • -o OVERRIDE: Overrides to default parameters.
  • -logdir LOGDIR: Specify a directory for logging results (default is .).

To reproduce the results using default arguments, execute:

python scripts/mbexp.py -env halfcheetah

Results will be saved in the specified log directory.
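If you want one run per environment, a small driver script can assemble the commands for you. The sketch below uses only the -env and -logdir flags described above; the log root name "log" and the helper names are assumptions, and it defaults to a dry run that prints the commands without executing them.

```python
import subprocess
from pathlib import Path

ENVIRONMENTS = ["cartpole", "reacher", "pusher", "halfcheetah"]

def build_command(env, logdir):
    """Return the mbexp.py invocation for one environment as an argument list."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return ["python", "scripts/mbexp.py", "-env", env, "-logdir", str(logdir)]

def run_all(log_root="log", dry_run=True):
    """Build (and, if dry_run is False, run) one experiment per environment."""
    commands = []
    for env in ENVIRONMENTS:
        logdir = Path(log_root) / env  # hypothetical layout: one folder per env
        cmd = build_command(env, logdir)
        commands.append(cmd)
        if not dry_run:
            logdir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands

for cmd in run_all(dry_run=True):
    print(" ".join(cmd))
```

Call run_all(dry_run=False) from the repository root to actually launch the runs one after another.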

Visualizing Results

You can visualize the rollouts of the trained model using:

python scripts/render.py -model_dir path/to/saved/model/files

Make sure the directory contains model.mat and model.nns.
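A quick check before rendering can save a failed launch. This is a small sketch, with a hypothetical helper name, that reports which of the two expected files are missing from a directory; the demo at the end uses a temporary directory rather than a real model.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("model.mat", "model.nns")

def missing_model_files(model_dir):
    """Return the names of required model files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]

# Demo on a temporary directory that contains only one of the two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.mat").touch()
    demo_missing = missing_model_files(tmp)

print(demo_missing)
```

An empty list means both files are present and the directory can be passed to the render script.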

Controller Arguments

When passing controller arguments with -ca, you can set the following options:

  • model-type: The dynamics model. Choose from deterministic (D), probabilistic (P), deterministic ensemble (DE), or probabilistic ensemble (PE).
  • prop-type: The propagation method, such as expectation (E), distribution sampling (DS), or trajectory sampling (TS1, TSinf).
  • opt-type: The optimizer for action sequences, either Random or CEM.

For instance, to run the half-cheetah task with a probabilistic ensemble (PE), trajectory sampling (TSinf), and the CEM optimizer, use:

python scripts/mbexp.py -env halfcheetah -ca model-type PE -ca prop-type TSinf -ca opt-type CEM
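For readers unfamiliar with CEM (the cross-entropy method), the sketch below shows the basic loop on a toy problem: sample action sequences from a Gaussian, keep the lowest-cost "elites", and refit the Gaussian to them. It is a generic illustration, not the repository's optimizer; the function names, hyperparameters, and toy cost are all assumptions.

```python
import numpy as np

def cem_optimize(cost_fn, horizon, action_dim, rng, num_iters=10,
                 pop_size=200, num_elites=20, init_std=1.0):
    """Minimal cross-entropy method over action sequences of shape (horizon, action_dim)."""
    mean = np.zeros((horizon, action_dim))
    std = np.full((horizon, action_dim), init_std)
    for _ in range(num_iters):
        # Sample a population of candidate action sequences.
        noise = rng.standard_normal((pop_size, horizon, action_dim))
        candidates = mean + std * noise
        costs = np.array([cost_fn(c) for c in candidates])
        # Keep the lowest-cost candidates and refit the sampling distribution.
        elites = candidates[np.argsort(costs)[:num_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean

def toy_cost(actions):
    """Hypothetical cost: squared distance of every action from 0.5."""
    return float(np.sum((actions - 0.5) ** 2))

rng = np.random.default_rng(0)
plan = cem_optimize(toy_cost, horizon=5, action_dim=1, rng=rng)
print(plan.ravel())
```

In a planner, the cost of each candidate would come from rolling it out through the learned dynamics model rather than from a closed-form function like toy_cost. The Random option corresponds to the simpler strategy of sampling candidates once and keeping the best one.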

Troubleshooting

If you face issues during installation or execution, consider the following solutions:

  • Ensure that the MuJoCo license key is correctly placed in the specified directory.
  • For Docker image issues, ensure that you are using nvidia-docker and that the GPU is properly recognized.
  • Always check if you have installed all the dependencies correctly by reviewing the requirements.txt.
  • Run the scripts with --help for a detailed overview of the available commands and options.
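Some of these checks can be scripted. The sketch below is one hypothetical way to do it: it takes the key path and a list of top-level module names as arguments, since both depend on your setup, and returns a list of problems found. The demo uses a path and a module name that are deliberately invalid.

```python
import importlib.util
from pathlib import Path

def preflight(key_path, modules):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not Path(key_path).expanduser().is_file():
        problems.append(f"MuJoCo key not found at {key_path}")
    for name in modules:
        # Use top-level module names only (e.g. "numpy", not "numpy.linalg").
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not importable: {name}")
    return problems

# Demo with a path and a module name chosen to fail.
problems = preflight("/nonexistent/mjkey.txt", ["json", "surely_not_a_real_module_xyz"])
for line in problems:
    print(line)
```

Note that the import name of a package can differ from the name listed in requirements.txt, so the module list has to be written by hand.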

Plotting Results

Example plotting code can be found in plotter.ipynb, and you can easily run it using Jupyter Notebooks.
