Welcome to our exploration of reinforcement learning projects! This article covers five environments: Cart-Pole, Mountain-Car, Pendulum, Lunar Lander, and Bipedal Walker, each solved with a Deep Q-Network (DQN) or Deep Deterministic Policy Gradient (DDPG) agent. Let’s dive into how you can replicate these projects, with troubleshooting tips at the end to guide you along the way!
Prerequisites
- Python: 3.7
- Keras: 2.4.3
- TensorFlow: 2.2.0
Project 1: Cart-Pole
In the Cart-Pole project, we aim to keep a pole balanced upright on top of a moving cart. The action space consists of two discrete options:
- 0: Move cart to the left
- 1: Move cart to the right
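With only two discrete actions, a DQN agent typically chooses between them with an epsilon-greedy rule. The snippet below is a minimal, dependency-free sketch of that rule, not the exact code used for this project; the function name `epsilon_greedy` and the Q-values are made up for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-value estimates."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-values for one Cart-Pole state: [move left, move right].
q_values = [0.12, 0.48]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)  # epsilon 0 always exploits
```

In practice epsilon usually starts near 1.0 and decays over training, so the agent explores early and exploits later.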
I solved this problem using DQN in around 60 episodes, as shown in the graph below:
Project 2: Mountain-Car
This project involves teaching a car to reach a goal positioned at the top of a mountain. The action space here is:
- 0: Move car to the left
- 1: Do nothing
- 2: Move car to the right
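For each of these three actions, DQN learns a Q-value by regressing toward a one-step target. The sketch below shows that target under assumed values: the discount factor `gamma=0.99` and the next-state Q-values are illustrative, not taken from this project.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)

# Mountain-Car gives a reward of -1 per step; the Q-values here are made up
# and ordered as [move left, do nothing, move right].
target = td_target(reward=-1.0, next_q_values=[-20.0, -18.0, -19.0], done=False)
```

The network's prediction for the action actually taken is then pushed toward `target`, usually with the next-state Q-values coming from a separate, slowly updated target network.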
I successfully solved this problem using DQN in around 15 episodes. Here’s a glimpse of the result:

Project 3: Pendulum
For the Pendulum, the goal is to swing up and balance an inverted pendulum. The action space is a single continuous value:
- 0: Apply torque in the range of [-2, 2]
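Because the action is continuous and bounded, a DDPG actor's output has to be mapped into the torque range, and any exploration noise must not push it outside. The sketch below assumes a `tanh`-squashed actor output and plain Gaussian noise; both are illustrative choices (the original DDPG paper used Ornstein-Uhlenbeck noise), and `scale_and_clip` is a hypothetical helper name.

```python
import math
import random

MAX_TORQUE = 2.0  # Pendulum torque bound from the text: [-2, 2]

def scale_and_clip(actor_output, noise=0.0, bound=MAX_TORQUE):
    """Map a raw actor output to a valid torque.

    tanh squashes the raw value into [-1, 1], multiplying by `bound`
    stretches it to the torque range, and clipping keeps the noisy
    action inside [-bound, bound].
    """
    action = math.tanh(actor_output) * bound + noise
    return max(-bound, min(bound, action))

rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy_torque = scale_and_clip(0.3, noise=rng.gauss(0.0, 0.2))
```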
This was solved using DDPG in approximately 100 episodes, leading to the following performance graph:

Project 4: Lunar Lander
The objective of the Lunar Lander project is to land a spaceship smoothly between two flags. The ship has three engines: a main engine pointing downward and two side engines pointing left and right. The environment comes in both a discrete-action and a continuous-action version.
The environment counts as solved once the agent averages a reward of at least +200 over 100 consecutive episodes. I managed this for both versions within 400 episodes. Check out the plots below:
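As a side note, the solved criterion can be tracked during training with a rolling window. This is a minimal sketch that assumes "solved" is judged on the mean reward of the most recent 100 episodes; `make_solved_checker` is a hypothetical helper, and the rewards fed to it are placeholders rather than real results.

```python
from collections import deque

def make_solved_checker(threshold=200.0, window=100):
    """Return a function that records episode rewards and reports whether
    the mean of the last `window` episodes has reached `threshold`."""
    recent = deque(maxlen=window)

    def record(episode_reward):
        recent.append(episode_reward)
        # Only judge once a full window of episodes is available.
        return len(recent) == window and sum(recent) / window >= threshold

    return record

record = make_solved_checker()
solved = False
for episode_reward in [250.0] * 100:  # placeholder rewards, not real results
    solved = record(episode_reward)
```

Calling `record` at the end of every episode gives a natural stopping condition for the training loop.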
Discrete Version

Continuous Version

Project 5: Bipedal Walker
The final project involves a Bipedal Walker, which has two legs, each with two joints. The goal is to teach it to walk by applying a torque in the range [-1, 1] to each joint. The agent earns positive reward for moving forward and incurs a penalty for applying motor torque.
Smooth Terrain
Initially, the AI behaves randomly, struggling to control and balance its legs. After 300 episodes, it learns to crawl using one knee and discovers safer movements to prevent falls:

Progress through Training
- After 500 episodes, it starts balancing on both legs, but refinement is still needed:
- At 600 episodes, it maximizes rewards with a unique walking style, revealing the diverse potential of AI behavior:

Hardcore Terrain
I loaded the weights saved from the smooth-terrain training and resumed training on the hardcore terrain. Since the agent already knew how to walk, it only needed to learn to navigate obstacles:

Troubleshooting
If you encounter problems during your projects, consider the following troubleshooting tips:
- Ensure all libraries are correctly installed.
- Verify that your Python version aligns with the project requirements.
- Experiment with hyperparameter tuning to improve performance.
- Utilize visualization tools to track agent performance during training.
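The first two tips can be automated with a small version check against the Prerequisites list. The sketch below is one possible approach, with hypothetical helper names (`parse_version`, `matches`, `report`); the Keras and TensorFlow strings are hard-coded so it runs without either library installed, and in a real setup they would come from `keras.__version__` and `tensorflow.__version__`.

```python
import sys
from itertools import takewhile

# Versions listed in the Prerequisites section.
REQUIRED = {"python": (3, 7), "keras": (2, 4, 3), "tensorflow": (2, 2, 0)}

def parse_version(text):
    """Turn a version string such as '2.4.3' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        # Keep only leading digits so '0rc1' is read as 0.
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def matches(installed, required):
    """True when `installed` starts with the `required` version tuple."""
    return parse_version(installed)[:len(required)] == tuple(required)

def report(installed_versions):
    """Compare a {name: version string} mapping against REQUIRED."""
    return {name: matches(installed_versions.get(name, ""), req)
            for name, req in REQUIRED.items()}

python_version = "{}.{}.{}".format(*sys.version_info[:3])
# Hard-coded stand-ins for keras.__version__ and tensorflow.__version__.
status = report({"python": python_version, "keras": "2.4.3", "tensorflow": "2.2.0"})
```

Any `False` entry in `status` points to a package worth reinstalling before digging into hyperparameters.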
