Flappy Bird, a seemingly simple yet notoriously challenging game, has captured the attention of developers and AI enthusiasts alike. This project demonstrates how to use the Deep Q-Learning algorithm to teach an AI agent to play Flappy Bird effectively. In this guide, we will walk through the installation process, how to run the program, and some common troubleshooting tips.
Overview
This project is inspired by the works of Mnih et al. in Playing Atari with Deep Reinforcement Learning. It showcases the adaptability of Deep Q-Learning to the Flappy Bird environment.
Installation Dependencies
- Python 2.7 or 3
- TensorFlow 0.7
- pygame
- OpenCV-Python
How to Run?
git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py
Understanding Deep Q-Network
Think of the Deep Q-Network (DQN) as a brain that learns from scratch how to navigate the challenging world of Flappy Bird. Just as a child might learn to ride a bicycle by experimenting, falling, and adjusting their strategy, the DQN uses raw pixel data as input and learns to estimate the best actions to take for future rewards. Imagine this learning process as a series of increasingly refined guesses, shaped by experience.
Deep Q-Network Algorithm
The algorithm operates through a loop that resembles a continuous training cycle. Here’s a brief analogy to make sense of this flow:
Imagine an athlete (the agent) preparing for a big competition (the game). Each day they train (an episode), practicing various actions (flapping through the pipes). On some days they try moves at random (exploratory actions) to discover new strategies, while the observations they accumulate help them refine what already works. Over time, as the athlete builds up experience (the replay memory), they learn to recognize which actions yield the best outcomes (the value function).
Initialize replay memory D to size N
Initialize action-value function Q with random weights
for episode = 1, M do
Initialize state s_1
for t = 1, T do
With probability ϵ select random action a_t
otherwise select a_t = argmax_a Q(s_t, a; θ_i)
Execute action a_t in emulator and observe r_t and s_(t+1)
Store transition (s_t,a_t,r_t,s_(t+1)) in D
Sample a minibatch of transitions (s_j,a_j,r_j,s_(j+1)) from D
Set y_j:
r_j for terminal s_(j+1)
r_j + γ * max_(a') Q(s_(j+1), a'; θ_i) for non-terminal s_(j+1)
Perform a gradient step on (y_j-Q(s_j,a_j; θ_i))^2 with respect to θ
end for
end for
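To make this loop concrete, here is a minimal Python sketch of the same algorithm. It is not the repository's implementation: a simple linear Q-function stands in for the convolutional network, and the environment interface (a reset() method returning a state vector and a step(action) method returning next_state, reward, terminal) is assumed purely for illustration.

# Minimal sketch of the DQN loop above. A linear Q-function replaces the
# convolutional network, and the environment interface is hypothetical.
import random
from collections import deque

import numpy as np

STATE_DIM = 8          # hypothetical size of a state feature vector
NUM_ACTIONS = 2        # flap or do nothing
GAMMA = 0.99           # discount factor
EPSILON = 0.1          # exploration probability
REPLAY_SIZE = 50000    # size N of replay memory D
BATCH_SIZE = 32
LEARNING_RATE = 1e-3

class LinearQ:
    """Linear stand-in for the deep Q-network: Q(s, a) = W[a] . s + b[a]."""
    def __init__(self):
        self.W = np.zeros((NUM_ACTIONS, STATE_DIM))
        self.b = np.zeros(NUM_ACTIONS)

    def predict(self, state):
        return self.W @ state + self.b   # vector of Q-values, one per action

    def train_step(self, samples):
        # One gradient step on the squared TD error (y_j - Q(s_j, a_j))^2.
        for s, a, y in samples:
            td_error = y - self.predict(s)[a]
            self.W[a] += LEARNING_RATE * td_error * s
            self.b[a] += LEARNING_RATE * td_error

def train(env, episodes=1000, max_steps=10000):
    q = LinearQ()
    replay = deque(maxlen=REPLAY_SIZE)   # replay memory D

    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if random.random() < EPSILON:
                action = random.randrange(NUM_ACTIONS)
            else:
                action = int(np.argmax(q.predict(state)))

            next_state, reward, terminal = env.step(action)
            replay.append((state, action, reward, next_state, terminal))

            # Sample a minibatch and build the targets y_j.
            if len(replay) >= BATCH_SIZE:
                batch = random.sample(replay, BATCH_SIZE)
                samples = []
                for s, a, r, s2, done in batch:
                    y = r if done else r + GAMMA * np.max(q.predict(s2))
                    samples.append((s, a, y))
                q.train_step(samples)

            state = next_state
            if terminal:
                break
    return q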
Experiments
Environment
During training, the agent observes the raw pixel values of the game screen. To speed up convergence, the game's background is removed so the network only has to attend to the bird and the pipes.
Network Architecture
To train the DQN efficiently, each game frame undergoes a transformation process (a sketch of this pipeline follows the list):
- Convert the image to grayscale.
- Resize the image to 80×80 pixels.
- Stack the last 4 frames into a single input array.
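A minimal sketch of this preprocessing pipeline, assuming OpenCV and NumPy (the helper names are illustrative and not taken from the repository):

# Sketch of the frame preprocessing steps listed above (illustrative names).
import cv2
import numpy as np

def preprocess_frame(frame):
    """Convert a raw color game frame into an 80x80 binary grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # drop color information
    small = cv2.resize(gray, (80, 80))                  # shrink to 80x80 pixels
    # Threshold so the bird and pipes stand out against the removed background.
    _, binary = cv2.threshold(small, 1, 255, cv2.THRESH_BINARY)
    return binary

def initial_state(frame):
    """Stack the first frame 4 times to form the 80x80x4 network input."""
    f = preprocess_frame(frame)
    return np.stack([f, f, f, f], axis=2)

def next_state(state, new_frame):
    """Append the newest preprocessed frame and drop the oldest one."""
    f = preprocess_frame(new_frame)[:, :, np.newaxis]
    return np.append(f, state[:, :, :3], axis=2)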
The network follows a convolutional architecture: stacked convolutional layers extract features from the 80×80×4 input, and fully connected layers map those features to a Q-value for each action.
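As an illustration only, a network of this shape could be defined as follows. It is written against modern TensorFlow/Keras rather than the TensorFlow 0.7 used by this project, and the layer sizes follow a commonly used DQN configuration rather than this project's exact code.

# Illustrative Q-network for 80x80x4 inputs (layer sizes are a common DQN
# configuration, not necessarily this repository's exact settings).
import tensorflow as tf

def build_q_network(num_actions=2):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(80, 80, 4)),                      # 4 stacked frames
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),   # coarse features
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(num_actions),                            # one Q-value per action
    ])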
Training
The training process is central to the DQN's performance. During an initial observation phase, actions are taken at random to populate the replay memory. Training then proceeds by sampling minibatches from this memory, so past experiences are reused to stabilize and speed up learning.
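As a sketch of the exploration schedule implied here, assume a pure observation phase followed by a linear annealing of epsilon; the constant names and values below are illustrative, not necessarily the project's exact settings.

# Illustrative exploration schedule: random actions while the replay memory
# fills up, then epsilon is annealed linearly toward a small final value.
OBSERVE_STEPS = 10_000       # pure observation phase: no training yet
EXPLORE_STEPS = 3_000_000    # steps over which epsilon is annealed
INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001

def epsilon_at(step):
    """Return the exploration probability to use at a given training step."""
    if step < OBSERVE_STEPS:
        return 1.0   # act randomly while building the replay memory
    progress = min(1.0, (step - OBSERVE_STEPS) / EXPLORE_STEPS)
    return INITIAL_EPSILON + progress * (FINAL_EPSILON - INITIAL_EPSILON)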
Troubleshooting
- Checkpoint not found: Update the model path on the first line of the saved_networks/checkpoint file so it points to your saved network (see the example after this list).
- How to reproduce the results:
  - Comment out the lines in the deep Q-network code that load the saved network, so training starts from scratch.
  - Adjust the parameters in the deep_q_network.py file for better results.
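For reference, the first line of a TensorFlow checkpoint index file such as saved_networks/checkpoint typically looks like the line below; the model file name shown is a placeholder, not the project's actual saved network.

model_checkpoint_path: "saved_networks/your-saved-model"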
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.