How to Use Deep Q-Network to Learn How to Play Flappy Bird

Feb 15, 2024 | Data Science

Flappy Bird, a seemingly simple yet notoriously challenging game, has captured the attention of developers and AI enthusiasts alike. This project demonstrates how to use the Deep Q-Learning algorithm to teach an AI agent to play Flappy Bird effectively. In this guide, we will walk through installation, how to run the program, and some common troubleshooting questions.

Overview

This project is inspired by the work of Mnih et al. in Playing Atari with Deep Reinforcement Learning. It showcases the adaptability of Deep Q-Learning to the Flappy Bird environment.

Installation Dependencies

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

Understanding Deep Q-Network

Think of the Deep Q-Network (DQN) as a brain that learns from scratch how to navigate the challenging world of Flappy Bird. Just as a child might learn to ride a bicycle by experimenting, falling, and adjusting their strategy, the DQN uses raw pixel data as input and learns to estimate the best actions to take for future rewards. Imagine this learning process as a series of increasingly refined guesses, shaped by experience.
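
To make this idea concrete, here is a minimal sketch of how an agent might turn the network's value estimates into an action. The q_network function and the two-action layout (0 = do nothing, 1 = flap) are illustrative assumptions, not the project's exact code.

import random

import numpy as np

def select_action(q_network, state, epsilon):
    """Epsilon-greedy action selection over the network's Q-value estimates.

    state   : stack of preprocessed game frames (the agent's "view" of the world).
    epsilon : probability of trying a random action instead of the current best guess.
    """
    if random.random() < epsilon:
        # Explore: occasionally try something new, like the child experimenting.
        return random.randrange(2)          # 0 = do nothing, 1 = flap
    # Exploit: pick the action the network currently believes yields the most future reward.
    q_values = q_network(state)             # e.g. array([q_noop, q_flap])
    return int(np.argmax(q_values))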

Deep Q-Network Algorithm

The algorithm operates through a loop that resembles a continuous training cycle. Here’s a brief analogy to make sense of this flow:

Imagine an athlete (the agent) preparing for a big competition (the game). Each day they train (an episode), practising various actions (flapping through the pipes). Some days they experiment with random moves (exploratory actions), while on others they rely on what they have observed to refine the routines that work best. Over time, as the athlete accumulates experience (the replay memory), they learn to recognize which actions yield the best outcomes (the value function).

Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe reward r_t and next state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set the target y_j:
            y_j = r_j                                      for terminal s_(j+1)
            y_j = r_j + γ * max_(a') Q(s_(j+1), a'; θ)     for non-terminal s_(j+1)
        Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
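
The loop above translates fairly directly into code. The sketch below is a simplified, framework-agnostic rendition of one step of the inner loop using NumPy; names such as q_network, train_step, env, and the buffer layout are illustrative assumptions, not the repository's actual implementation.

import random
from collections import deque

import numpy as np

REPLAY_SIZE = 50000   # N: replay memory capacity (illustrative value)
GAMMA = 0.99          # discount factor for future rewards
BATCH_SIZE = 32

replay_memory = deque(maxlen=REPLAY_SIZE)   # D

def training_step(q_network, train_step, env, state, epsilon):
    """One pass of the inner loop: act, store the transition, sample, and fit the targets.

    q_network(states) -> array of shape (batch, n_actions) with Q-value estimates.
    train_step(states, actions, targets) performs one gradient descent step on
    (target - Q(state, action))^2; both callables are assumed to exist.
    """
    # ε-greedy action selection.
    if random.random() < epsilon:
        action = env.random_action()
    else:
        action = int(np.argmax(q_network(state[None])[0]))

    # Execute the action in the emulator and observe the outcome.
    next_state, reward, terminal = env.step(action)
    replay_memory.append((state, action, reward, next_state, terminal))

    # Sample a minibatch of past transitions and build the targets y_j.
    batch = random.sample(list(replay_memory), min(BATCH_SIZE, len(replay_memory)))
    states, actions, rewards, next_states, terminals = map(np.array, zip(*batch))
    next_q = q_network(next_states).max(axis=1)
    targets = rewards + GAMMA * next_q * (1.0 - terminals.astype(np.float32))

    # Gradient descent on (y_j - Q(s_j, a_j))^2.
    train_step(states, actions, targets)
    return next_state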

Experiments

Environment

During training, we observe the raw pixel values from the game. To speed up convergence, the game's background is removed so that the network only has to attend to the bird and the pipes.

Network Architecture

To train the DQN efficiently, the game frames undergo a transformation process:

  1. Convert the image to grayscale.
  2. Resize the image to 80×80 pixels.
  3. Stack the last 4 frames into a single input array (see the preprocessing sketch below).
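
A possible implementation of these three steps with OpenCV and NumPy is sketched below; the colour convention and helper names are assumptions, and the real preprocessing in deep_q_network.py may differ in detail.

import cv2
import numpy as np

def preprocess_frame(frame):
    """Steps 1 and 2: convert a raw colour game frame (assumed BGR, OpenCV's convention)
    to an 80x80 grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (80, 80))

def stack_frames(frame_history):
    """Step 3: stack the last 4 preprocessed frames into one (80, 80, 4) input array."""
    return np.stack(frame_history[-4:], axis=-1)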

The network follows a convolutional architecture, passing the stacked frames through several convolutional and fully connected layers before producing one Q-value per action.
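
The original repository is written against TensorFlow 0.7; the sketch below uses the modern Keras API instead, with layer sizes loosely modelled on the DeepMind Atari network, so treat the exact filter counts and strides as assumptions rather than the project's configuration.

import tensorflow as tf

def build_q_network(n_actions=2):
    """A small convolutional Q-network: stacked 80x80 frames in, one Q-value per action out."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(80, 80, 4)),                               # 4 stacked grayscale frames
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),     # coarse spatial features
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(n_actions),                                # Q(s, a) for each action
    ])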

Training

The training process is crucial for the effective operation of the DQN. During the initial phase, actions are taken randomly to populate the replay memory. Training then proceeds by sampling minibatches from this memory, so that past experiences are reused and learning becomes more stable and sample-efficient.
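
The sketch below outlines this two-phase schedule: a purely random warm-up that fills the replay memory, followed by training with a gradually annealed exploration rate. The phase lengths and epsilon schedule are illustrative values, not the parameters used in deep_q_network.py.

OBSERVE_STEPS = 10000        # warm-up: act randomly to fill the replay memory (illustrative)
EXPLORE_STEPS = 1000000      # steps over which epsilon is annealed (illustrative)
INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001

def epsilon_at(step):
    """Linearly anneal epsilon from INITIAL_EPSILON to FINAL_EPSILON after the warm-up."""
    if step < OBSERVE_STEPS:
        return 1.0                      # pure exploration while the memory fills up
    progress = min(1.0, (step - OBSERVE_STEPS) / EXPLORE_STEPS)
    return INITIAL_EPSILON + progress * (FINAL_EPSILON - INITIAL_EPSILON)

def should_train(step):
    """Only start gradient updates once the replay memory holds enough experience."""
    return step >= OBSERVE_STEPS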

Troubleshooting

  • Checkpoint not found: Update the first line of the saved_networks/checkpoint file so that it points to the saved network you want to load.
  • How to reproduce the results:
    • Comment out the lines in deep_q_network.py that restore the pretrained network.
    • Adjust the training parameters in deep_q_network.py for better results.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
