Learning to Play Tetris with Monte Carlo Tree Search and Temporal Difference Learning

Welcome to an exciting journey through the world of Tetris and artificial intelligence! This blog post explores how to train an agent to play Tetris using sophisticated strategies like Monte Carlo Tree Search (MCTS) and Temporal Difference Learning (TD). Get ready to uncover the intriguing methodologies and insights behind this ambitious personal project!

Introduction

This project is a labor of love for one of my all-time favorite puzzle games, Tetris. I initially tried Deep Q-Learning, but quickly realized that the difficulty of training an agent to play at human-level proficiency stems largely from the game’s reward sparsity and long-term dependencies: it takes many actions just to clear a single line!

Inspired by AlphaGo’s groundbreaking victory over Lee Sedol, I was convinced that a model-based approach, rather than a conventional model-free method, could vastly improve my Tetris agent. Thus the MCTS-TD agent was born, merging these techniques for our beloved puzzle game!

How Is This Related to AlphaGo?

At the heart of AlphaGo lies a clever search mechanism based on upper confidence bounds applied to trees (UCT). Unlike standard MCTS, which evaluates a state by simulating the game all the way to the end, AlphaGo uses a neural network to predict both the value of a position and the most promising moves. In this project, I adapted that principle to a single-player setting: exponential moving averages and variances of the network’s value estimates are used to construct the upper confidence bounds, and bootstrapped targets replace final game scores, which is what makes this Temporal Difference Learning.
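
As a rough illustration of the idea (not the project’s actual code; all names and constants below are assumptions), each search node can keep exponentially weighted estimates of its value and variance, and child selection can use a confidence bound built from those statistics:

import math

EMA_DECAY = 0.99        # assumed decay rate for the moving statistics
EXPLORATION_C = 1.0     # assumed exploration constant

class Node:
    def __init__(self):
        self.visits = 0
        self.value_ema = 0.0   # exponential moving average of bootstrapped values
        self.var_ema = 0.0     # exponential moving variance of those values

    def update(self, value):
        # Update the running statistics with a bootstrapped value from the network.
        self.visits += 1
        delta = value - self.value_ema
        self.value_ema += (1 - EMA_DECAY) * delta
        self.var_ema = EMA_DECAY * (self.var_ema + (1 - EMA_DECAY) * delta ** 2)

def ucb_score(node):
    # Single-player confidence bound: exploit the estimated value,
    # explore in proportion to its remaining uncertainty.
    if node.visits == 0:
        return float("inf")
    return node.value_ema + EXPLORATION_C * math.sqrt(node.var_ema / node.visits)

The exact form of the bound in the project may differ; the point is that value and uncertainty come from running statistics of network predictions rather than from full-game rollouts.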

How is This Different from Other Tetris Bots?

Most high-performing Tetris bots seen online rely on hand-crafted heuristics, such as the number of holes or column heights, to shape rewards. While these heuristics simplify learning by providing denser rewards, they can also steer the agent away from the game’s true objective: clearing lines. My agent avoids game-specific heuristics, so the approach should generalize beyond Tetris to other environments with similarly sparse rewards.

Prerequisites
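
The project’s dependencies are listed in its requirements.txt (referenced in the Troubleshooting section below), so a typical setup under Python 3 is simply:

pip install -r requirements.txt

Check the repository README for the definitive list of requirements.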

Training Your Agent

Once you’ve set everything up, training your agent is as simple as executing a single command:

python play.py --agent_type ValueSimLP --online --ngames 1000 --mcts_sims 100
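
Here ValueSimLP selects the agent type, while --ngames and --mcts_sims presumably set the number of self-play games and the number of MCTS simulations per move. Conceptually, the training loop looks something like the sketch below (illustrative names only, not the actual play.py API):

def train(agent, env, n_games=1000, mcts_sims=100):
    for _ in range(n_games):
        state = env.reset()
        trajectory = []
        done = False
        while not done:
            # Pick a move by running MCTS simulations guided by the value network.
            action = agent.search(state, n_simulations=mcts_sims)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
        # Temporal difference learning: fit the value network to bootstrapped
        # targets from the trajectory rather than to final game scores.
        agent.update_value_network(trajectory)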

Results

Version 1.0

Check out the agent in action here! The agent was trained with 300 simulations per move, and each training iteration consisted of 100 games. A benchmark game at 1500 simulations per move was also played to validate the agent’s performance.

Average and standard deviation of line clears from training and benchmark games. (Left: training; right: benchmark.)

Version 2.0

Witness the evolution of the agent here! In this version, the reward function was revamped, and the agent achieved a staggering 5678 line clears after 750 training episodes. Unfortunately, training was cut short when the system ran out of RAM!

Average and standard deviation of scores and line clears at each iteration. (Left: training; Right: benchmark).

Troubleshooting

As with any ambitious coding project, you may encounter a few bumps along the way. Here are some troubleshooting ideas:

  • Check that all required libraries are listed in requirements.txt and installed correctly.
  • If your agent isn’t learning, consider tweaking your training parameters.
  • For memory-related issues, try optimizing your code or using a more memory-efficient training strategy (see the sketch after this list).
  • If you’re trying to reproduce earlier results, check out the corresponding earlier commits; keep in mind that older variations may not work as expected.
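
For the memory issue in particular, one common mitigation (not necessarily what this project does) is to cap the number of stored transitions so self-play data cannot grow without bound:

from collections import deque

REPLAY_CAPACITY = 200_000   # assumed cap; tune to the RAM you have available

# Once the cap is reached, appending a new transition silently drops the oldest one.
replay_buffer = deque(maxlen=REPLAY_CAPACITY)

def store(transition):
    replay_buffer.append(transition)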

