A gobang robot based on reinforcement learning
Policy-Value Net
The network structure consists of the following components:
- ConvBlock × 1
- ResidueBlock × 4
- PolicyHead × 1
- ValueHead × 1
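The way data flows through these components can be sketched by tracing tensor shapes. This is a minimal, torch-free illustration and not the project's actual code: the board size, number of input planes, and channel width are hypothetical values the README does not state, and `trace_shapes` is an invented helper. It assumes an AlphaGo Zero style layout in which the trunk preserves the board's spatial size, the policy head emits one entry per board cell, and the value head emits a single scalar.

```python
# Hypothetical sizes: the README does not state any of them.
BOARD_LEN = 9    # assumed board side length
IN_PLANES = 6    # assumed number of input feature planes
CHANNELS = 128   # assumed width of the convolutional trunk


def trace_shapes(batch=1, board_len=BOARD_LEN, in_planes=IN_PLANES,
                 channels=CHANNELS, n_residue_blocks=4):
    """Return (stage name, output shape) pairs for one forward pass."""
    stages = [("input", (batch, in_planes, board_len, board_len))]

    # ConvBlock lifts the input planes to the trunk width.
    trunk = (batch, channels, board_len, board_len)
    stages.append(("ConvBlock", trunk))

    # Each ResidueBlock keeps the shape unchanged, which is what lets
    # a block add its input back to its output.
    for i in range(n_residue_blocks):
        stages.append((f"ResidueBlock {i + 1}", trunk))

    # Both heads read the same trunk output.
    stages.append(("PolicyHead", (batch, board_len * board_len)))
    stages.append(("ValueHead", (batch, 1)))
    return stages


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:<15} {shape}")
```

The two heads branch from the same trunk output, so one forward pass yields both a move distribution and a position estimate.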
Quick Start
To get Alpha Gobang Zero running, follow these steps:
- Create a virtual environment:
  conda create -n Alpha_Gobang_Zero python=3.8
  conda activate Alpha_Gobang_Zero
  pip install -r requirements.txt
- Install PyTorch:
  For details, refer to the blog.
- Start the game:
  conda activate Alpha_Gobang_Zero
  python game.py
Train Model
To train the model, use the following command:
conda activate Alpha_Gobang_Zero
python train.py
Blog
You can find more information on the blog at Alpha Gobang Zero.
Reference
- Mastering the game of Go without human knowledge
- Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
FAQs
- Why does the window get stuck when it is dragged?
  This issue occurs because the interface background uses an acrylic effect, which can cause problems on some versions of Windows 10. Here are three solutions:
  - Upgrade Windows 10 to the latest version.
  - Uncheck Advanced system settings > Performance > Show window contents when dragging.
  - Turn off the acrylic background option in the settings interface.
- Why does the configuration I modified in the settings interface not take effect immediately?
  The modified configuration will take effect at the beginning of the next game.
Understanding the Network Structure
Think of the network as an assembly line in which each station has a specific role:
- The ConvBlock is the first station: it takes the raw board encoding and transforms it into feature maps that the later stations can work with.
- The ResidueBlocks are like successive refinement stations. Each one adds its input back to its output (a skip connection), so it only has to learn a correction to the features it receives. Stacking four of them deepens the network while keeping it trainable.
- The PolicyHead is the decision-making station: from the processed features, it scores the candidate moves.
- Finally, the ValueHead is the review station: it estimates how favorable the current position is, that is, the expected outcome of the game rather than a running score.
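To make the two output stations concrete, here is a minimal pure-Python sketch of how their raw outputs could be turned into usable numbers. It assumes the convention of the AlphaGo Zero paper listed under Reference: the policy head yields one logit per board cell, normalized with a softmax, and the value head yields a scalar squashed by tanh into the range -1 to 1. The function names, the tiny 2×2 board, and the masking of occupied cells are illustrative assumptions. The README does not say whether the project masks illegal moves inside the network or in the search code.

```python
import math


def policy_from_logits(logits, occupied):
    """Softmax over board cells; occupied cells get probability 0.

    `logits` holds one raw score per cell, `occupied` is a set of cell
    indices that already contain a stone (hypothetical interface).
    """
    legal = [i for i in range(len(logits)) if i not in occupied]
    if not legal:
        raise ValueError("no legal moves left")
    # Subtract the largest legal logit for numerical stability.
    peak = max(logits[i] for i in legal)
    exps = {i: math.exp(logits[i] - peak) for i in legal}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0
            for i in range(len(logits))]


def value_from_raw(raw):
    """Squash a raw scalar into (-1, 1): negative leans toward a loss,
    positive toward a win for the player to move (assumed convention)."""
    return math.tanh(raw)


if __name__ == "__main__":
    # A 2x2 toy board flattened to 4 cells; cell 3 is already taken.
    probs = policy_from_logits([0.0, 1.0, 2.0, 3.0], occupied={3})
    best = max(range(len(probs)), key=probs.__getitem__)
    print("move probabilities:", [round(p, 3) for p in probs])
    print("preferred cell:", best)
    print("position estimate:", round(value_from_raw(0.8), 3))
```

Even though cell 3 has the highest raw score, it receives zero probability because it is occupied, so the preferred move comes from the remaining cells.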