Are you ready to dive into the intriguing world of multi-agent reinforcement learning? This blog will walk you through the installation, execution, and experimentation of the Shapley Q-value deep deterministic policy gradient (SQDDPG) algorithm, designed to optimize rewards for multiple agents operating in complex environments.
Dependencies You Need to Get Started
Before running the SQDDPG algorithm, make sure you’ve set up the right environment. Here’s what you need:
- Ubuntu 18.04
- Python 3.5.4
- PyTorch 1.0
- Anaconda 3 for easy dependency management – you can download it from the official Anaconda site.
- OpenAI Gym (version 0.10.5)
- Numpy (version 1.14.5)
- TensorFlow (version r1.14) for monitoring training with TensorBoard
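Once installed, you can quickly confirm that the core Python packages match the versions above. The snippet below is only a minimal sanity check; it assumes the standard package names (`torch`, `gym`, `numpy`):

```python
# Quick sanity check that the core dependencies are importable
# and report their installed versions.
import torch
import gym
import numpy as np

print("PyTorch:", torch.__version__)   # expected ~1.0
print("Gym:", gym.__version__)         # expected 0.10.5
print("NumPy:", np.__version__)        # expected 1.14.5
```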
Once all dependencies are in place, open your terminal and execute the following commands to set up your environment:
```bash
cd SQDDPG/environments/multiagent_particle_envs
pip install -e .
```
Running Code for Experiments
The experiments for SQDDPG are tailored around two environments: Cooperative Navigation and Prey-and-Predator, with additional scenarios such as Traffic Junction for varied complexity. You can find the environments on GitHub:
- For multi-agent particle environments: OpenAI Multi-Agent Particle Envs
- For Traffic Junction environments: IC3Net
To run the training experiments:
- Navigate to the directory where argument files are located.
- Edit the corresponding argument file (e.g., `simple_tag_sqddpg.py`) to adjust hyperparameters; a sketch of what such a file might look like follows this list.
- Modify the `train.sh` script, updating the `EXP_NAME` to your chosen experiment name and configuring `CUDA_VISIBLE_DEVICES` for GPU usage.
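The exact layout of the argument files depends on the version of the repository you are using, so treat the following as a hedged sketch rather than the actual contents of `simple_tag_sqddpg.py`; the field names (`agent_num`, `hid_size`, and so on) are illustrative assumptions:

```python
# Illustrative sketch of a hyperparameter/argument file for an experiment.
# Field names and values are assumptions for illustration only, not the
# repository's actual definitions.
from collections import namedtuple

Args = namedtuple("Args", [
    "scenario",            # which multi-agent particle scenario to load
    "agent_num",           # number of agents in the environment
    "hid_size",            # hidden layer width of the policy/value networks
    "policy_lrate",        # learning rate for the policy network
    "value_lrate",         # learning rate for the (Shapley) Q-value network
    "batch_size",          # minibatch size sampled from the replay buffer
    "train_episodes_num",  # total number of training episodes
])

args = Args(
    scenario="simple_tag",
    agent_num=4,
    hid_size=128,
    policy_lrate=1e-4,
    value_lrate=1e-3,
    batch_size=128,
    train_episodes_num=5000,
)
```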
To execute the training, use the following command:
```bash
source train.sh
```
Testing Your Model
To test your trained model, execute:
```bash
python test.py --save-model-dir --render --episodes
```
Here, `--save-model-dir` is typically followed by the path to your saved model and `--episodes` by the number of evaluation episodes; check `test.py` for the exact argument names and defaults.
Understanding Experiment Results: An Analogy
Imagine you’re a coach training a sports team, trying out different strategies to win matches. SQDDPG acts as the coach, analyzing various plays (or strategies) in scenarios like Cooperative Navigation or Prey-and-Predator to determine the best way to achieve the common goal of winning. Each additional scenario, such as the Traffic Junction, is like a new set of match conditions and reflects how well your team adapts to different challenges. The results from each experiment reveal how effective these strategies are, akin to assessing how the team performed in different matches.
Troubleshooting Common Issues
- Dependencies Not Found: If you run into issues regarding missing libraries, make sure your Python version is correct and that you’ve activated your Anaconda environment.
- Training Errors: Double-check the argument files to confirm the hyperparameters are set correctly. If you’re facing GPU allocation errors, ensure the `CUDA_VISIBLE_DEVICES` variable is configured to your available GPU.
- Reporting Issues: If problems persist, check the GitHub repository for similar issues or ask questions there for community support.
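If you suspect a GPU visibility problem, a quick PyTorch check such as the one below can help confirm what the process actually sees (a minimal sketch; run it in the same environment and with the same `CUDA_VISIBLE_DEVICES` setting as your training job):

```python
# Check which GPUs PyTorch can see under the current CUDA_VISIBLE_DEVICES setting.
import os
import torch

print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print("device", i, ":", torch.cuda.get_device_name(i))
```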
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Extending the Framework
If you’re eager to customize your multi-agent learning further, this framework is easily extensible. You can add new environments from OpenAI Gym or plug in your own multi-agent algorithms by following the predefined structure in the code. New methods can be registered in the aux.py file, ensuring smooth integration into the existing framework; a hedged sketch of what that registration might look like follows.
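How exactly a method is registered depends on how `aux.py` is organized in your copy of the code; the name-to-class map below is only an assumed illustration, and `MyNewMethod` and `model_map` are hypothetical names:

```python
# Hypothetical sketch of registering a new multi-agent algorithm in a
# name-to-class map. The real aux.py may organize things differently;
# adapt this pattern to the structure you find there.

class MyNewMethod:
    """Placeholder for a custom multi-agent learning algorithm."""
    def __init__(self, args):
        self.args = args

    def policy(self, obs):
        # Return actions for all agents given their observations.
        raise NotImplementedError

# Training scripts can then look up the algorithm named in the argument file.
model_map = {
    "my_new_method": MyNewMethod,
}
```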
In conclusion, at fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

