Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

Sep 12, 2023 | Data Science

Are you ready to dive into the intriguing world of multi-agent reinforcement learning? This blog will walk you through the installation, execution, and experimentation of the Shapley Q-value deep deterministic policy gradient (SQDDPG) algorithm, designed to optimize rewards for multiple agents operating in complex environments.

Dependencies You Need to Get Started

Before running the SQDDPG algorithm, make sure you’ve set up the right environment. Here’s what you need:

  • Ubuntu 18.04
  • Python 3.5.4
  • PyTorch 1.0
  • Anaconda 3 for easy dependency management (available from the Anaconda website)
  • OpenAI Gym (version 0.10.5)
  • NumPy (version 1.14.5)
  • TensorFlow (version r1.14) for monitoring training with TensorBoard

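A convenient way to pin these versions is a dedicated conda environment. The commands below are only a minimal sketch under those assumptions: the environment name sqddpg is arbitrary, and the pinned pip packages may need adjusting for your platform (for example, installing PyTorch 1.0 from its official channel instead of PyPI).

bash
# Hypothetical environment setup -- names and package sources are illustrative
conda create -n sqddpg python=3.5.4
conda activate sqddpg
pip install torch==1.0.0 gym==0.10.5 numpy==1.14.5 tensorflow==1.14.0
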
Once all dependencies are in place, open your terminal and run the following commands to install the multi-agent particle environments:

bash
cd SQDDPG/environments/multiagent_particle_envs
pip install -e .

Running Code for Experiments

The experiments for SQDDPG are built around two environments, Cooperative Navigation and Prey-and-Predator, with additional scenarios such as Traffic Junction for varied complexity. The environment code is included in the project's GitHub repository.

To run the training experiments:

  • Navigate to the directory where argument files are located.
  • Edit the corresponding file (e.g., simple_tag_sqddpg.py) to adjust hyperparameters.
  • Modify the train.sh script, updating EXP_NAME to your chosen experiment name and configuring CUDA_VISIBLE_DEVICES for GPU usage (a sketch of these two settings follows this list).
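
The exact contents of train.sh depend on your checkout, but the two settings mentioned above typically look something like this sketch (the experiment name, GPU index, and entry-point command are placeholders, not the repository's verbatim contents):

bash
# Illustrative excerpt of train.sh -- adapt to the actual script in your checkout
EXP_NAME="simple_tag_sqddpg"   # experiment name matching your argument file
export CUDA_VISIBLE_DEVICES=0  # GPU index to use; set to "" to force CPU
python -u train.py             # hypothetical entry point; use the command in your script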

To execute the training, use the following command:

bash
source train.sh
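
Since TensorFlow is listed among the dependencies for TensorBoard support, you can optionally watch training curves while the job runs. The log directory below is a placeholder; point it at wherever your configuration writes event files.

bash
tensorboard --logdir <path-to-tensorboard-logs> --port 6006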

Testing Your Model

To test your trained model, execute:

bash
python test.py --save-model-dir <path-to-saved-model> --render --episodes <number-of-episodes>

Understanding Experiment Results: An Analogy

Imagine you’re a coach training a sports team, trying out different strategies to win matches. SQDDPG acts as the coach, analyzing various plays (or strategies) in scenarios like Cooperative Navigation or Prey-and-Predator to determine the best way to achieve the common goal of winning. Each additional scenario, like Traffic Junction, reflects how well your team adapts to different conditions. The results from each experiment reveal how effective these strategies are, much like assessing how the team performed in different matches.

Troubleshooting Common Issues

  • Dependencies Not Found: If you run into issues regarding missing libraries, make sure your Python version is correct and that you’ve activated your Anaconda environment.
  • Training Errors: Double-check the argument files to confirm the hyperparameters are set correctly. If you’re facing GPU allocation errors, ensure the CUDA_VISIBLE_DEVICES variable points to an available GPU (a few quick checks are sketched after this list).
  • Reporting Issues: If problems persist, check the GitHub repository for similar issues or ask questions there for community support.
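
For the first two issues, a few quick shell checks usually narrow things down. These commands are generic sketches; substitute your own environment name and GPU index.

bash
conda activate sqddpg          # or whatever you named your environment
python --version               # should report Python 3.5.4
python -c "import torch, numpy; print(torch.__version__, numpy.__version__)"
export CUDA_VISIBLE_DEVICES=0  # restrict the run to GPU 0; adjust the index as needed
nvidia-smi                     # confirm the GPU is visible and not already fully allocated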

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Extending the Framework

If you’re eager to customize your multi-agent learning further, this framework is easily extensible. You can add new environments from OpenAI Gym or plug in your own multi-agent algorithms by following the predefined structure in the code. New methods can be registered in the aux.py file, ensuring smooth integration into the existing framework.

In conclusion, at fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
