How to Implement Federated Learning with PyTorch

In the age of decentralized data, Federated Learning provides a path to train models effectively without sharing raw data. The approach draws on the paper Communication-Efficient Learning of Deep Networks from Decentralized Data (McMahan et al., 2017). By the end of this article, you’ll have a clear guide on how to set up the code, run experiments on MNIST, Fashion MNIST, and CIFAR10, and troubleshoot common issues.

Requirements

  • Python 3
  • PyTorch
  • Torchvision

Ensure that you have all the necessary packages installed as per requirements.txt.

Data Handling

You can download the train and test datasets manually, or let torchvision download them automatically, as in the snippet after this list. The experiments are conducted on the following datasets:

  • MNIST
  • Fashion MNIST
  • CIFAR10
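
For reference, the automatic download is just the standard torchvision call; a minimal example for MNIST (the data/ path is illustrative):

from torchvision import datasets, transforms

transform = transforms.ToTensor()
# download=True fetches the files on first use and caches them under data/
train_set = datasets.MNIST("data/", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data/", train=False, download=True, transform=transform)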

If you want to use your own dataset:

  • Place your dataset in the data directory.
  • Create a wrapper class around the PyTorch Dataset class (a minimal sketch follows below).
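
For illustration, here is one way such a wrapper can look; the class name and tensor layout are hypothetical, not part of the repository:

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Serves pre-loaded feature/label tensors to a DataLoader."""

    def __init__(self, features, labels, transform=None):
        self.features = features      # e.g. a float tensor of inputs
        self.labels = labels          # e.g. a long tensor of class ids
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x, y = self.features[idx], self.labels[idx]
        if self.transform:
            x = self.transform(x)
        return x, y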

Running the Experiments

The implementation consists of both baseline and federated experiments.

Baseline Experiment

This trains the model in the conventional, centralized way; a simplified sketch of the training loop follows the commands below. To run the baseline experiment on MNIST with the MLP model on CPU, execute:

python src/baseline_main.py --model=mlp --dataset=mnist --epochs=10

Or, to run it on a GPU (assuming CUDA device 0 is available):

python src/baseline_main.py --model=mlp --dataset=mnist --gpu=0 --epochs=10
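
Under the hood this is ordinary centralized training. Here is a simplified sketch of one epoch, assuming a classification model and a standard DataLoader; the names are placeholders, not the repo's exact code:

import torch
import torch.nn as nn

def train_one_epoch(model, loader, lr=0.01, device="cpu"):
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()   # plain SGD step over the full training set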

Federated Experiment

This experiment trains a global model by aggregating updates from many local models; a sketch of the aggregation step follows the commands below. To run the federated experiment with a CNN on CIFAR10 under IID data, use:

python src/federated_main.py --model=cnn --dataset=cifar --gpu=0 --iid=1 --epochs=10

To run the same experiment under non-IID conditions, use:

python src/federated_main.py --model=cnn --dataset=cifar --gpu=0 --iid=0 --epochs=10
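
In each global round, a fraction (--frac) of users trains the current model locally for --local_ep epochs, and the server averages the returned weights into the global model. A minimal sketch of that federated averaging step, assuming each local result arrives as a plain state_dict:

import copy
import torch

def average_weights(local_weights):
    """FedAvg: element-wise mean of the users' state_dicts."""
    avg = copy.deepcopy(local_weights[0])
    for key in avg.keys():
        for w in local_weights[1:]:
            avg[key] += w[key]
        avg[key] = torch.div(avg[key], len(local_weights))
    return avg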

You can also tweak other parameters; see options.py for the default values.

Options for Configuration

The parameters you can adjust include:

  • --dataset (default: mnist; options: mnist, fmnist, cifar)
  • --model (default: mlp; options: mlp, cnn)
  • --gpu (default: None, i.e. runs on CPU; pass a GPU id to use CUDA)
  • --epochs (number of global training rounds)
  • --lr (learning rate; default: 0.01)
  • --verbose (detailed logging; set to 0 to deactivate)
  • --seed (random seed; default: 1)
  • --iid (data distribution among users; default: IID, set to 0 for non-IID; see the sketch after this list)
  • --num_users (number of users; default: 100)
  • --frac (fraction of users sampled for updates each round; default: 0.1)
  • --local_ep (number of local training epochs; default: 10)
  • --local_bs (batch size of local updates; default: 10)
  • --unequal (whether to split data among users equally or unequally in the non-IID setting; default: equal)
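
To illustrate what --iid and --num_users control, here is a hedged sketch of the two partitioning schemes; the function names are illustrative, and the non-IID case follows the shard-based scheme from the FedAvg paper (sort by label, then deal a few shards to each user):

import numpy as np

def iid_partition(num_items, num_users):
    """IID: each user gets a uniformly random, equal-sized share."""
    idxs = np.random.permutation(num_items)
    return np.array_split(idxs, num_users)

def noniid_partition(labels, num_users, shards_per_user=2):
    """Non-IID: sort indices by label, cut into shards, and hand each
    user a few shards so it only sees a couple of classes."""
    order = np.argsort(labels)
    shards = np.array_split(order, num_users * shards_per_user)
    picks = np.random.permutation(len(shards))
    return [np.concatenate([shards[s] for s in
                            picks[u * shards_per_user:(u + 1) * shards_per_user]])
            for u in range(num_users)]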

Results Overview

Baseline Experiment Results

Test accuracy after training for 10 epochs:

  • MLP: 92.71%
  • CNN: 98.42%

Federated Experiment Results

After 10 global rounds, test accuracy in the federated setup depends on both the model and the data distribution:

Model   IID      Non-IID (equal)
MLP     88.38%   73.49%
CNN     97.28%   75.94%

Troubleshooting

If you encounter issues while running your federated learning experiments, here are a few suggestions:

  • Make sure all packages are installed correctly as stated in the requirements.txt file.
  • Double-check dataset paths and ensure datasets are correctly placed in the data directory.
  • Verify GPU availability if you’re trying to run the model on a GPU (see the quick check after this list).
  • If errors persist, try executing the program with --verbose=1 to get detailed logs of what might be going wrong.
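
For the GPU check above, a quick way to confirm that PyTorch can actually see a CUDA device:

import torch

print(torch.cuda.is_available())   # True if a CUDA GPU is visible
print(torch.cuda.device_count())   # how many devices --gpu can index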

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
