In the age of decentralized data, Federated Learning provides a path to train models effectively without sharing raw data. It draws inspiration from the paper *Communication-Efficient Learning of Deep Networks from Decentralized Data*. By the end of this article, you’ll have a clear guide to setting up the code, running experiments on MNIST, Fashion MNIST, and CIFAR10, and troubleshooting any issues that arise.
Requirements
- Python 3
- PyTorch
- torchvision
Ensure that you have all the necessary packages installed as listed in `requirements.txt` (for example, with `pip install -r requirements.txt`).
Data Handling
You can download the train and test datasets manually, or they will be automatically downloaded from the torchvision datasets. The experiments will be conducted on the following datasets:
- MNIST
- Fashion MNIST
- CIFAR10
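If you’d like to see what the automatic download looks like, here is a minimal sketch using torchvision; the transform is an illustrative default, and the actual training scripts may apply additional normalization:

```python
from torchvision import datasets, transforms

# Illustrative transform; the real scripts may normalize differently.
transform = transforms.ToTensor()

# download=True fetches the files into data/ on the first run and reuses them afterwards.
train_dataset = datasets.MNIST('data/', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('data/', train=False, download=True, transform=transform)

print(len(train_dataset), len(test_dataset))  # 60000 10000
```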
If you want to use your own dataset:
- Place your dataset in the `data` directory.
- Create a wrapper class around the PyTorch `Dataset` class (a minimal sketch follows below).
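The wrapper only needs to implement `__len__` and `__getitem__` so the training code can index into your data. Here is a minimal sketch, with an illustrative class name and constructor arguments that are not taken from the repository:

```python
import torch
from torch.utils.data import Dataset

class CustomImageDataset(Dataset):
    """Illustrative wrapper exposing (sample, label) pairs to the training code."""

    def __init__(self, samples, labels, transform=None):
        self.samples = samples      # array-like of raw inputs
        self.labels = labels        # array-like of integer class labels
        self.transform = transform  # optional preprocessing callable

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample, label = self.samples[idx], self.labels[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, torch.tensor(label)
```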
Running the Experiments
The implementation consists of both baseline and federated experiments.
Baseline Experiment
This trains the model in the conventional, centralized way (a stripped-down sketch of that training loop follows the commands below). To run the baseline experiment with MNIST on an MLP using the CPU, execute:
python src/baseline_main.py --model=mlp --dataset=mnist --epochs=10
Or, to run it on a GPU (assuming `gpu:0` is available):
python src/baseline_main.py --model=mlp --dataset=mnist --gpu=0 --epochs=10
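Under the hood, the baseline is ordinary centralized training: a single model sees the whole dataset every epoch. The following is a stripped-down sketch of such a loop, assuming a classification model and plain SGD; the names and hyperparameters are placeholders, not the script’s actual objects:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_baseline(model, train_dataset, epochs=10, lr=0.01, device='cpu'):
    """Conventional (non-federated) training: one model, all the data."""
    loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```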
Federated Experiment
This experiment trains a global model by aggregating updates from many locally trained models (a sketch of the averaging step follows the commands below). To run the federated experiment with CIFAR on a CNN (IID), simply use:
python src/federated_main.py --model=cnn --dataset=cifar --gpu=0 --iid=1 --epochs=10
To run the same experiment under non-IID conditions, use:
python src/federated_main.py --model=cnn --dataset=cifar --gpu=0 --iid=0 --epochs=10
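Conceptually, each global round samples a fraction of users, lets each selected user train the current model locally for a few epochs, and then averages the returned weights into the global model, which is the federated averaging step described in the referenced paper. A simplified sketch of that averaging step (the function name is illustrative, not necessarily the repository’s):

```python
import copy
import torch

def average_weights(local_weights):
    """Average a list of model state_dicts returned by the selected users."""
    avg = copy.deepcopy(local_weights[0])
    for key in avg.keys():
        for w in local_weights[1:]:
            avg[key] = avg[key] + w[key]
        avg[key] = avg[key] / len(local_weights)
    return avg

# Each round, the global model picks up the averaged weights:
# global_model.load_state_dict(average_weights(local_weights))
```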
You can also tweak other parameters; see `options.py` for the default values.
Options for Configuration
The parameters you can adjust are listed below; a short sketch of how they are typically defined in `options.py` follows the list.
- `--dataset` (default: mnist; options: mnist, fmnist, cifar)
- `--model` (default: mlp; options: mlp, cnn)
- `--gpu` (default: None, runs on CPU; pass a GPU id such as 0 to use CUDA)
- `--epochs` (number of rounds of training)
- `--lr` (learning rate; default: 0.01)
- `--verbose` (detailed log output; set to 0 to deactivate)
- `--seed` (random seed; default: 1)
- `--iid` (distribution of data among users; default: IID, set to 0 for non-IID)
- `--num_users` (number of users; default: 100)
- `--frac` (fraction of users selected for updates; default: 0.1)
- `--local_ep` (number of local training epochs; default: 10)
- `--local_bs` (batch size of local updates; default: 10)
- `--unequal` (whether to split data equally or unequally among users; default: equal)
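These flags typically live in a single argparse definition inside `options.py`. Below is a shortened sketch of what that might look like, using only the names and defaults listed above; the `--epochs` default is assumed, and the real file may define more arguments:

```python
import argparse

def args_parser():
    parser = argparse.ArgumentParser()
    # Federated setup
    parser.add_argument('--epochs', type=int, default=10, help='number of rounds of training')
    parser.add_argument('--num_users', type=int, default=100, help='number of users')
    parser.add_argument('--frac', type=float, default=0.1, help='fraction of users selected each round')
    parser.add_argument('--local_ep', type=int, default=10, help='local training epochs')
    parser.add_argument('--local_bs', type=int, default=10, help='batch size of local updates')
    parser.add_argument('--iid', type=int, default=1, help='1 for IID, 0 for non-IID')
    # Model and training
    parser.add_argument('--model', type=str, default='mlp', help='mlp or cnn')
    parser.add_argument('--dataset', type=str, default='mnist', help='mnist, fmnist or cifar')
    parser.add_argument('--lr', type=float, default=0.01, help='learning rate')
    parser.add_argument('--gpu', default=None, help='GPU id; None runs on CPU')
    parser.add_argument('--seed', type=int, default=1, help='random seed')
    parser.add_argument('--verbose', type=int, default=1, help='set to 0 for quieter logs')
    return parser.parse_args()
```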
Results Overview
Baseline Experiment Results
The outcome after training for 10 epochs can be summarized as follows:
- MLP: 92.71%
- CNN: 98.42%
Federated Experiment Results
The federated setup yields different results depending on the model and the data distribution:
| Model | IID | Non-IID (equal) |
|-------|--------|-----------------|
| MLP   | 88.38% | 73.49% |
| CNN   | 97.28% | 75.94% |
Troubleshooting
If you encounter issues while running your federated learning experiments, here are a few suggestions:
- Make sure all packages are installed correctly, as stated in the `requirements.txt` file.
- Double-check dataset paths and ensure the datasets are correctly placed in the `data` directory.
- Verify GPU availability if you’re trying to run the model on a GPU (a quick check is shown below).
- If errors persist, try executing the program with `--verbose=1` to get detailed logs of what might be going wrong.
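For the GPU point in particular, a quick way to confirm that PyTorch can actually see a device before passing `--gpu=0` is:

```python
import torch

# True only if a usable CUDA device is visible to PyTorch
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of GPU 0
```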
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.