The BERT-mini model is a lightweight version of BERT that can efficiently handle tasks like natural language inference. In this article, we will guide you through the process of fine-tuning the BERT-mini model using the state-of-the-art second-order optimizer M-FAC, which has shown promising results on the MNLI dataset.
Understanding M-FAC
M-FAC, short for Matrix-Free Approximations of Second-Order Information, is an optimizer that estimates curvature in the loss landscape from a sliding window of recent gradients, without ever forming the Hessian explicitly. Using this curvature information to precondition each update can let it converge faster than first-order optimizers like Adam. This is akin to navigating hilly terrain: while Adam sticks to the main road, M-FAC uses its knowledge of the slopes to find shortcuts to the destination.
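To make the idea concrete, here is a deliberately naive sketch of the quantity M-FAC approximates: a dampened empirical Fisher matrix built from a window of recent gradients, used to precondition the gradient step. This toy version builds the matrix densely, which is only feasible for tiny parameter counts; the actual M-FAC optimizer computes the same inverse-Fisher-vector product matrix-free, and the function below is our own illustration, not the library's API.

```python
# Toy illustration only -- NOT the M-FAC implementation or its API.
# M-FAC approximates the inverse of a dampened empirical Fisher matrix,
#   F = damp * I + (1/m) * sum_i g_i g_i^T,
# estimated from the last m gradients, and preconditions each step with F^{-1} g.
# This version builds F densely; the real optimizer computes the product matrix-free.
import numpy as np

def fisher_preconditioned_step(grad_window, grad, damp=1e-6, lr=1e-4):
    """Return a curvature-aware update from a window of recent gradients."""
    m, d = len(grad_window), grad.shape[0]
    G = np.stack(grad_window)                  # (m, d): one row per stored gradient
    fisher = damp * np.eye(d) + (G.T @ G) / m  # dampened empirical Fisher estimate
    direction = np.linalg.solve(fisher, grad)  # F^{-1} g (done matrix-free in M-FAC)
    return -lr * direction                     # parameter update to apply

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = [rng.normal(size=8) for _ in range(4)]  # m = 4 past gradients, d = 8
    print(fisher_preconditioned_step(window, rng.normal(size=8)))
```

In the fine-tuning runs below, the window size corresponds to the "number of gradients" hyperparameter (1024) and `damp` to the dampening (1e-6).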
Finetuning Setup
To set up the fine-tuning process, follow these steps:
- First, ensure that you have the necessary libraries and framework installed. Refer to the fine-tuning code in the Hugging Face Transformers text-classification example (run_glue.py).
- Replace the default Adam optimizer with M-FAC and configure the M-FAC hyperparameters as follows (see the sketch after this list):
  - learning rate: 1e-4
  - number of gradients: 1024
  - dampening: 1e-6
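The snippet below sketches how the optimizer swap might look when driving training through the Hugging Face Trainer. `MFAC` stands in for the optimizer class from the M-FAC authors' code that you add to the repository; its import path and constructor signature are assumptions and may differ, and dataset preparation is omitted because it follows run_glue.py.

```python
# Sketch: replacing the Trainer's default AdamW with M-FAC using the settings above.
# Assumption: `MFAC` is the optimizer class added from the M-FAC authors' code;
# its exact import path and signature may differ in your setup.
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
# from mfac import MFAC   # hypothetical import for the added M-FAC optimizer code

model = AutoModelForSequenceClassification.from_pretrained(
    "prajjwal1/bert-mini", num_labels=3  # MNLI has three labels
)

training_args = TrainingArguments(
    output_dir="out_dir",
    per_device_train_batch_size=32,
    num_train_epochs=5,
)

# Hyperparameters used in this article: lr = 1e-4, num_grads = 1024, damp = 1e-6.
optimizer = MFAC(model.parameters(), lr=1e-4, num_grads=1024, damp=1e-6)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,   # tokenized MNLI train split (see run_glue.py)
    eval_dataset=eval_dataset,     # tokenized MNLI validation split
    optimizers=(optimizer, None),  # scheduler is created by the Trainer when None
)
trainer.train()
```

Passing the optimizer through the `optimizers` tuple is the standard Trainer mechanism for using a custom optimizer instead of the default AdamW.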
Results
After conducting several runs, the best model produced the following results on the MNLI validation set:
- matched accuracy: 75.13
- mismatched accuracy: 75.93
Here’s how the M-FAC results compare against the default Adam optimizer across five runs:
| Optimizer | Matched Accuracy | Mismatched Accuracy |
|-----------|------------------|---------------------|
| Adam      | 73.30 ± 0.20     | 74.85 ± 0.09        |
| M-FAC     | 74.59 ± 0.41     | 75.95 ± 0.14        |

The results suggest that M-FAC outperforms Adam on both validation splits, and we believe further hyperparameter tuning could yield even better outcomes.
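For reference, here is one way such mean ± standard deviation entries can be computed from multiple runs. It assumes each run was launched with a different seed and that run_glue.py saved its metrics to all_results.json with an eval_accuracy key; the folder and key names here are assumptions and can differ between Transformers versions.

```python
# Aggregate accuracy across several runs into a "mean ± std" entry.
# Assumption: each run wrote its metrics to <run_dir>/all_results.json with an
# "eval_accuracy" key (file and key names vary across transformers versions).
import json
import statistics
from pathlib import Path

run_dirs = [Path(f"out_dir_seed{i}") for i in range(5)]  # hypothetical run folders
accs = [
    json.loads((d / "all_results.json").read_text())["eval_accuracy"] * 100
    for d in run_dirs
]
print(f"{statistics.mean(accs):.2f} ± {statistics.stdev(accs):.2f}")
```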
Running the Code
To reproduce the results, add the M-FAC optimizer code to your copy of the Hugging Face text-classification example and execute the following command in your terminal:
CUDA_VISIBLE_DEVICES=0 python run_glue.py --seed 8276 --model_name_or_path prajjwal1/bert-mini --task_name mnli --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 32 --learning_rate 1e-4 --num_train_epochs 5 --output_dir out_dir --optim MFAC --optim_args '{"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}'
For a fair comparison and a robust default setup, the same hyperparameters are kept across all models.
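After the command finishes, and in addition to the metrics printed by run_glue.py, you can sanity-check the checkpoint saved in out_dir by classifying a premise/hypothesis pair with the standard Transformers API; the example sentences below are ours.

```python
# Quick sanity check on the fine-tuned checkpoint written to out_dir:
# classify a premise/hypothesis pair for natural language inference.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("out_dir")
model = AutoModelForSequenceClassification.from_pretrained("out_dir")

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # label names are taken from the saved config
```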
Troubleshooting Tips
If you encounter issues while fine-tuning, consider the following troubleshooting tips (a quick environment check is sketched after the list):
- Ensure all required libraries are correctly installed.
- Check the compatibility of the model version with the included code.
- Verify that the optimizer’s parameters are correctly configured.
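As a first pass, a minimal check like the one below confirms that the core libraries import cleanly, reports their versions, and verifies that the M-FAC arguments parse as intended; the JSON string mirrors the --optim_args value used above.

```python
# Minimal environment check before debugging further: confirm the libraries
# import, print their versions, and verify the optimizer arguments parse.
import json

import torch
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)

optim_args = json.loads('{"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}')
assert optim_args == {"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}
print("M-FAC hyperparameters:", optim_args)
```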
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

