In the rapidly evolving field of artificial intelligence, Automatic Speech Recognition (ASR) has become a game-changer, enabling machines to transcribe and understand human speech. This guide takes you through the intriguing process of fine-tuning an ASR model using Mozilla’s Common Voice datasets. Let’s take a closer look!
Understanding the Model
We are focusing on a fine-tuned version of the facebook/wav2vec2-xls-r-300m model, tailored specifically for the Estonian (et) subset of the mozilla-foundation/common_voice_8_0 dataset. The model was trained on data from Common Voice 7.0 and 8.0, and its efficacy is reported through evaluation metrics such as Word Error Rate (WER) and Character Error Rate (CER).
Key Metrics Achieved
- Common Voice 7.0
  - Test WER: 0.342
  - Test CER: 0.073
- Common Voice 8.0
  - Test WER: 34.18%
- Robust Speech Event – Dev Data
  - Test WER: 45.53%
- Robust Speech Event – Test Data
  - Test WER: 54.41%
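WER, the headline metric above, is the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. A minimal sketch of the computation (real evaluations typically use a library such as `jiwer` or `evaluate` instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three reference words -> WER of 1/3
print(wer("see on test", "see oli test"))
```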
The Training Process Explained
Think of the training process as preparing for a marathon. You wouldn’t just go out and run the 26 miles—first, you need a training plan, gradual build-up, proper nutrition, rest, and tracking your progress. Similarly, our ASR model undergoes structured training with hyperparameters tailored for optimal performance.
Training Hyperparameters
- Learning Rate: 0.0003
- Train Batch Size: 72
- Eval Batch Size: 72
- Seed: 42
- Total Train Batch Size: 144
- Optimizer: Adam with parameters: betas=(0.9,0.999) and epsilon=1e-08
- Number of Epochs: 100
Results Overview
Here’s the progress we’ve seen during the training phases mapped to the learning epochs:
| Epoch | Training Loss | Validation Loss | WER |
|------:|--------------:|----------------:|----:|
| 12.5  | 0.3082 | 0.3871 | 0.4907 |
| 25.0  | 0.1497 | 0.4168 | 0.4278 |
| 37.5  | 0.1243 | 0.4446 | 0.4220 |
| 50.0  | 0.0954 | 0.4426 | 0.3946 |
| 62.5  | 0.0741 | 0.4502 | 0.3800 |
| 75.0  | 0.0533 | 0.4618 | 0.3653 |
| 87.5  | 0.0447 | 0.4518 | 0.3461 |
| 100.0 | 0.0396 | 0.4623 | 0.3420 |
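Notice that training loss keeps falling while validation loss plateaus around 0.44–0.46, yet WER improves steadily; for ASR, the task metric is the better model-selection signal than validation loss. Selecting the best checkpoint from the logged values above can be sketched as:

```python
# Rows from the training log above: (epoch, train_loss, val_loss, wer).
log = [
    (12.5, 0.3082, 0.3871, 0.4907),
    (25.0, 0.1497, 0.4168, 0.4278),
    (37.5, 0.1243, 0.4446, 0.4220),
    (50.0, 0.0954, 0.4426, 0.3946),
    (62.5, 0.0741, 0.4502, 0.3800),
    (75.0, 0.0533, 0.4618, 0.3653),
    (87.5, 0.0447, 0.4518, 0.3461),
    (100.0, 0.0396, 0.4623, 0.3420),
]

# Choose the checkpoint by lowest WER, not lowest validation loss.
best_epoch, *_, best_wer = min(log, key=lambda row: row[3])
print(best_epoch, best_wer)  # 100.0 0.342
```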
Troubleshooting Tips
If you encounter issues during the training process or while running the model, here are some troubleshooting ideas:
- Low Performance on Specific Datasets: Check that the dataset you are using is appropriate and that your training hyperparameters align with it. Try lowering the learning rate if training is unstable.
- Inconsistent Results: Ensure reproducibility by setting a random seed. This helps in minimizing variability in results across runs.
- Memory Errors: If you run into out-of-memory errors while training, consider reducing the batch size (and compensating with gradient accumulation to keep the effective batch size constant).
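Seeding, as suggested above, can be sketched with the standard library; in a real PyTorch training run you would seed the other RNGs the same way (the `numpy` and `torch` calls are noted in comments only):

```python
import random

def set_seed(seed: int = 42) -> None:
    """Seed Python's RNG for reproducibility.

    In a full training script you would also call, with the same seed:
      np.random.seed(seed) and torch.manual_seed(seed).
    """
    random.seed(seed)

set_seed(42)
print(random.random())  # deterministic given the seed
```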
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

