Using the Whisper Small Western Frisian (Netherlands) Model for Automatic Speech Recognition

Sep 1, 2023 | Educational

The Whisper Small Western Frisian model is a powerful tool for automatic speech recognition (ASR): a version of OpenAI's Whisper fine-tuned on the Western Frisian (fy-NL) portion of the Mozilla Foundation's Common Voice dataset. This blog post will guide you through understanding and using this model effectively.

Model Overview

This ASR model is a fine-tuned version of openai/whisper-small, adapted to transcribe Western Frisian speech accurately. On its evaluation set it achieves the following metrics:

  • Validation Loss: 0.5703
  • Word Error Rate (WER): 21.8466%
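As a minimal sketch of how such a fine-tuned checkpoint is typically used with the Transformers ASR pipeline (the model id below is a placeholder, since the post does not give the repository name; substitute the actual checkpoint from the model card):

```python
from transformers import pipeline

# Placeholder model id: replace with the actual fine-tuned fy-NL checkpoint.
MODEL_ID = "openai/whisper-small"

def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe an audio file with a Whisper ASR pipeline."""
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper processes audio in 30-second windows
    )
    # For a multilingual checkpoint, the target language can also be forced
    # via generate_kwargs, e.g. {"language": "frisian", "task": "transcribe"}.
    return asr(audio_path)["text"]

if __name__ == "__main__":
    print(transcribe("sample.wav"))
```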

How to Fine-tune and Evaluate the Whisper Model

Using the Whisper model effectively requires understanding how it was trained. Below, we’ll break down the training parameters and evaluation strategy used to optimize its performance, using a car dealership analogy for clarity.

Imagine you’re running a car dealership. You want to train your staff (the model) to recognize various car models and features (understand speech) accurately. Just like you would conduct training sessions and evaluations to assess their performance, you would prepare your model in a similar fashion.

Training Procedure

The following parameters outline how you would “train your staff” (model) to ensure it performs well:

  • Learning Rate: 1e-05 – This is akin to how quickly your staff learns; a slower, steadier pace often yields better results than rushing.
  • Batch Size: 64 (train) and 32 (eval) – Think of this as the number of trainees in each training session versus each assessment; larger training groups smooth out noisy feedback.
  • Optimizer: Adam – Like a mentor who adapts the pace of coaching to each trainee, Adam adapts the step size for each parameter as it practices.
  • Training Steps: 5000 – The total amount of practice time; here training stops after a fixed 5,000 steps rather than a set number of passes over the data.
  • Mixed Precision Training: Native AMP – Like running routine drills at lower intensity and saving full effort for the critical ones, computations run in half precision where it is safe, cutting memory use and speeding up training.
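The parameters above can be sketched as a Hugging Face Seq2SeqTrainingArguments configuration. This is a hypothetical fragment: output_dir is a placeholder, and any setting the post does not mention (warmup, scheduler, and so on) is left at its library default.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical configuration mirroring the parameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-fy-NL",  # placeholder directory
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    max_steps=5000,              # train for a fixed number of steps
    fp16=True,                   # mixed precision (native AMP)
    evaluation_strategy="steps",
    eval_steps=1000,             # matches the 1000-step evaluation cadence below
    predict_with_generate=True,  # generate text at eval time so WER can be computed
)
```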

Evaluation Metrics

During training, the model was evaluated every 1,000 steps to gauge performance:

  • At 1000 steps: Validation Loss 0.5184, WER 23.0973
  • At 2000 steps: Validation Loss 0.5653, WER 22.5434
  • At 3000 steps: Validation Loss 0.5703, WER 21.8466
  • At 4000 steps: Validation Loss 0.5968, WER 21.9574
  • At 5000 steps: Validation Loss 0.6044, WER 22.0360

Note that WER bottoms out at 3,000 steps (21.8466, the headline figure above) while validation loss keeps rising afterward, a typical sign that longer training begins to overfit.
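WER counts the word-level substitutions, deletions, and insertions needed to turn the model’s hypothesis into the reference transcript, divided by the number of reference words, so a WER of 21.8466 means roughly 22 errors per 100 words. A minimal pure-Python sketch of the metric (in practice libraries such as jiwer or Hugging Face evaluate are used instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

For example, `wer("it giet oan", "it net oan")` is one substitution over three reference words, about 0.33.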

Troubleshooting Tips

While using the Whisper model, you might encounter issues. Here are some troubleshooting tips:

  • If you’re experiencing high WER, consider adjusting the learning rate or re-evaluating your training data quality.
  • Noisy or unstable loss values could indicate that your batch size or learning rate needs adjustment; larger batches give smoother gradient estimates, while a lower learning rate can tame divergence.
  • Ensure your environment is set up correctly with the frameworks: Transformers version 4.26.0, PyTorch version 1.13.0, Datasets version 2.7.1, and Tokenizers version 0.13.2.
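To pin the environment to the versions listed above, one option is a pip install along these lines (assuming a pip-based setup; GPU-specific PyTorch wheels may require a different index URL):

```shell
pip install "transformers==4.26.0" "torch==1.13.0" "datasets==2.7.1" "tokenizers==0.13.2"
```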

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
