How to Fine-Tune the Whisper Small Km – Kak Soky Model for Automatic Speech Recognition

Nov 20, 2022 | Educational

In the world of artificial intelligence, fine-tuning models allows developers to create application-specific solutions. Today, we will guide you through the steps to fine-tune the Whisper Small Km – Kak Soky model, a fine-tuned version of openai/whisper-small designed for automatic speech recognition (ASR) on the SLR42 dataset.

Understanding the Whisper Small Km – Kak Soky Model

This model is tailored to a specific domain: speech recognition in tourism contexts. Built on a transformer encoder-decoder architecture, it keeps a small footprint while maintaining competitive performance metrics.

Results Overview

  • Loss: 0.1471
  • Word Error Rate (WER): 35.6654%
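WER is the word-level edit distance (substitutions + insertions + deletions) between the model's transcript and the reference, divided by the number of reference words. A minimal pure-Python implementation makes the metric concrete; the example strings below are illustrative, not from the SLR42 data:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat down", "the cat sit"))  # 0.5 (1 substitution + 1 deletion over 4 words)
```

A WER of 35.67% therefore means roughly one word in three in the model's output differs from the reference, which is plausible for a low-resource language like Khmer.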

Training and Evaluation Details

To fine-tune this model, we use training and evaluation datasets split in a 90:10 ratio from the Google Text-to-Speech corpus. The intended uses and limitations note that the training data is restricted, which constrains the model to read speech and a limited domain.
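A 90:10 split like the one described above can be sketched as follows; the seed and the placeholder record IDs are assumptions for illustration, not details from the original training script:

```python
import random

def split_90_10(records, seed=42):
    """Shuffle records deterministically and split 90% train / 10% eval."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(0.9 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Placeholder utterance IDs standing in for real audio/transcript pairs.
records = [f"utt_{i:03d}" for i in range(100)]
train_set, eval_set = split_90_10(records)
print(len(train_set), len(eval_set))  # 90 10
```

Shuffling before splitting matters: if the corpus is ordered by speaker or session, a naive head/tail split would leave the evaluation set unrepresentative.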

Training Procedure & Hyperparameters

Here’s a rundown of the hyperparameters used during training:

- learning_rate: 1e-05  
- train_batch_size: 2  
- eval_batch_size: 4  
- seed: 42  
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08  
- lr_scheduler_type: linear  
- lr_scheduler_warmup_steps: 500  
- training_steps: 4000  
- mixed_precision_training: Native AMP
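In the Hugging Face Transformers ecosystem these hyperparameters map naturally onto a `Seq2SeqTrainingArguments` object. The sketch below mirrors the list above; the output directory is a hypothetical name, and the Adam betas and epsilon are left at the library defaults, which match the values listed:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-km",     # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=4000,
    fp16=True,                           # Native AMP mixed-precision training
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults.
)
```

This is a configuration sketch under the assumption that training was run with the `Seq2SeqTrainer` API; the original script may differ in details not covered by the hyperparameter list.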

Breaking Down the Training Process

Imagine you are training for a marathon. At first, your body feels heavy and you run slowly; that’s similar to how the model starts training, with a higher loss and WER. As you keep running (training), you begin to optimize your pace (model parameters), eventually reaching a state where you perform at your best. A model’s training results reflect this journey as the epoch and steps progress, reducing loss and improving WER over time.

Training Results

Here’s a snapshot of the training results during different steps:

| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.3639        | 0.76  | 1000 | 0.3452          | 71.9392 |
| 0.1553        | 1.53  | 2000 | 0.2025          | 49.0494 |
| 0.0565        | 2.29  | 3000 | 0.1664          | 39.9240 |
| 0.1471        | 3.06  | 4000 | 0.1471          | 35.6654 |

Troubleshooting Common Issues

If you encounter any bumps along your fine-tuning journey, here are some troubleshooting tips:

  • Verify that your dataset is correctly formatted and matches the SLR42 specifications.
  • Pay attention to the learning rate; if the model isn’t converging, try adjusting it slightly.
  • Ensure that you have enough computational resources and check if mixed precision training is correctly implemented for performance gains.
  • If results seem off, revisit your data split ratio to ensure both train and validation sets are sufficiently diverse.
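For the first check, a small helper can flag malformed records before training starts. The field names `audio` and `sentence` are assumptions about how the corpus is laid out; adjust them to match your copy of the SLR42 data:

```python
def validate_records(records, required_fields=("audio", "sentence")):
    """Return indices of records missing (or with empty) required fields."""
    bad = []
    for i, rec in enumerate(records):
        if not all(field in rec and rec[field] for field in required_fields):
            bad.append(i)
    return bad

# Illustrative records; real entries would hold audio arrays or file paths.
sample = [
    {"audio": "clip_000.wav", "sentence": "placeholder transcript"},
    {"audio": "clip_001.wav"},  # missing transcript -> flagged
]
print(validate_records(sample))  # [1]
```

Running a check like this up front is much cheaper than discovering a missing transcript as a cryptic collator error mid-training.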

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Fine-tuning models like Whisper Small Km – Kak Soky shows how AI can be customized for specific use cases such as tourism speech recognition. Understanding the underlying process and choosing the right configuration can pave the way for better performance in your applications.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
