Welcome to the world of cutting-edge speech recognition technology! In this guide, we’ll explore how to use the Norwegian NB-Whisper Large model, a robust tool developed by the National Library of Norway. With this model, you can transcribe audio files into meaningful text with impressive accuracy. So, let’s dive in and discover how to harness this technology for your audio transcription needs!
What is the NB-Whisper Large Model?
The Norwegian NB-Whisper Large model is an automatic speech recognition (ASR) model trained on a diverse dataset encompassing 66,000 hours of speech. It’s designed to recognize and transcribe Norwegian (both Bokmål and Nynorsk) as well as English audio. Think of it as a translator for sounds: just as a translator converts a spoken language into written text, this model takes voice audio and transforms it into written words!
Setting Up the Model
Before you can start transcribing, you’ll need to either use an online demo or run the model locally. Here’s how to do both:
Using Online Demos
- Head over to the model pages on HuggingFace, where an Inference API demo is available.
- Note that the hosted demo runs on limited CPU capacity, so the model may take some time to load and transcribe. If you’re looking for speed, some models are temporarily hosted on TPUs, which enhances performance.
Running Locally with HuggingFace
To run the model locally on your machine:
- Ensure you have Python installed: download it from python.org.
- Install the Transformers library:
pip install transformers==4.35.2
- Download the sample audio file:
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3
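Before moving on, you can quickly confirm that the pinned Transformers version installed correctly:
import transformers
# Sanity check for the install
print(transformers.__version__)  # should print 4.35.2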
Running the Model
With your environment ready, here’s how to transcribe audio:
from transformers import pipeline
# Load the model (note the slash between the organization and model name)
asr = pipeline("automatic-speech-recognition", model="NbAiLabBeta/nb-whisper-large-verbatim")
# Transcribe the audio, specifying Norwegian via the language argument
result = asr("king.mp3", generate_kwargs={"task": "transcribe", "language": "no"})
print(result)
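The pipeline returns a dictionary whose "text" key holds the transcription. If you also want to know when each segment was spoken, the Transformers ASR pipeline accepts a return_timestamps flag. Here is a minimal sketch (the exact output depends on your audio):
from transformers import pipeline
asr = pipeline("automatic-speech-recognition", model="NbAiLabBeta/nb-whisper-large-verbatim")
# Request segment-level timestamps in addition to the text
result = asr("king.mp3", return_timestamps=True, generate_kwargs={"task": "transcribe", "language": "no"})
# result["text"] is the full transcription;
# result["chunks"] is a list of {"timestamp": (start, end), "text": ...} segments
print(result["chunks"])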
Improving Transcription Quality
Transcribing longer audio files can yield lower-quality results. Here are a few tips to improve accuracy, combined in the sketch after this list:
- Set the chunk_length_s argument to 28 seconds so the pipeline handles long audio files in manageable chunks.
- Increase the beam size to 5 for better accuracy, although this consumes more memory and takes longer to process.
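Here is a minimal sketch combining both tips. Note that chunk_length_s is an argument to the pipeline itself, while the beam size (num_beams) goes into generate_kwargs:
from transformers import pipeline
# chunk_length_s=28 splits long audio into 28-second windows for decoding
asr = pipeline("automatic-speech-recognition", model="NbAiLabBeta/nb-whisper-large-verbatim", chunk_length_s=28)
# num_beams=5 enables beam search: more accurate, but slower and more memory-hungry
result = asr("king.mp3", generate_kwargs={"task": "transcribe", "language": "no", "num_beams": 5})
print(result["text"])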
Troubleshooting Common Issues
While working with the NB-Whisper model, you may encounter a few common issues. Here are some troubleshooting ideas:
- Slow Performance: If the model is running slowly, consider the TPU-hosted demos described in the online demos section, or run on a GPU locally.
- Errors in Transcription: Ensure your audio file is in a supported format (for example, the C++ implementation expects 16 kHz WAV files) and not excessively long.
- Using Multiple Languages: When transcribing different languages, make sure to specify the language parameter correctly, as sketched below.
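For example, switching between Norwegian and English is a matter of changing the language code; in this sketch, english.mp3 is a hypothetical English recording of your own:
from transformers import pipeline
asr = pipeline("automatic-speech-recognition", model="NbAiLabBeta/nb-whisper-large-verbatim")
# Norwegian audio: language="no"
norwegian = asr("king.mp3", generate_kwargs={"task": "transcribe", "language": "no"})
# English audio: language="en" (english.mp3 is a placeholder for your own file)
english = asr("english.mp3", generate_kwargs={"task": "transcribe", "language": "en"})
print(norwegian["text"], english["text"], sep="\n")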
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
In summary, the NB-Whisper Large model is a powerful tool for anyone interested in automatic speech recognition, offering high accuracy, support for multiple languages, and easy integration in both online and local environments. Remember to experiment with the model, and don’t hesitate to reach out to the community or check back here for updates.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.