How to Use the Massively Multilingual Speech (MMS) Checkpoint for ASR

Aug 16, 2023 | Educational

If you’ve ever wanted to transcribe audio in multiple languages seamlessly, the Massively Multilingual Speech (MMS) checkpoint is your answer. This model supports 102 languages, allowing you to break down language barriers like never before. In this guide, we will walk you through the steps of using the MMS model for Automatic Speech Recognition (ASR), followed by troubleshooting tips, so you can get started confidently.

Example

Let’s dive straight into a practical example. The steps are straightforward, almost like preparing a meal with all the right ingredients. To use the MMS model, we first need to prepare our environment.

1. Install Required Libraries

Just like a master chef gathers ingredients, we too need to gather our libraries. Run these commands in your terminal:

pip install torch accelerate torchaudio datasets
pip install --upgrade transformers

Note: Ensure you have at least version 4.30 of the transformers library. If that version isn’t available yet, you can install it from source:

pip install git+https://github.com/huggingface/transformers.git
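
To confirm that the upgrade took effect, a quick check from Python works:

import transformers

print(transformers.__version__)  # should print 4.30.0 or higher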

2. Load Audio Samples

Now, let’s load some audio samples using the datasets library. Remember, your audio data should be sampled at 16,000 Hz (16 kHz). Here’s how to do it:

from datasets import load_dataset, Audio

# Load English audio stream
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]

# Load French audio stream
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
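
If your audio lives in local files rather than a streaming dataset, here is a minimal resampling sketch using torchaudio (the file name my_audio.wav is a hypothetical placeholder):

import torchaudio

# Load a local file (hypothetical path) and resample it to 16 kHz
waveform, original_rate = torchaudio.load("my_audio.wav")
waveform = torchaudio.functional.resample(waveform, orig_freq=original_rate, new_freq=16000)
local_sample = waveform[0].numpy()  # first channel as a NumPy array, like en_sample above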

3. Load the Model and Processor

Just as a recipe calls for specific tools, we need to load our model and processor:

from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch

model_id = "facebook/mms-1b-fl102"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
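
As an aside, if you already know which language you need, the Transformers documentation shows that the adapter can also be selected at load time; ignore_mismatched_sizes=True is required because the output head is resized per language:

# Load the French adapter directly instead of the default English one
processor = AutoProcessor.from_pretrained(model_id, target_lang="fra")
model = Wav2Vec2ForCTC.from_pretrained(model_id, target_lang="fra", ignore_mismatched_sizes=True)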

4. Process and Transcribe the Audio Data

The next steps involve processing our audio for transcription, similar to how we would cook our ingredients to perfection:

inputs = processor(en_sample, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)  # Output transcription for English

# For French
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")
inputs = processor(fr_sample, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

ids = torch.argmax(outputs, dim=-1)[0]
transcription_fr = processor.decode(ids)  # Output transcription for French

Now you can transcribe audio from multiple languages by switching language adapters!
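
To avoid repeating that boilerplate for every language, you could wrap the steps in a small helper. The function name transcribe_mms is a hypothetical choice, and the sketch assumes the model and processor from step 3 are already in scope:

def transcribe_mms(audio_array, lang_code):
    # Switch both the tokenizer vocabulary and the model adapter to the target language
    processor.tokenizer.set_target_lang(lang_code)
    model.load_adapter(lang_code)
    inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(ids)

print(transcribe_mms(en_sample, "eng"))
print(transcribe_mms(fr_sample, "fra"))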

Supported Languages

The MMS model supports an astonishing 102 languages! For a comprehensive list of supported languages and their ISO 639-3 codes, please refer to the MMS Language Coverage Overview.
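
Assuming the processor from step 3 is still loaded, you can also list the supported ISO 639-3 codes programmatically, since the tokenizer keeps one vocabulary per language adapter:

# Print every language code the checkpoint has an adapter for
print(processor.tokenizer.vocab.keys())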

Model Details

  • Developed by: Vineel Pratap et al.
  • Model Type: Multi-Lingual Automatic Speech Recognition model
  • Number of Parameters: 1 billion
  • Audio Sampling Rate: 16,000 Hz (16 kHz)
  • License: CC-BY-NC 4.0

Troubleshooting Tips

If you run into any hiccups while using the MMS model, here are a few troubleshooting ideas:

  • Ensure all required libraries are installed and up to date.
  • Check that your audio files are correctly sampled at 16,000 Hz (16 kHz); a quick check is sketched just after this list.
  • For a version mismatch issue, ensure you are using transformers version 4.30 or newer.
  • Review all model dependencies listed in the documentation for required versions.
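
If transcriptions come out garbled, a quick sanity check on the sampling rate (as mentioned in the second tip above) can save time:

# Verify that the cast to 16 kHz actually took effect
sample = next(iter(stream_data))["audio"]
print(sample["sampling_rate"])  # should print 16000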

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
