How to Use the Massively Multilingual Speech (MMS) Fine-tuned ASR Model

Jun 19, 2023 | Educational

The Massively Multilingual Speech (MMS) project by Meta AI (Facebook) is paving the way for speech recognition in more than 1,000 languages. This guide shows how to set up and use the fine-tuned Automatic Speech Recognition (ASR) model efficiently.

Table of Contents

  • Example
  • Supported Languages
  • Model Details
  • Troubleshooting

Example

Using the MMS checkpoint with Transformers allows us to transcribe audio in 1,162 different languages. Let’s break this down with a simple analogy. Imagine you have a Swiss Army knife – it bundles multiple tools for different tasks, from slicing through paper to unscrewing a bolt. Similarly, the MMS model is like a Swiss Army knife for language: it can transcribe audio in many different languages, loading a separate adapter for each language the way you would swap in the right tool for the job.

Here’s how to get started:

pip install torch accelerate torchaudio datasets
pip install --upgrade transformers

Note: You need at least version 4.30 of the Transformers library. If that version is not yet available on PyPI, install the package from source:

pip install git+https://github.com/huggingface/transformers.git
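To confirm which version you ended up with, a quick check in Python does the trick:

import transformers

# Needs to be 4.30.0 or newer for MMS support
print(transformers.__version__)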

Next, we load a couple of audio samples via the Datasets library. Make sure the audio data is sampled at 16 kHz (16,000 Hz), as the model expects.

from datasets import load_dataset, Audio

# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]

# French
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
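The streamed Common Voice samples above are just for demonstration. If you would rather transcribe your own recording, here is a minimal sketch using torchaudio (installed earlier); the file path is hypothetical, and the key point is downmixing to mono and resampling to 16 kHz:

import torchaudio

# Hypothetical local file; replace with your own recording.
waveform, sr = torchaudio.load("my_recording.wav")

# The model expects audio sampled at 16 kHz, so resample if needed.
if sr != 16_000:
    waveform = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16_000)(waveform)

# Downmix to mono and convert to a 1-D numpy array, like the dataset samples.
my_sample = waveform.mean(dim=0).numpy()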

Now, let’s load the model and processor:

from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
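Loaded this way, everything runs on the CPU. If a GPU is available you can optionally move the model over; just remember to move each processed input to the same device. A minimal sketch, not required for the examples below:

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Any inputs produced by the processor then need to live on the same device:
# inputs = inputs.to(device)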

Next, we can process the audio data and use the model to transcribe the audio into readable text:

inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits
    # Greedy CTC decoding: pick the most likely token at each time step
    ids = torch.argmax(outputs, dim=-1)[0]
    transcription = processor.decode(ids)

# Example transcript
# "Joe Keton disapproved of films, and Buster also had reservations about the media."

To switch languages, set the tokenizer’s target language with set_target_lang() and load the matching adapter weights into the model with load_adapter():

processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")
inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits
    ids = torch.argmax(outputs, dim=-1)[0]
    transcription = processor.decode(ids)

# Transcription in French
# "Ce dernier est volé tout au long de l'histoire romaine."
# (English: "The latter is stolen throughout Roman history.")

Supported Languages

The MMS model supports an extensive list of 1,162 languages! You can find the complete list of supported languages, as ISO 639-3 codes, along with additional details about language codes, in the MMS Language Coverage Overview.
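You can also inspect the supported language codes directly from the loaded processor, since its tokenizer keeps one vocabulary per adapter language:

# ISO 639-3 codes of every adapter language in this checkpoint
langs = processor.tokenizer.vocab.keys()
print(len(langs))          # should report 1162
print(sorted(langs)[:5])   # a peek at the first few codes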

Model Details

  • Developed by: Vineel Pratap et al.
  • Model Type: Multi-Lingual Automatic Speech Recognition model
  • Languages: 1,162 languages (see Supported Languages above)
  • License: CC-BY-NC 4.0
  • Number of Parameters: 1 billion
  • Audio Sampling Rate: 16 kHz (16,000 Hz)

Troubleshooting

Here are some common troubleshooting ideas if you run into issues:

  • Ensure you have installed compatible versions of the libraries.
  • If the model isn’t transcribing correctly, check the audio sampling rate and ensure it is consistently 16 kHz (see the snippet after this list).
  • Variations in input format (stereo vs. mono, integer vs. float samples) may lead to unexpected results; double-check your input audio format.
  • Look for potential errors in loading datasets and make sure they are accessible.
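For the sampling-rate check in particular, it is easy to verify what the cast_column() call actually produced. A quick sanity check on one streamed sample from earlier:

sample = next(iter(stream_data))["audio"]
print(sample["sampling_rate"])  # should print 16000
print(sample["array"].shape)    # a 1-D array of audio samples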

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
