In the ever-evolving world of artificial intelligence, transcribing speech across numerous languages poses a challenge. However, with the Massively Multilingual Speech (MMS) project, that challenge transforms into an opportunity! This blog will guide you step-by-step on how to use the MMS project, leveraging its advanced capabilities in automatic speech recognition (ASR) across over 1,000 languages.
Understanding the MMS Model
The MMS project offers a fine-tuned model built on the Wav2Vec2 architecture, scaled to a staggering 1 billion parameters. Think of it as a well-stocked library: whichever culture a spoken "book" comes from, the model can interpret it and transcribe it into text.
Example
Let’s dive right into how you can utilize this incredible model! Below are the steps you need to follow:
1. Install Required Libraries
First, you’ll need to install the necessary libraries:
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
**Note:** Ensure that the Transformers library is at least version 4.30. If that version is not yet available on PyPI, install it from source using:
pip install git+https://github.com/huggingface/transformers.git
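Before going further, a quick sanity check on the installed version can save you some debugging later. This is a minimal sketch; `packaging` ships as a dependency of Transformers, so no extra install is needed:

```python
from packaging import version
import transformers

# MMS support requires Transformers 4.30 or newer; fail fast on older installs
assert version.parse(transformers.__version__) >= version.parse("4.30.0"), transformers.__version__
```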
2. Load Your Audio Samples
We’ll then proceed to load some audio samples, ensuring they are sampled at 16,000 Hz (16 kHz):
from datasets import load_dataset, Audio
# Load an English sample
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# Load a French sample
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
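If you want to transcribe your own recordings instead of a streamed dataset, you can load and resample a local file with torchaudio. This is a sketch, and `my_audio.wav` is a placeholder path:

```python
import torchaudio

# "my_audio.wav" is a placeholder; replace with your own file
waveform, sr = torchaudio.load("my_audio.wav")
if sr != 16_000:
    # Resample to the 16 kHz rate the MMS checkpoints expect
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16_000)
# Downmix to mono and convert to the 1-D NumPy array the processor takes
local_sample = waveform.mean(dim=0).numpy()
```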
3. Load the Model and Processor
Now it’s time to load the actual model and processor:
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch
model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
4. Process and Transcribe Audio Data
Here you will process the audio data and transcribe it:
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)  # Outputs transcription for English
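The French sample we loaded earlier works the same way. MMS ships a small adapter per language, and the model card shows switching languages with `set_target_lang` on the tokenizer plus `load_adapter` on the model, using ISO 639-3 codes such as `fra` for French:

```python
# Swap in the French vocabulary and adapter weights ("fra" is the ISO 639-3 code)
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)  # Outputs transcription for French
```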
Supported Languages
The MMS model supports transcription for a staggering 1,162 languages. You can check the comprehensive list of supported languages, including their ISO 639-3 codes, in the MMS Language Coverage Overview.
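You can also inspect coverage from Python. This sketch assumes the MMS tokenizer exposes its per-language vocabularies through `vocab`, keyed by ISO 639-3 code, as shown in the Transformers MMS documentation:

```python
# The MMS tokenizer stores one vocabulary per language, keyed by ISO 639-3 code
langs = processor.tokenizer.vocab.keys()
print(len(langs))                       # number of supported languages
print("eng" in langs, "fra" in langs)   # check specific codes
```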
Model Details
- Developed by: Vineel Pratap et al.
- Model type: Multilingual automatic speech recognition (ASR) model
- License: CC-BY-NC 4.0
- Number of parameters: 1 billion
- Audio sampling rate: 16,000 Hz (16 kHz)
Additional Links
- Blog post
- Transformers documentation
- Paper
- GitHub Repository
- Other MMS checkpoints
- MMS Base Checkpoints
- Official Space
Troubleshooting
If you encounter issues while using the MMS model, consider the following troubleshooting ideas:
- Ensure that you have met all the installation requirements, especially the correct version of Transformers.
- Verify that your audio samples are sampled at 16,000 Hz (16 kHz); a quick check is sketched after this list.
- Check the model loading and processor initialization to confirm they’re correctly set up.
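As a minimal sanity check for the sampling rate, you can inspect one streamed sample directly. After the `cast_column` call from step 2, the reported rate should already be 16,000 Hz:

```python
# Pull one sample and confirm the rate and shape the model expects
sample = next(iter(stream_data))["audio"]
print(sample["sampling_rate"])  # expect 16000
print(sample["array"].ndim)     # expect 1 (mono waveform)
```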
For additional insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

