Massively Multilingual Speech (MMS) – Finetuned Language Identification

Jun 13, 2023 | Educational

Welcome to the fascinating world of multilingual speech processing! In this guide, we’ll show you how to use the Massively Multilingual Speech (MMS) model for spoken language identification (LID). This powerful tool identifies the language spoken in an audio clip from an impressive range of 256 languages. Let’s jump in!

Getting Started

The facebook/mms-lid-256 checkpoint is a wav2vec 2.0 model fine-tuned for LID as part of Facebook’s Massively Multilingual Speech (MMS) project. This incredible tool classifies raw audio input into a probability distribution over 256 output classes, each representing a unique language. Now, let’s get our hands dirty!

Example Usage

The MMS checkpoint can be used conveniently with the Transformers library. Here’s a step-by-step guide to get you started:

pip install torch accelerate torchaudio datasets
pip install --upgrade transformers

Note: Ensure you have a recent version of transformers (at least version 4.30) installed. If that version is not yet available on PyPI, you may need to install from source:

pip install git+https://github.com/huggingface/transformers.git
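
You can quickly verify which version is installed:

import transformers
print(transformers.__version__)  # should be at least 4.30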

Loading Audio Samples

Next, we need to load our audio data. The audio should be sampled at 16,000 Hz (16 kHz), the rate the model was trained on. Let’s go through the code snippets below:

from datasets import load_dataset, Audio

# Loading English audio sample
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]

# Loading Arabic audio sample
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "ar", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
ar_sample = next(iter(stream_data))["audio"]["array"]
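
A practical note (an assumption about how Common Voice is distributed on the Hugging Face Hub, not something covered above): mozilla-foundation/common_voice_13_0 is a gated dataset, so you may need to accept its terms on the dataset page and log in before streaming it:

from huggingface_hub import login

login()  # prompts for a Hugging Face access token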

Model Loading

After getting our audio samples, we load the model and its feature extractor (called processor in the snippets below):

from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

model_id = "facebook/mms-lid-256"
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
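
As an optional sanity check (not part of the original snippet), you can confirm that the classification head matches the advertised 256 language classes:

print(model.config.num_labels)  # 256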

Processing Audio Data

We can now preprocess our audio inputs. The following code shows how to classify the languages for the English and Arabic samples:

# English
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]

# Arabic
inputs = processor(ar_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
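
Because the model outputs a probability distribution over all 256 classes, it is often useful to inspect more than the top-1 prediction. Here is a minimal sketch; the helper name detect_language and its top_k parameter are our own additions, not part of the Transformers API:

def detect_language(sample, top_k=3):
    # Preprocess, run the classifier, and return the top_k (language, probability) pairs.
    inputs = processor(sample, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    values, indices = torch.topk(probs, k=top_k)
    return [(model.config.id2label[i.item()], p.item()) for p, i in zip(values, indices)]

print(detect_language(en_sample))  # e.g. [('eng', 0.99), ...]
print(detect_language(ar_sample))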

Seeing the Supported Languages

To view all the supported languages, print out the language IDs from the model config:

model.config.id2label.values()
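
For example, to count the classes or peek at a few of the codes:

langs = list(model.config.id2label.values())
print(len(langs))         # 256
print(sorted(langs)[:5])  # a handful of ISO 639-3 codes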

Supported Languages

This model accommodates an astonishing array of 256 languages, each identified by its ISO 639-3 code (for example, eng for English and ara for Arabic). To see the full list of languages along with their codes, consult the MMS Language Coverage Overview.

Model Details

  • Developed by: Vineel Pratap et al.
  • Model type: multilingual spoken language identification (audio classification), fine-tuned from wav2vec 2.0
  • Language(s): 256 languages, see supported languages
  • License: CC-BY-NC 4.0
  • Number of parameters: ~1 billion
  • Audio sampling rate: 16,000 Hz (16 kHz)

Troubleshooting

While embarking on your journey with the MMS model, you might encounter a few bumps along the way. Here are some troubleshooting tips:

  • If you run into installation issues, double-check the versions of your libraries.
  • Ensure that your audio clips are properly formatted and sampled at 16,000 Hz (16 kHz); see the resampling sketch after this list.
  • If the model misidentifies the language, try a longer or cleaner audio sample.
  • For persistent issues or for more assistance, consider exploring the community forums or checking GitHub for raised issues related to the model.
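
If your own clips are not already at 16 kHz, here is a minimal resampling sketch using torchaudio (the file name clip.wav is a placeholder):

import torchaudio

waveform, sr = torchaudio.load("clip.wav")  # placeholder path
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16_000)
sample = waveform.mean(dim=0).numpy()  # downmix to mono; the feature extractor expects a 1-D array
inputs = processor(sample, sampling_rate=16_000, return_tensors="pt")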

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

