How to Fine-tune XLSR Wav2Vec2 for Speech Recognition in Basque

Mar 29, 2021 | Educational


Welcome to the world of automatic speech recognition, where the XLSR Wav2Vec2 model takes center stage! In this guide, we walk you through using an XLSR Wav2Vec2 checkpoint fine-tuned for Basque to transcribe and evaluate speech from the Common Voice dataset. Buckle up, as we unravel the process with ease!

Getting Started

Before we dive into the implementation, make sure the necessary libraries are installed: PyTorch, torchaudio, Hugging Face Transformers and Datasets, plus jiwer, which the WER metric relies on.

You can install them with pip:

pip install torch torchaudio transformers datasets jiwer

Loading the Model and Dataset

We start by loading the required dataset and the pre-trained Wav2Vec2 model. Think of this process as laying out your ingredients before baking a cake. The better organized you are, the smoother the baking process will be.

python
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load 2% of the Basque ('eu') Common Voice test split
test_dataset = load_dataset('common_voice', 'eu', split='test[:2%]')

# Load the processor (feature extractor + tokenizer) and the fine-tuned CTC model
processor = Wav2Vec2Processor.from_pretrained('stefan-it/wav2vec2-large-xlsr-53-basque')
model = Wav2Vec2ForCTC.from_pretrained('stefan-it/wav2vec2-large-xlsr-53-basque')

Preprocessing the Audio Data

In this step, we load each audio clip as an array and resample it from 48 kHz, the rate of the Common Voice recordings, to the 16 kHz the model expects. If we compare our audio data to a beautiful painting, preprocessing is akin to framing it just right so everyone can appreciate its beauty. Below is the code:

python
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Load each clip and resample it from 48 kHz to 16 kHz
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch['path'])
    # Drop the singleton channel dimension and convert to a NumPy array
    batch['speech'] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
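Common Voice reference sentences contain capitals and punctuation. CTC models fine-tuned in this style typically output only lowercase letters and spaces, so comparing the two as-is inflates the error rate. Below is a minimal sketch of reference normalization. It assumes a punctuation set modeled on common XLSR fine-tuning scripts, so match it to whatever the checkpoint was actually trained with. `normalize_sentence` and `normalize_batch` are hypothetical helpers, not library functions.

```python
import re

# Assumed punctuation set; adjust it to match the checkpoint's training preprocessing.
PUNCTUATION = ",?.!-;:\"“”%‘’'"
CHARS_TO_IGNORE = re.compile("[" + re.escape(PUNCTUATION) + "]")


def normalize_sentence(sentence: str) -> str:
    """Strip punctuation, lowercase, and collapse repeated whitespace."""
    text = CHARS_TO_IGNORE.sub("", sentence).lower()
    return " ".join(text.split())


def normalize_batch(batch):
    # Hypothetical companion to speech_file_to_array_fn: rewrites one example's 'sentence'.
    batch["sentence"] = normalize_sentence(batch["sentence"])
    return batch
```

You would apply it with `test_dataset = test_dataset.map(normalize_batch)` before evaluation, so that predictions and references use the same conventions.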

Making Predictions

With the preprocessing done, it’s time for the model to make predictions. Think of the model as a chef following a recipe; it uses the inputs to create something delightful. Here’s how to implement it:

python
# Batch the first two clips; padding=True pads them to a common length
inputs = processor(test_dataset['speech'][:2], sampling_rate=16_000, return_tensors='pt', padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
    # Greedy decoding: take the most likely token at every frame
    predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset['sentence'][:2])
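`processor.batch_decode` turns the per-frame argmax IDs into text using greedy CTC decoding. Here is a rough sketch of the idea. It assumes a hypothetical toy vocabulary in which `<pad>` acts as the CTC blank and `|` marks word boundaries, which is the convention Wav2Vec2's CTC tokenizer uses. Consecutive repeats are merged, blanks are dropped, and delimiters become spaces. `ctc_greedy_decode` is an illustrative name, not the library's implementation.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real one comes from the processor's tokenizer.
VOCAB = {0: "<pad>", 1: "|", 2: "k", 3: "a", 4: "i", 5: "x", 6: "o", 7: "m"}
BLANK = "<pad>"
WORD_DELIMITER = "|"


def ctc_greedy_decode(frame_ids):
    # 1. Merge runs of identical consecutive IDs into a single token.
    collapsed = [VOCAB[token_id] for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token, which separates genuinely repeated characters.
    tokens = [token for token in collapsed if token != BLANK]
    # 3. Turn word delimiters into spaces.
    text = "".join(" " if token == WORD_DELIMITER else token for token in tokens)
    return text.strip()


print(ctc_greedy_decode([2, 2, 0, 3, 4, 4, 5, 0, 6, 1, 1, 3, 7, 7, 3, 0]))
```

Merging repeats before removing blanks is what lets the model spell doubled letters. In this toy vocabulary, `[3, 0, 3]` decodes to `aa`, while `[3, 3]` collapses to `a`.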

Evaluating the Model

To check how well the model works, we evaluate it using Word Error Rate (WER), where lower is better. This is like a final taste test to confirm that the cake is just right!

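python
from datasets import load_metric
wer = load_metric('wer')

def evaluate(batch):  # transcribe a batch of clips and store the decoded text
    inputs = processor(batch['speech'], sampling_rate=16_000, return_tensors='pt', padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
    batch['pred_strings'] = processor.batch_decode(torch.argmax(logits, dim=-1))
    return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}%".format(100 * wer.compute(predictions=result['pred_strings'], references=result['sentence'])))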
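WER counts the word-level substitutions S, deletions D, and insertions I needed to turn each prediction into its reference. That count is divided by the number of reference words N: WER = (S + D + I) / N.

The sketch below computes it with a standard Levenshtein dynamic program over words. `word_edit_distance` and `corpus_wer` are illustrative names. Pooling edits across all sentences before dividing is the usual corpus-level definition, but the metric library you use may offer other aggregation options.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum substitutions + deletions + insertions turning ref_words into hyp_words."""
    # prev[j]: distance between the reference words seen so far and the first j hypothesis words
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, predictions):
    """Total word edits over all pairs divided by the total number of reference words."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    edits = sum(word_edit_distance(r.split(), p.split()) for r, p in zip(references, predictions))
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    return edits / total_words


print(corpus_wer(["kaixo ama", "egun on"], ["kaixo aita", "egun on"]))
```

Because insertions are unbounded, WER can exceed 100% when a prediction contains many extra words.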

Troubleshooting Tips

If you encounter any issues along the way, here are some troubleshooting ideas:

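Library Compatibility: This guide uses APIs from 2021-era releases, and version mismatches can cause unexpected bugs. In recent versions of datasets, load_metric has been removed (the evaluate library's evaluate.load('wer') provides the same metric), and script-based datasets such as common_voice may no longer load, so pin older versions or switch to a maintained copy of the dataset.
Audio Quality: Make sure the input speech is clear and sampled at 16 kHz, the rate Wav2Vec2 models expect.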
CUDA Issues: If you’re using GPU, ensure that your CUDA installation is compatible with your PyTorch version.
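Common Voice clips are recorded at 48 kHz, so matching the model's 16 kHz input reduces the number of samples by a factor of three. To make the arithmetic concrete, here is a deliberately crude sketch that averages each block of three samples. `average_decimate` and `expected_length` are hypothetical helpers, not part of torchaudio. torchaudio's `Resample` uses windowed-sinc interpolation, which suppresses aliasing far better, so treat this only as a way to sanity-check lengths and rates.

```python
SOURCE_RATE = 48_000  # Common Voice recordings
TARGET_RATE = 16_000  # rate expected by Wav2Vec2


def average_decimate(samples, factor):
    """Downsample by an integer factor by averaging each block of `factor` samples.

    Averaging is only a crude low-pass filter; trailing samples that do not
    fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


def expected_length(num_samples, source_rate=SOURCE_RATE, target_rate=TARGET_RATE):
    """Number of samples left after integer-factor decimation."""
    if source_rate % target_rate:
        raise ValueError("this sketch only handles integer rate ratios")
    return num_samples // (source_rate // target_rate)


one_second = [0.0] * SOURCE_RATE
print(len(average_decimate(one_second, SOURCE_RATE // TARGET_RATE)))
```

One second of 48 kHz audio (48,000 samples) becomes 16,000 samples, which is what `resampler(speech_array)` should produce for each second of a clip.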


Final Touches

That’s it! You have now loaded an XLSR Wav2Vec2 model fine-tuned for Basque, used it to transcribe Common Voice clips, and measured its word error rate. Each step was crucial, much like each ingredient in our cake; combined thoughtfully, they give you a working pipeline for recognizing Basque speech.

