If you are looking to incorporate advanced audio classification into your projects, you’ve come to the right place. Hubert-Base comes to the rescue with strong keyword spotting capabilities, letting you detect specific keywords in audio input efficiently. Below, I’ll guide you through its use with user-friendly instructions and troubleshooting tips.
Model Description
HuBERT (Hidden Unit BERT) is tailored for the SUPERB Keyword Spotting task. Think of it like a highly-trained librarian (the model) who can only respond to specific callouts (keywords). Each keyword corresponds to a book on the shelf, allowing the librarian to provide you with the right information immediately. This model is based on hubert-base-ls960, which has been pre-trained on 16kHz sampled speech audio. Thus, when using this model, ensure your speech input is sampled at the same frequency.
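If your recordings are at a different rate, resample them before inference. Here is a minimal sketch using torchaudio’s resampling function; the path input.wav is a placeholder for your own file:
import torchaudio

# Load the audio; "input.wav" is a placeholder path
waveform, sample_rate = torchaudio.load("input.wav")
if sample_rate != 16000:
    # Convert from the file's native rate to the 16kHz the model expects
    waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16000)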
Understanding Keyword Spotting
Keyword Spotting (KS) works like a premium concierge service: it listens for a preregistered set of keywords and classifies each incoming utterance accordingly. With the Speech Commands dataset acting as the dictionary, the model distinguishes ten keyword classes, a class for silence, and an unknown class that absorbs false positives. Because KS is designed for on-device processing, rapid response times matter, making accuracy, model size, and inference time the critical factors to weigh.
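If you are curious which classes the checkpoint actually distinguishes, you can read them straight from its configuration; this short sketch uses the standard Transformers AutoConfig API:
from transformers import AutoConfig

# id2label maps class indices to the keywords plus the silence and unknown classes
config = AutoConfig.from_pretrained("superb/hubert-base-superb-ks")
print(config.id2label)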
Getting Started with Hubert-Base
Installation
Before running the model, ensure you have the required libraries installed:
pip install datasets transformers torch torchaudio
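To confirm everything installed correctly, a quick version check in Python is usually enough:
# If any of these imports fails, re-run the pip command above
import datasets, transformers, torch, torchaudio
print(datasets.__version__, transformers.__version__, torch.__version__, torchaudio.__version__)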
Usage Examples
You can implement the model in a couple of ways. Let’s explore both:
Method 1: Using the Audio Classification Pipeline
With the following Python code, load a dataset and classify audio inputs:
from datasets import load_dataset
from transformers import pipeline

# Load the SUPERB keyword-spotting demo split
dataset = load_dataset("anton-l/superb_demo", "ks", split="test")
# Build an audio-classification pipeline backed by the fine-tuned HuBERT checkpoint
classifier = pipeline("audio-classification", model="superb/hubert-base-superb-ks")
# Classify the first utterance and return the five most likely labels
labels = classifier(dataset[0]["file"], top_k=5)
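The pipeline returns a list of dictionaries, each with a label and a score key, so you can print the top predictions directly:
# Show each candidate keyword with its confidence score
for prediction in labels:
    print(f"{prediction['label']}: {prediction['score']:.4f}")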
Method 2: Direct Model Use
For more control, you may opt to use the model directly:
import torch
from datasets import load_dataset
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor
from torchaudio.sox_effects import apply_effects_file

# Downmix to mono, resample to 16kHz, and apply a -3 dB gain
effects = [["channels", "1"], ["rate", "16000"], ["gain", "-3.0"]]

def map_to_array(example):
    # Decode the file with the sox effects applied and store it as a mono numpy array
    speech, _ = apply_effects_file(example["file"], effects)
    example["speech"] = speech.squeeze(0).numpy()
    return example

dataset = load_dataset("anton-l/superb_demo", "ks", split="test")
dataset = dataset.map(map_to_array)

# Load the fine-tuned model and its matching feature extractor
model = HubertForSequenceClassification.from_pretrained("superb/hubert-base-superb-ks")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/hubert-base-superb-ks")

# Batch the first four utterances, pad them to equal length, and run a forward pass
inputs = feature_extractor(dataset[:4]["speech"], sampling_rate=16000, padding=True, return_tensors="pt")
logits = model(**inputs).logits

# Take the highest-scoring class per utterance and map ids back to label names
predicted_ids = torch.argmax(logits, dim=-1)
labels = [model.config.id2label[_id] for _id in predicted_ids.tolist()]
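If you also want confidence scores rather than just the winning class, you can softmax the logits; this is a small, optional extension of the example above:
# Convert raw logits into per-class probabilities
probs = torch.softmax(logits, dim=-1)
for label, prob in zip(labels, probs.max(dim=-1).values.tolist()):
    print(f"{label}: {prob:.4f}")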
Evaluation Results
Accuracy on the SUPERB test set is reported for both the original s3prl implementation and the Transformers port:
- S3PRL: 0.9630
- Transformers: 0.9672
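These figures come from the full SUPERB evaluation. As a rough sanity check rather than a benchmark run, you can compare the predictions from the example above with the demo split’s reference labels; this sketch assumes the split exposes an integer label column aligned with the model’s id2label mapping:
# Compare the four predictions above against the reference labels
references = dataset[:4]["label"]  # assumes integer class ids matching id2label
correct = sum(int(p == r) for p, r in zip(predicted_ids.tolist(), references))
print(f"batch accuracy: {correct / len(references):.2%}")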
Troubleshooting
If you encounter issues while working with Hubert-Base, consider the following steps:
- Ensure that your audio data is sampled at 16kHz, as the model was pre-trained on 16kHz speech; a quick way to verify this is shown in the sketch after this list.
- Check if all necessary libraries are correctly installed and updated.
- Confirm the paths to your audio files are accurate.
- For any unknown errors, inspect the console for tracebacks that might guide you to the problem.
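For the first point, you can inspect a file’s sample rate without fully decoding it by reading its header with torchaudio; the path your_audio.wav is a placeholder:
import torchaudio

# info() reads only the header, so this check is cheap even for long files
metadata = torchaudio.info("your_audio.wav")  # placeholder path
if metadata.sample_rate != 16000:
    print(f"Expected 16000 Hz, got {metadata.sample_rate} Hz; resample before inference")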
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.