How to Use FunASR: A Comprehensive Guide

Feb 5, 2024 | Educational

FunASR is a powerful end-to-end speech recognition toolkit, designed to facilitate the development and training of speech recognition models in both academic and industrial settings. This guide will walk you through the installation process, provide quick start tutorials, and offer troubleshooting tips to help you get the most out of FunASR.

Features of FunASR

  • Speech Recognition (ASR)
  • Voice Activity Detection (VAD)
  • Punctuation Restoration
  • Language Models
  • Speaker Verification
  • Speaker Diarization
  • Multi-talker ASR

FunASR provides convenient scripts and tutorials that support the inference and fine-tuning of pre-trained models, which makes the entire process smoother and more accessible for researchers and developers alike.

Installation

To install FunASR, you have two options: via pip or by installing from the source code. Here’s how:

pip3 install -U funasr

Or install from the source:

git clone https://github.com/alibaba/FunASR.git
cd FunASR
pip3 install -e .

Optional: Install ModelScope for additional pre-trained models:

pip3 install -U modelscope
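
To confirm that the install worked, one option is to query the installed package versions. The sketch below uses only the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is hypothetical and not part of FunASR.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a pip package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 'funasr' and 'modelscope' are the pip package names used in the commands above.
    for name in ("funasr", "modelscope"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

If either line reports "not installed", rerun the matching pip3 command in the same environment you use to run your scripts.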

Quick Start

Ready to dive into using FunASR? The following Python snippets will get you started quickly:

Speech Recognition (Non-Streaming)

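import os
from funasr import AutoModel

# Load the ASR model together with the VAD and punctuation models.
model = AutoModel(model='paraformer-zh', model_revision='v2.0.4', vad_model='fsmn-vad', vad_model_revision='v2.0.4', punc_model='ct-punc-c', punc_model_revision='v2.0.4')

# Transcribe the example recording that ships with the downloaded model.
wav_file = os.path.join(model.model_path, 'example', 'asr_example.wav')
res = model.generate(input=wav_file, batch_size_s=300, hotword='魔搭')
print(res)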
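
Once the result is printed, you will usually want only the transcripts. Here is a minimal sketch for pulling them out, assuming the result is a list of dicts with 'key' and 'text' fields. That layout is an assumption about the output format, so inspect print(res) on your own install before relying on it. The helper name and the sample data are hypothetical.

```python
def extract_transcripts(results):
    """Map each result's 'key' to its 'text', skipping entries without text."""
    transcripts = {}
    for item in results:
        text = item.get("text")
        if text:
            transcripts[item.get("key", "")] = text
    return transcripts


# Hypothetical stand-in for the list returned by model.generate(...).
sample_res = [
    {"key": "asr_example", "text": "hello world"},
    {"key": "silence", "text": ""},
]
print(extract_transcripts(sample_res))
```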

Voice Activity Detection (Streaming)

from funasr import AutoModel
import soundfile
import os

chunk_size = 200  # chunk length in milliseconds
model = AutoModel(model='fsmn-vad', model_revision='v2.0.4')

wav_file = os.path.join(model.model_path, 'example', 'vad_example.wav')
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)  # samples per chunk
cache = {}  # streaming state carried between generate() calls
total_chunk_num = int(len(speech) - 1) // chunk_stride + 1

for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
    print(res)
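
The loop above slices the audio into fixed-size chunks and marks the last one as final. The same arithmetic can be isolated in a small generator and checked without loading a model. This is only a sketch: the name iter_chunks is hypothetical, and a plain list stands in for the audio array.

```python
def iter_chunks(samples, chunk_stride):
    """Yield (chunk, is_final) pairs covering `samples` in order."""
    if chunk_stride <= 0:
        raise ValueError("chunk_stride must be positive")
    # Ceiling division; an empty input yields nothing.
    total = (len(samples) + chunk_stride - 1) // chunk_stride
    for i in range(total):
        chunk = samples[i * chunk_stride:(i + 1) * chunk_stride]
        yield chunk, i == total - 1


# 10 samples with a stride of 4 give chunks of length 4, 4 and 2.
for chunk, is_final in iter_chunks(list(range(10)), 4):
    print(len(chunk), is_final)
```

Only the last chunk carries is_final=True, which is the point at which the streaming loop tells the model that no more audio will follow.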

Understanding FunASR through Analogy

Think of FunASR as a well-organized kitchen where various chefs (i.e., models) are ready to whip up delicious meals (i.e., speech outputs). Each chef specializes in a different dish:

  • The Speech Recognition chef prepares delectable dishes with timestamps.
  • The Voice Activity chef ensures that only the freshest ingredients (audio segments) are used, filtering out the noise.
  • The Punctuation Restoration chef sprinkles just the right amount of punctuation to make the sentences palatable.

In this kitchen, the chefs work together seamlessly. With FunASR’s easy-to-follow recipes (commands), even novice cooks can create gourmet dishes (speech recognition outputs) that satisfy the palate of various users.

Troubleshooting

If you encounter any issues during installation or while using FunASR, here are some troubleshooting tips:

  • Ensure that all dependencies are correctly installed by running the installation commands again.
  • Check the compatibility of your Python and pip versions. The FunASR README lists Python 3.8 or newer as a requirement.
  • Verify that you are using the correct model names and versions in your commands.
  • If the program crashes or hangs, try running it in a virtual environment to avoid conflicts with other packages.
  • For persistent issues, consult the GitHub repository (https://github.com/alibaba-damo-academy/FunASR) for updates or common problems reported by other users.
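
Several of the checks above can be scripted. The sketch below reports the Python version, whether the interpreter is running inside a virtual environment, and whether the relevant modules can be found. It uses only the standard library; the function name, the module list, and the minimum version passed in the usage lines are assumptions to adjust for your setup.

```python
import importlib.util
import sys


def environment_report(min_version, modules):
    """Collect basic facts about the current Python environment."""
    return {
        "python_version": tuple(sys.version_info[:3]),
        "python_ok": tuple(sys.version_info[:2]) >= tuple(min_version),
        # In a venv, sys.prefix points at the environment, not the base install.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        "modules_found": {m: importlib.util.find_spec(m) is not None for m in modules},
    }


# The (3, 8) minimum and the module names are assumptions for illustration.
report = environment_report((3, 8), ["funasr", "modelscope", "soundfile"])
for field, value in report.items():
    print(f"{field}: {value}")
```

Including the printed report when you open an issue gives maintainers the basic facts about your environment up front.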
