FunASR is a powerful end-to-end speech recognition toolkit, designed to facilitate the development and training of speech recognition models in both academic and industrial settings. This guide will walk you through the installation process, provide quick start tutorials, and offer troubleshooting tips to help you get the most out of FunASR.
Features of FunASR
- Speech Recognition (ASR)
- Voice Activity Detection (VAD)
- Punctuation Restoration
- Language Models
- Speaker Verification
- Speaker Diarization
- Multi-talker ASR
FunASR provides convenient scripts and tutorials that support the inference and fine-tuning of pre-trained models, which makes the entire process smoother and more accessible for researchers and developers alike.
Installation
To install FunASR, you have two options: install from PyPI via pip, or install from source. From PyPI:

```shell
pip3 install -U funasr
```

Or install from source:

```shell
git clone https://github.com/alibaba/FunASR.git
cd FunASR
pip3 install -e .
```

Optional: install ModelScope to download additional pre-trained models:

```shell
pip3 install -U modelscope
```
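Once installed, you can confirm the packages are importable before moving on. This is a minimal sketch (not part of FunASR itself) that checks for the two packages installed above:

```python
import importlib.util

def check_install(packages=("funasr", "modelscope")):
    """Return a dict mapping each package name to whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

if __name__ == "__main__":
    for pkg, ok in check_install().items():
        print(f"{pkg}: {'installed' if ok else 'missing'}")
```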
Quick Start
Ready to dive into using FunASR? Here’s how to quickly start with some sample commands:
Speech Recognition (Non-Streaming)
```python
from funasr import AutoModel

model = AutoModel(model='paraformer-zh', model_revision='v2.0.4',
                  vad_model='fsmn-vad', vad_model_revision='v2.0.4',
                  punc_model='ct-punc-c', punc_model_revision='v2.0.4')
res = model.generate(input='example/asr_example.wav', batch_size_s=300, hotword='魔搭')
print(res)
```
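`model.generate` returns a list of result dicts, each carrying a `text` field with the transcript. The helper below and its sample data are illustrative, not part of the FunASR API, and assume that result shape:

```python
def extract_text(results):
    """Concatenate the 'text' field of each result entry into one transcript."""
    return " ".join(seg.get("text", "") for seg in results if seg.get("text"))

# Sample shaped like a typical FunASR result (contents hypothetical):
sample = [{"key": "asr_example", "text": "欢迎使用 FunASR"}]
print(extract_text(sample))
```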
Voice Activity Detection (Streaming)
```python
from funasr import AutoModel
import soundfile
import os

chunk_size = 200  # chunk duration in milliseconds
model = AutoModel(model='fsmn-vad', model_revision='v2.0.4')

wav_file = os.path.join(model.model_path, 'example/vad_example.wav')
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)  # samples per chunk

cache = {}  # streaming state carried between calls
total_chunk_num = (len(speech) - 1) // chunk_stride + 1
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache,
                         is_final=is_final, chunk_size=chunk_size)
    print(res)
```
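The chunking arithmetic in the loop above can be factored into a small helper. This is a sketch (not part of FunASR) that yields sample ranges plus the `is_final` flag for any chunk length:

```python
def chunk_bounds(n_samples, chunk_ms, sample_rate=16000):
    """Yield (start, end, is_final) sample indices for fixed-duration streaming chunks."""
    stride = int(chunk_ms * sample_rate / 1000)
    total = (n_samples - 1) // stride + 1
    for i in range(total):
        yield i * stride, min((i + 1) * stride, n_samples), i == total - 1
```

Note that the final chunk may be shorter than the stride; the `is_final` flag is what tells a streaming model to flush its internal cache.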
Understanding FunASR through Analogy
Think of FunASR as a well-organized kitchen where various chefs (i.e., models) are ready to whip up delicious meals (i.e., speech outputs). Each chef specializes in a different dish:
- The Speech Recognition chef prepares delectable dishes with timestamps.
- The Voice Activity chef ensures that only the freshest ingredients (audio segments) are used, filtering out the noise.
- The Punctuation Restoration chef sprinkles just the right amount of punctuation to make the sentences palatable.
In this kitchen, the chefs work together seamlessly. With FunASR’s easy-to-follow recipes (commands), even novice cooks can create gourmet dishes (speech recognition outputs) that satisfy the palate of various users.
Troubleshooting
If you encounter any issues during installation or while using FunASR, here are some troubleshooting tips:
- Ensure that all dependencies are correctly installed by running the installation commands again.
- Check your Python and pip versions. Recent FunASR releases require Python 3.8 or newer.
- Verify that you are using the correct model names and versions in your commands.
- If the program crashes or hangs, try running it in a virtual environment to avoid conflicts with other packages.
- For persistent issues, consult the [GitHub Repository](https://github.com/alibaba-damo-academy/FunASR) for updates or common problems reported by other users.
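When reporting a problem, include your environment details. A small helper (hypothetical, not part of FunASR) can gather the basics to paste into an issue:

```python
import platform
import sys

def env_report():
    """Collect environment facts commonly requested in bug reports."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }

if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```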