How to Use FunASR for Speech Recognition

Feb 2, 2024 | Educational

Welcome to the world of FunASR, a robust toolkit designed for end-to-end speech recognition! Whether you are a researcher or a developer, FunASR bridges the gap between academia and industry, making speech recognition easier and more efficient. In this guide, we’ll walk you through the installation, quick start, and troubleshooting tips to get the most out of FunASR.

Highlights of FunASR

  • Versatile Features: FunASR offers speech recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, and more.
  • Pre-trained Models: A wide collection of models are available on ModelScope and HuggingFace.
  • User-Friendly Scripts: Convenient scripts and tutorials for inference and fine-tuning are readily available.

Installation

Ready to dive in? Let’s get started with the installation:

pip3 install -U funasr

Alternatively, you can install from the source code:

git clone https://github.com/alibaba/FunASR.git
cd FunASR
pip3 install -e .

To use the pre-trained models, install ModelScope optionally:

pip3 install -U modelscope

Quick Start

Now that we are set up, let’s have some fun with FunASR! Here’s how you can start using it:

1. Speech Recognition (Non-Streaming)

from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    model_revision="v2.0.4",
    vad_model="fsmn-vad",
    vad_model_revision="v2.0.4",
    punc_model="ct-punc-c",
    punc_model_revision="v2.0.4"
)
res = model.generate(input="exampleasr_example.wav", batch_size_s=300, hotword="")
print(res)

2. Voice Activity Detection (Streaming)

chunk_size = [0, 10, 5]
model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
import soundfile
speech, sample_rate = soundfile.read("examplevad_example.wav")

Understanding the Code with an Analogy

Think of the FunASR process as preparing a delicious meal:

  • Ingredients: The audio file is your main ingredient. You need to have the right quality and type for the best results.
  • Recipe: In our code, the models act as recipes. Each recipe has different instructions – some may focus on cooking styles (like ASR, VAD, etc.) while others handle the seasoning (like punctuation restoration).
  • Cooking: When you run your code (or cook), the model processes the audio (ingredients) into a final output (your tasty meal, i.e., transcribed text).

Troubleshooting Tips

While using FunASR, you might encounter some obstacles. Here are a few troubleshooting tips:

  • No output: Ensure your input file is correct and accessible. Check the file path in the code.
  • Model not found errors: Verify that the model names are correctly spelled and that they are available for download.
  • Performance issues: Try adjusting your batch sizes or stream chunk sizes to improve efficiency.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Stay Informed with the Newest F(x) Insights and Blogs

Tech News and Blog Highlights, Straight to Your Inbox