If you’ve ever wondered how to dive into the world of seamless audio generation with CosyVoice, you’re in the right place! In this blog article, we’ll guide you through the essential steps to install and utilize the CosyVoice model effectively.
1. Cloning the Repository
To start your journey with CosyVoice, first, you’ll need to clone the repository. Here’s how to do it:
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
If you encounter issues with submodules due to network failures, keep running the following until you succeed:
cd CosyVoice
git submodule update --init --recursive
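If the clone looks like it finished but you want to be sure the submodules actually landed, a minimal check like the one below (run from the CosyVoice root) simply verifies that the third_party directories referenced later in this guide exist and are non-empty:
import os
# Verify that the git submodules used later in this guide were pulled down
for path in ['third_party/AcademiCodec', 'third_party/Matcha-TTS']:
    ok = os.path.isdir(path) and bool(os.listdir(path))
    print(f'{path}: {"ok" if ok else "missing or empty"}')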
2. Setting Up Your Environment
Ensure you have Conda installed. If not, follow the official Conda installation guide first. Now, let's create a Conda environment:
conda create -n cosyvoice python=3.8
conda activate cosyvoice
Then, install the required packages:
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com
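Once pip finishes, it can be worth confirming that the core dependencies import cleanly before moving on; this is just an illustrative check, not an official setup step:
# Quick import check for a few key runtime dependencies (illustrative only)
import importlib
for module in ['torch', 'torchaudio', 'modelscope']:
    try:
        importlib.import_module(module)
        print(f'{module}: OK')
    except ImportError as exc:
        print(f'{module}: FAILED ({exc})')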
3. Fixing SOX Compatibility Issues
In case you run into SOX compatibility issues during installation, here are the commands to resolve them:
- For Ubuntu:
sudo apt-get install sox libsox-dev
- For CentOS:
sudo yum install sox sox-devel
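If you're unsure whether SOX is now visible to your environment, a rough check like this (not an official step) queries the binary's version from Python:
import subprocess
# Confirm the sox binary is installed and on PATH
try:
    result = subprocess.run(['sox', '--version'], capture_output=True, text=True, check=True)
    print(result.stdout.strip())
except (FileNotFoundError, subprocess.CalledProcessError) as exc:
    print(f'sox not available: {exc}')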
4. Downloading Models
Next, you need to download the recommended pretrained models. You can opt for either of the following methods:
A. Python Script Method
Use this simple script to download the models:
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('iic/speech_kantts_ttsfrd', local_dir='pretrained_models/speech_kantts_ttsfrd')
B. Git Method
If you prefer using Git, ensure Git LFS is installed and execute:
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/speech_kantts_ttsfrd.git pretrained_models/speech_kantts_ttsfrd
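Whichever method you choose, it's worth verifying that each model directory exists and is non-empty before moving on. The paths below assume you used the directories from the commands above:
import os
# Directories created by either download method above
model_dirs = [
    'pretrained_models/CosyVoice-300M',
    'pretrained_models/CosyVoice-300M-SFT',
    'pretrained_models/CosyVoice-300M-Instruct',
    'pretrained_models/speech_kantts_ttsfrd',
]
for d in model_dirs:
    count = len(os.listdir(d)) if os.path.isdir(d) else 0
    print(f'{d}: {count} files' if count else f'{d}: MISSING or empty')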
5. Basic Usage of CosyVoice
To utilize the CosyVoice model, follow these steps:
- Set up your PYTHONPATH so the bundled third-party modules can be imported:
export PYTHONPATH=third_party/AcademiCodec:third_party/Matcha-TTS
- Import the necessary libraries, instantiate the CosyVoice class with the model directory you downloaded in step 4, and run an SFT inference to generate audio from text input:
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio
# Point CosyVoice at the SFT model directory downloaded in step 4
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
# Synthesize the text with the built-in speaker '中文女'
output = cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女')
# The result is a dict; its 'tts_speech' entry holds the waveform tensor
torchaudio.save('sft.wav', output['tts_speech'], 22050)
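If you're not sure which speaker names (such as '中文女' above) the SFT model accepts, the CosyVoice class exposes a helper for listing them; the spelling below follows the repository's own method name:
# Continuing from the example above: print the built-in SFT speaker IDs
print(cosyvoice.list_avaliable_spks())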
Use similar commands for zero-shot and instruct inference as detailed in the original documentation.
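As a rough sketch of what those calls look like, based on the examples in the project's documentation (exact argument names may vary slightly between releases): zero-shot inference takes a 16 kHz reference recording plus its transcript, while instruct inference takes a natural-language style instruction. The file zero_shot_prompt.wav below is a placeholder for any short reference recording you supply:
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio
# Zero-shot voice cloning: mimic the voice in your own 16 kHz reference recording
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
# The second argument should be the transcript of your reference recording
output = cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,心中充满了快乐。', '希望你以后能够做的比我还好呦。', prompt_speech_16k)
torchaudio.save('zero_shot.wav', output['tts_speech'], 22050)
# Instruct inference: control speaking style with a plain-text instruction
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
output = cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的勇气与智慧。', '中文男', 'A calm and confident male narrator.')
torchaudio.save('instruct.wav', output['tts_speech'], 22050)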
6. Advanced Usage
For those looking for a deeper dive into CosyVoice, the repository also provides advanced training and inference recipes. You can find them in examples/libritts/cosyvoice/run.sh.
7. Building for Deployment
If you want to deploy the service using gRPC, follow these commands:
cd runtime
docker build -t cosyvoice:v1.0 .
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime && python3 server.py --port 50000 --max_conc 4 --model_dir speech_tts/CosyVoice-300M; sleep infinity"
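Once the container is up, a quick way to confirm the service is listening before you wire up a real gRPC client (the repository's runtime directory also contains an example client you can adapt) is a plain socket probe. This only checks reachability on the assumed localhost:50000 mapping; it is not an inference call:
import socket
# Reachability probe for the port published by the container above
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(3)
    try:
        s.connect(('127.0.0.1', 50000))
        print('Port 50000 is open; the CosyVoice service appears to be up.')
    except OSError as exc:
        print(f'Could not reach port 50000: {exc}')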
Troubleshooting
If you run into any issues while following this guide, here are some troubleshooting tips:
- Ensure you have a stable internet connection while cloning repos and downloading models.
- Check that all dependencies are properly installed in your Conda environment.
- Verify your PYTHONPATH includes the necessary directories (a quick check is sketched after this list).
- If you encounter issues during audio generation, double-check your input text and model selection.
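If you'd like to automate that PYTHONPATH check, a small sanity script like the following (assuming you run it from the CosyVoice root, as in step 5) prints whether each expected entry is present:
import os
# Check that PYTHONPATH contains the third_party entries exported in step 5
pythonpath = os.environ.get('PYTHONPATH', '')
for entry in ['third_party/AcademiCodec', 'third_party/Matcha-TTS']:
    print(f'{entry} on PYTHONPATH: {entry in pythonpath.split(os.pathsep)}')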
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
8. Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Ready to bring your audio project to life with CosyVoice? Dive in and start creating!

