How to Get Started with CosyVoice

Aug 21, 2024 | Educational

If you’ve ever wondered how to dive into the world of seamless audio generation with CosyVoice, you’re in the right place! In this blog article, we’ll guide you through the essential steps to install and utilize the CosyVoice model effectively.

1. Cloning the Repository

To start your journey with CosyVoice, first, you’ll need to clone the repository. Here’s how to do it:

git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git

If you encounter issues with submodules due to network failures, keep running the following until you succeed:

cd CosyVoice
git submodule update --init --recursive
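If the submodule fetch keeps failing, the retry can be scripted instead of rerun by hand. Here's a minimal Python sketch (the helper name, attempt count, and delay are our own choices, not part of CosyVoice) that reruns a command until it exits successfully:

```python
import subprocess
import time

def run_until_success(cmd, max_attempts=5, delay=2.0):
    """Run `cmd` repeatedly until it exits with code 0 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"Attempt {attempt} failed (exit {result.returncode}); retrying...")
        time.sleep(delay)
    return False
```

With the repository cloned, `run_until_success(["git", "submodule", "update", "--init", "--recursive"])` keeps retrying the fetch until it succeeds.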

2. Setting Up Your Environment

Ensure you have Conda installed. If not, follow the official Conda installation guide. Now, let's create a Conda environment:

conda create -n cosyvoice python=3.8
conda activate cosyvoice

Then, install the required packages:

pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com

3. Fixing SOX Compatibility Issues

In case you run into SOX compatibility issues during installation, here are the commands to resolve them:

  • For Ubuntu: sudo apt-get install sox libsox-dev
  • For CentOS: sudo yum install sox sox-devel
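After installing, you can confirm that the sox binary is actually visible on your PATH. This small check (our own addition, not part of CosyVoice) uses only Python's standard library:

```python
import shutil

def sox_available():
    """Return the path to the sox executable, or None if it isn't installed."""
    return shutil.which("sox")

path = sox_available()
if path:
    print(f"sox found at {path}")
else:
    print("sox not found - install it with your package manager (see above)")
```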

4. Downloading Models

Next, you need to download the recommended pretrained models. You can opt for either of the following methods:

A. Python Script Method

Use this simple script to download the models:

from modelscope import snapshot_download
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('iic/speech_kantts_ttsfrd', local_dir='pretrained_models/speech_kantts_ttsfrd')

B. Git Method

If you prefer using Git, ensure Git LFS is installed and execute:

mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/speech_kantts_ttsfrd.git pretrained_models/speech_kantts_ttsfrd

5. Basic Usage of CosyVoice

To utilize the CosyVoice model, follow these steps:

  1. Set up your PYTHONPATH:

     export PYTHONPATH=third_party/AcademiCodec:third_party/Matcha-TTS

  2. Import the necessary libraries and instantiate the CosyVoice class, pointing it at one of the model directories you downloaded:

     from cosyvoice.cli.cosyvoice import CosyVoice
     from cosyvoice.utils.file_utils import load_wav
     import torchaudio

     cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')

  3. Run an inference to generate audio by providing text input and a speaker name:

     output = cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女')
     torchaudio.save('sft.wav', output['tts_speech'], 22050)
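If you'd rather not export PYTHONPATH in the shell, the same two third-party directories can be prepended from inside Python before the CosyVoice imports. A small sketch (the helper is our own, assuming you run from the repository root):

```python
import sys
from pathlib import Path

def add_third_party_paths(repo_root="."):
    """Prepend the bundled third-party packages to sys.path, mirroring
    export PYTHONPATH=third_party/AcademiCodec:third_party/Matcha-TTS."""
    added = []
    for rel in ("third_party/AcademiCodec", "third_party/Matcha-TTS"):
        p = str(Path(repo_root) / rel)
        if p not in sys.path:
            sys.path.insert(0, p)
        added.append(p)
    return added

add_third_party_paths()
# After this, the CosyVoice imports shown above can resolve the
# vendored dependencies without any shell-level export.
```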

Use similar commands for zero-shot and instruct inference as detailed in the original documentation.

6. Advanced Usage

For those looking for a deeper dive into CosyVoice, we provide advanced training and inference scripts. You can find these in the examples/libritts/cosyvoice/run.sh script.

7. Building for Deployment

If you want to deploy the service using gRPC, follow these commands:

cd runtime
docker build -t cosyvoice:v1.0 .
docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime && python3 server.py --port 50000 --max_conc 4 --model_dir speech_tts/CosyVoice-300M; sleep infinity"

Troubleshooting

If you run into any issues while following this guide, here are some troubleshooting tips:

  • Ensure you have a stable internet connection while cloning repos and downloading models.
  • Check that all dependencies are properly installed in your Conda environment.
  • Verify your PYTHONPATH includes the necessary directories.
  • If you encounter issues during audio generation, double-check your input text and model selection.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

8. Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Ready to bring your audio project to life with CosyVoice? Dive in and start creating!
