EmotiVoice is a cutting-edge open-source text-to-speech (TTS) engine that can vocalize your text in both English and Chinese, with an impressive selection of over 2000 voices. What sets it apart is its ability to synthesize various emotional tones in speech, making your applications come alive. Whether you are a developer looking to implement TTS in your project or just someone curious about the technology, this guide will walk you through the setup and usage of EmotiVoice.
Getting Started with EmotiVoice
There are two primary ways to get EmotiVoice running: using a Docker image or by performing a full installation. Below, we’ll explore both approaches.
Using EmotiVoice Docker Image
The easiest way to try EmotiVoice is via Docker, which enables a simple setup without conflicts in your local environment. Here’s how to get started:
- Make sure you have a machine equipped with an NVIDIA GPU.
- Set up the NVIDIA Container Toolkit by following its installation instructions for Linux or Windows WSL2.
- Run the following command to start EmotiVoice:

```sh
docker run -dp 127.0.0.1:8501:8501 syq163/emoti-voice:latest
```

Once the container is running, open http://localhost:8501 in your browser to reach the demo UI (the command above publishes the container’s port 8501 on localhost).
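Before pulling the image, it can save time to confirm the host has the basic pieces in place. A minimal, generic sketch (nothing here is EmotiVoice-specific):

```sh
# Check for the Docker CLI and the NVIDIA driver utility on the host PATH.
docker_status=$(command -v docker >/dev/null 2>&1 && echo found || echo missing)
gpu_status=$(command -v nvidia-smi >/dev/null 2>&1 && echo found || echo missing)
echo "docker: $docker_status"
echo "nvidia-smi: $gpu_status"
```

If either check reports "missing", install Docker or the NVIDIA driver before proceeding.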
Performing a Full Installation
If you prefer a more traditional setup, here’s how to perform a full installation:
- Create and activate a new conda environment:

```sh
conda create -n EmotiVoice python=3.8 -y
conda activate EmotiVoice
```

- Install the Python dependencies:

```sh
pip install torch torchaudio
pip install numpy numba scipy transformers soundfile yacs g2p_en jieba pypinyin pypinyin_dict
python -m nltk.downloader averaged_perceptron_tagger_eng
```
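Since the environment above pins Python 3.8, a quick interpreter check before the heavier installs can catch an inactive or wrong environment early. A minimal sketch:

```sh
# Locate an interpreter and assert it meets the 3.8 floor the guide targets.
PY=$(command -v python || command -v python3)
"$PY" -c "import sys; assert sys.version_info >= (3, 8), sys.version; print('python ok')"
```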
Preparing Model Files
To utilize EmotiVoice, it’s crucial to prepare the model files. Refer to the wiki page on downloading the pretrained model files. Here’s a quick look:
- Install Git Large File Storage (LFS) and clone the required BERT model:

```sh
git lfs install
git lfs clone https://huggingface.co/WangZeJun/simbert-base-chinese
```
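A common pitfall with Git LFS is ending up with tiny pointer files instead of the real weights. A quick size check makes that obvious (the directory name comes from the clone above):

```sh
# LFS pointer files are only ~130 bytes, while real model weights are
# hundreds of megabytes, so the directory size reveals a bad clone.
if [ -d simbert-base-chinese ]; then
  du -sh simbert-base-chinese
else
  echo "simbert-base-chinese not found - run the clone step first"
fi
```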
Generating Speech with EmotiVoice
Once you have EmotiVoice running, you can generate speech by completing the following steps:
- Download the pretrained models:

```sh
git clone https://www.modelscope.cn/syq163/outputs.git
```

- Prepare your inference text, formatted appropriately for speaker and emotion prompts.
- Run the TTS model:

```sh
TEXT=data/inference/text
python inference_am_vocoder_joint.py --logdir prompt_tts_open_source_joint --config_folder config/joint --checkpoint g_00140000 --test_file $TEXT
```
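The exact line format of the inference file is documented in the EmotiVoice wiki; the sketch below assumes a pipe-delimited layout of speaker, emotion prompt, phoneme sequence, and raw text. The speaker id 8051 and the PHONEMES placeholder are illustrative only, not real values:

```sh
# Write a single illustrative line into the inference file passed as $TEXT.
# Assumed layout: speaker|emotion|phonemes|text. "PHONEMES" is a placeholder;
# real phoneme sequences come from the repo's frontend scripts.
mkdir -p data/inference
printf '%s\n' '8051|Happy|<sos/eos> PHONEMES <sos/eos>|The weather is lovely today.' > data/inference/text
# Each line should contain exactly four pipe-delimited fields:
awk -F'|' '{print NF}' data/inference/text
```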
Troubleshooting Tips
If you run into issues while setting up or using EmotiVoice, here are some common troubleshooting strategies:
- Ensure that you have the latest Docker version installed.
- If you’re using Docker, verify your GPU drivers and NVIDIA Container Toolkit configuration.
- If you’re still facing issues, consult the project’s wiki page for known solutions.
- For persistent errors, reach out for community support or review the issues page on GitHub.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
EmotiVoice stands at the forefront of emotional TTS technology, offering flexibility and a multitude of features. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Explore More
If you’re curious about other features and capabilities, don’t hesitate to check out the demo hosted on Replicate: EmotiVoice Demo.