How to Use EmotiVoice: A Multi-Voice and Prompt-Controlled TTS Engine

Feb 12, 2021 | Educational

EmotiVoice is a cutting-edge open-source text-to-speech (TTS) engine that can vocalize your text in both English and Chinese, with an impressive selection of over 2000 voices. What sets it apart is its ability to synthesize various emotional tones in speech, making your applications come alive. Whether you are a developer looking to implement TTS in your project or just someone curious about the technology, this guide will walk you through the setup and usage of EmotiVoice.

Getting Started with EmotiVoice

There are two primary ways to get EmotiVoice running: using a Docker image or by performing a full installation. Below, we’ll explore both approaches.

Using EmotiVoice Docker Image

The easiest way to try EmotiVoice is via Docker, which enables a simple setup without conflicts in your local environment. Here’s how to get started:

  • Make sure you have a machine equipped with an NVIDIA GPU.
  • Set up the NVIDIA Container Toolkit by following the instructions for Linux or Windows WSL2.
  • Run the following command to start EmotiVoice:

        docker run -dp 127.0.0.1:8501:8501 syq163/emoti-voice:latest

  • Open your web browser and navigate to http://localhost:8501 to access the TTS capabilities.
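The Streamlit UI inside the container can take a few seconds to start. A quick way to confirm it is reachable (assuming `curl` is installed on the host) is a sketch like this:

```shell
# Probe the web UI the container publishes on 127.0.0.1:8501.
# If it is not reachable yet, check `docker ps` and the container logs
# before assuming something is broken.
if curl -sSf http://127.0.0.1:8501 >/dev/null 2>&1; then
    echo "EmotiVoice UI is up"
else
    echo "UI not reachable yet; check docker ps and the container logs"
fi
```

Either way the check exits cleanly, so it is safe to re-run until the UI comes up.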

Performing a Full Installation

If you prefer a more traditional setup, here’s how to perform a full installation:

  • Create a new environment:

        conda create -n EmotiVoice python=3.8 -y
        conda activate EmotiVoice

  • Install the required dependencies:

        pip install torch torchaudio
        pip install numpy numba scipy transformers soundfile yacs g2p_en jieba pypinyin pypinyin_dict
        python -m nltk.downloader averaged_perceptron_tagger_eng

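Before moving on, it is worth confirming that PyTorch installed correctly and can see your GPU. The heredoc below is just one convenient way to run the check from the shell; any Python prompt inside the activated environment works:

```shell
# Print the installed torch version and whether CUDA is usable.
# If the import fails, the pip install step above needs to be re-run.
python3 - <<'EOF'
try:
    import torch
    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed; re-run the pip install step above")
EOF
```

If CUDA reports as unavailable, check your driver installation before continuing, since inference on CPU alone will be considerably slower.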
Preparing Model Files

To utilize EmotiVoice, it’s crucial to prepare the model files. Refer to the wiki page on downloading the pretrained model files. Here’s a quick look:

  • Install Git Large File Storage (LFS) and clone the pretrained BERT model:

        git lfs install
        git lfs clone https://huggingface.co/WangZeJun/simbert-base-chinese

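When Git LFS is not initialized correctly, a clone can leave small pointer stubs in place of the real weight files. A quick sanity check on the directory created by the clone command above:

```shell
# Report the on-disk size of the cloned model directory. Real weights
# total hundreds of MB; LFS pointer stubs add up to only a few KB.
if [ -d simbert-base-chinese ]; then
    du -sh simbert-base-chinese
else
    echo "simbert-base-chinese not found; re-run the git lfs clone step"
fi
```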
Generating Speech with EmotiVoice

Once you have EmotiVoice running, you can generate speech by completing the following steps:

  1. Download the pretrained models:

        git clone https://www.modelscope.cn/syq163/outputs.git

  2. Prepare your inference text, formatted appropriately for speaker and emotion prompts.
  3. Run the TTS model:

        TEXT=data/inference/text
        python inference_am_vocoder_joint.py --logdir prompt_tts_open_source_joint --config_folder config/joint --checkpoint g_00140000 --test_file $TEXT


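The exact inference-text format is defined by the EmotiVoice repository: each line combines a speaker ID, an emotion prompt, a phoneme sequence produced by the project's frontend, and the raw text, separated by `|`. The sketch below only illustrates that shape with placeholder values; consult the project wiki for the authoritative format and for generating the phoneme field:

```shell
# Write one illustrative inference line. The speaker ID and the phoneme
# field here are placeholders; the real phoneme sequence must come from
# the repository's frontend script.
mkdir -p data/inference
printf '%s\n' '8051|Happy|<placeholder phonemes>|Hello, EmotiVoice.' > data/inference/text
cat data/inference/text
```

The synthesized audio files are then written under the log directory passed via `--logdir`.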
Troubleshooting Tips

If you run into issues while setting up or using EmotiVoice, here are some common troubleshooting strategies:

  • Ensure that you have the latest Docker version installed.
  • Verify your GPU drivers and NVIDIA settings if you’re using Docker.
  • If you’re still facing issues, consult the wiki page for solutions.
  • For persistent errors, consider reaching out for community support or review the issues page on GitHub.
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

EmotiVoice stands at the forefront of emotional TTS technology, offering flexibility and a multitude of features. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Explore More

If you’re curious about other features and capabilities, don’t hesitate to check out the demo hosted on Replicate: EmotiVoice Demo.
