How to Use Vocal-Remover: Your Ultimate Deep Learning Tool for Audio Separation

Sep 29, 2023 | Data Science

In the music world, sometimes you want the instrumental version of a song, and sometimes you want to isolate the vocals for karaoke. Vocal-Remover is a deep-learning-based tool that splits a song into its instrumental and vocal tracks. In this blog post, we’ll walk you through installing and using the tool, along with some troubleshooting tips.

Installation

Follow these steps to get Vocal-Remover up and running on your system:

Download Vocal-Remover

First, download the latest version of vocal-remover from the project’s releases page.

Install PyTorch

For instructions on installing PyTorch, check out the GET STARTED guide.
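
The Get Started page gives the exact command for your OS, package manager, and CUDA version, so treat the following as a generic sketch; a plain pip install of the default PyTorch build looks like this:

# Default PyTorch install; use the CUDA-specific command from
# the Get Started page if you plan to run on a GPU
pip install torch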

Install the Other Packages

Navigate to the Vocal-Remover directory and install the required packages:

cd vocal-remover
pip install -r requirements.txt
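
Optionally, you can do this inside a virtual environment to keep the tool’s dependencies isolated; this is general Python practice, not a project requirement:

# Create and activate a virtual environment (optional)
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt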

Usage

Now that you have everything set up, let’s dive into how you can use Vocal-Remover to separate audio tracks.

Basic Command

The following command separates the input audio into an instrumental track and a vocal track, saved as *_Instruments.wav and *_Vocals.wav respectively.

# Run on CPU
python inference.py --input path_to_an_audio_file

# Run on GPU
python inference.py --input path_to_an_audio_file --gpu 0
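
For example, with a hypothetical input file named song.wav, the run below writes song_Instruments.wav and song_Vocals.wav:

python inference.py --input song.wav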

Advanced Options

The Vocal-Remover also provides advanced options to improve separation quality. Here’s how you can use them:

  • Test-Time Augmentation: the --tta option performs test-time augmentation, which can improve separation quality at the cost of longer processing time.

    python inference.py --input path_to_an_audio_file --tta --gpu 0

  • Post-processing: the --postprocess option masks the instrumental part based on the volume of the vocals.

    python inference.py --input path_to_an_audio_file --postprocess --gpu 0

Warning: The post-processing feature is experimental. If you encounter issues, consider disabling it.
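
The two options are independent flags, so they can typically be combined in a single run; if the experimental post-processing causes problems, just drop that flag:

# Both test-time augmentation and (experimental) post-processing
python inference.py --input path_to_an_audio_file --tta --postprocess --gpu 0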

Training Your Own Model

If you’re feeling adventurous and want to train your own model, follow these steps:

Place your dataset in the following structure:

path_to_dataset
  ├─ instruments
  │  ├─ 01_foo_inst.wav
  │  ├─ 02_bar_inst.mp3
  │  └─ ...
  └─ mixtures
     ├─ 01_foo_mix.wav
     ├─ 02_bar_mix.mp3
     └─ ...

Then, you can train your model with this command:

python train.py --dataset path_to_dataset --mixup_rate 0.5 --reduction_rate 0.5 --gpu 0
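
Before launching a long training run, it’s worth sanity-checking the dataset layout. A minimal check, assuming exactly one instrumental file per mixture, is to compare the file counts of the two directories:

# The two counts should match (one instrumental per mixture)
ls path_to_dataset/instruments | wc -l
ls path_to_dataset/mixtures | wc -l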

Troubleshooting

If you run into any issues during installation or usage, consider the following troubleshooting steps:

  • Ensure you’re using a compatible version of PyTorch (see the quick checks below).
  • Double-check the path to the audio file; a wrong path will make inference.py fail before any separation starts.
  • If you’re running on a GPU, make sure your NVIDIA driver and CUDA setup are working (again, see below).
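
None of the following commands are specific to Vocal-Remover, but they are a quick way to confirm your environment; the first should print True after the version number if PyTorch can see your GPU:

# Print the installed PyTorch version and whether CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Check that the NVIDIA driver sees your GPU (NVIDIA GPUs only)
nvidia-smi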

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
