Sometimes you may want to enjoy the instrumental version of a song without the vocals, or isolate the vocals for karaoke. Vocal-Remover is a deep-learning-based tool that lets you extract instrumental and vocal tracks from your favorite songs. In this blog post, we’ll guide you through installing and using this tool, along with some troubleshooting tips.
Installation
Follow these steps to get Vocal-Remover up and running on your system:
Download Vocal-Remover
First, download the latest version from here.
Install PyTorch
For instructions on installing PyTorch, check out the GET STARTED guide.
Install the Other Packages
Navigate to the Vocal-Remover directory and install the required packages:
cd vocal-remover
pip install -r requirements.txt
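Before installing the remaining packages, it can help to confirm that PyTorch actually landed in the active environment. This is an optional sanity check, not part of the project itself:

```python
# Optional sanity check: is PyTorch importable in the current environment?
import importlib.util


def torch_available() -> bool:
    """Return True if the torch package can be found without importing it."""
    return importlib.util.find_spec("torch") is not None


if __name__ == "__main__":
    if torch_available():
        import torch
        print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    else:
        print("PyTorch is not installed -- see the GET STARTED guide.")
```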
Usage
Now that you have everything set up, let’s dive into how you can use Vocal-Remover to separate audio tracks.
Basic Command
The following command separates the input audio into instrumental and vocal tracks. Both tracks are saved as *_Instruments.wav and *_Vocals.wav.
# Run on CPU
python inference.py --input path_to_an_audio_file
# Run on GPU
python inference.py --input path_to_an_audio_file --gpu 0
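If you want to separate a whole folder of songs, a small wrapper can loop over files and run the command above once per track. This is a hypothetical batch script (the `songs` folder name and the extension list are assumptions), not something shipped with the repository:

```python
# Hypothetical batch runner: invoke inference.py once per audio file in a folder.
# Assumes inference.py is in the current directory and accepts --input/--gpu.
import subprocess
import sys
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".flac"}  # assumed set of supported extensions


def collect_audio_files(folder: str) -> list:
    """Return sorted audio files in the folder with a known extension."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in AUDIO_EXTS)


def build_command(audio_path: Path, gpu: int = -1) -> list:
    """Build the inference.py command line; gpu=-1 means run on CPU."""
    cmd = [sys.executable, "inference.py", "--input", str(audio_path)]
    if gpu >= 0:
        cmd += ["--gpu", str(gpu)]
    return cmd


if __name__ == "__main__":
    if Path("songs").is_dir():
        for audio in collect_audio_files("songs"):
            subprocess.run(build_command(audio, gpu=0), check=True)
```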
Advanced Options
The Vocal-Remover also provides advanced options to improve separation quality. Here’s how you can use them:
- Test-Time Augmentation: The --tta option performs test-time augmentation, which can improve separation quality at the cost of longer inference time.
python inference.py --input path_to_an_audio_file --tta --gpu 0
- Post-Processing: The --postprocess option applies a post-processing step to the separated tracks.
python inference.py --input path_to_an_audio_file --postprocess --gpu 0
Warning: The post-processing feature is experimental. If you encounter issues, consider disabling it.
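For intuition, test-time augmentation generally means averaging a model’s predictions over perturbed copies of the input. The sketch below illustrates the idea using simple time shifts on a spectrogram; the repository’s actual --tta implementation may differ:

```python
# Conceptual sketch of test-time augmentation (TTA) for spectrogram models.
# Not the repository's implementation -- an illustration of the general idea.
import numpy as np


def tta_predict(model, spectrogram: np.ndarray, shifts=(0, 1, 2)) -> np.ndarray:
    """Average the model's output over time-shifted copies of the input,
    undoing each shift before averaging so the outputs stay aligned."""
    outputs = []
    for s in shifts:
        shifted = np.roll(spectrogram, s, axis=-1)    # shift along the time axis
        pred = model(shifted)
        outputs.append(np.roll(pred, -s, axis=-1))    # shift the prediction back
    return np.mean(outputs, axis=0)
```

Averaging over augmented inputs tends to smooth out prediction noise, which is why it costs extra inference time in exchange for quality.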
Training Your Own Model
If you’re feeling adventurous and want to train your own model, follow these steps:
Place your dataset in the following structure:
path_to_dataset
├─ instruments
│  ├─ 01_foo_inst.wav
│  ├─ 02_bar_inst.mp3
│  └─ ...
└─ mixtures
   ├─ 01_foo_mix.wav
   ├─ 02_bar_mix.mp3
   └─ ...
Then, you can train your model with this command:
python train.py --dataset path_to_dataset --mixup_rate 0.5 --reduction_rate 0.5 --gpu 0
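Before starting a long training run, it can be worth checking that every mixture has a matching instrumental file. This hypothetical helper pairs files by their shared prefix, following the `_inst`/`_mix` naming in the example tree above:

```python
# Hypothetical dataset check: report mixtures with no matching instrumental file.
# Assumes the instruments/mixtures layout and _inst/_mix suffixes shown above.
from pathlib import Path


def stem_key(path: Path) -> str:
    """Strip a trailing '_inst' or '_mix' suffix to get the shared song key."""
    name = path.stem
    for suffix in ("_inst", "_mix"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def unpaired_mixtures(dataset: str) -> list:
    """Return song keys that appear in mixtures/ but not in instruments/."""
    root = Path(dataset)
    inst_keys = {stem_key(p) for p in (root / "instruments").glob("*.*")}
    mix_keys = {stem_key(p) for p in (root / "mixtures").glob("*.*")}
    return sorted(mix_keys - inst_keys)
```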
Troubleshooting
If you run into any issues during installation or usage, consider the following troubleshooting steps:
- Ensure you’re using a compatible version of PyTorch.
- Double-check the path to the audio file; errors in the path can lead to failed executions.
- If using GPU, ensure that your drivers and CUDA are correctly set up.
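For the path issue in particular, a tiny pre-flight check can fail fast with a clearer message than a mid-run crash. This helper is an illustration, not part of the tool, and the accepted extension list is an assumption:

```python
# Illustrative pre-flight check for the --input path before running inference.
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".flac"}  # assumed set of supported extensions


def validate_input(path_str: str) -> Path:
    """Raise a clear error if the input path is missing or looks non-audio."""
    p = Path(path_str)
    if not p.exists():
        raise FileNotFoundError(f"No such file: {p}")
    if p.suffix.lower() not in AUDIO_EXTS:
        raise ValueError(f"Unexpected audio extension: {p.suffix}")
    return p
```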
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

