How to Perform Audio Source Separation Using SepFormer

Feb 21, 2024 | Educational

Welcome to the world of audio source separation, where mixed audio signals are untangled into their individual sources. In this article, we will guide you step by step through using the SepFormer model from the SpeechBrain toolkit to separate the speakers in a mixed recording. Let's dive in!

What You Need

  • Python installed on your machine
  • Basic knowledge of Python programming
  • Audio files sampled at 8 kHz (single channel); see the conversion sketch below if yours differ
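
The pretrained 3-speaker SepFormer used below expects 8 kHz, single-channel input. If your recordings are stereo or sampled at another rate, you can convert them with torchaudio first. Here is a minimal sketch; the file names are placeholders:

import torchaudio

waveform, sr = torchaudio.load("my_recording.wav")  # placeholder input path
waveform = waveform.mean(dim=0, keepdim=True)       # downmix to a single channel
waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=8000)
torchaudio.save("my_recording_8k.wav", waveform, 8000)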

Step 1: Installing SpeechBrain

First things first, you need to install the SpeechBrain library. This can be done effortlessly with a single command in your terminal:

pip install speechbrain
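
To confirm the installation worked, you can import the library and print its version:

python -c "import speechbrain; print(speechbrain.__version__)"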

We also encourage you to check out the official SpeechBrain tutorials to familiarize yourself with the toolkit.

Step 2: Performing Source Separation

Now that SpeechBrain is installed, you can start separating audio sources! Here’s how:

from speechbrain.inference.separation import SepformerSeparation as separator
import torchaudio

# Download the pretrained 3-speaker SepFormer model and cache it locally
model = separator.from_hparams(source="speechbrain/sepformer-wsj03mix", savedir="pretrained_models/sepformer-wsj03mix")

# Separate the mixture; est_sources has shape [batch, time, num_speakers]
est_sources = model.separate_file(path="speechbrain/sepformer-wsj03mix/test_mixture_3spks.wav")

# Save each estimated speaker as its own 8 kHz wav file
torchaudio.save("source1_hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("source2_hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
torchaudio.save("source3_hat.wav", est_sources[:, :, 2].detach().cpu(), 8000)

To put this snippet in perspective, imagine a chef using a specialized kitchen tool (the SepFormer model) to separate the ingredients (audio sources) of a mixed dish (the audio file). Just as the chef can recover each ingredient, SepFormer splits the mixed recording into distinct output tracks, one per speaker.
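
To separate your own recording instead of the demo mixture, point separate_file at a local path, or load the waveform yourself and call separate_batch. A minimal sketch, assuming an 8 kHz mono file as discussed above (the file name is a placeholder):

mix, fs = torchaudio.load("my_mixture_8k.wav")  # placeholder: 8 kHz, mono
est_sources = model.separate_batch(mix)         # mix shape: [batch, time]

for i in range(est_sources.shape[2]):
    torchaudio.save(f"speaker{i + 1}_hat.wav", est_sources[:, :, i].detach().cpu(), 8000)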

Step 3: Inferring on GPU (Optional)

If you want to harness the power of a GPU for faster inference, pass the run_opts argument when loading the model with from_hparams:

run_opts={"device": "cuda"}

Step 4: Training Your Own Model

If you are keen on training the model from scratch, follow these steps:

  1. Clone SpeechBrain:
    git clone https://github.com/speechbrain/speechbrain
  2. Navigate into the SpeechBrain directory:
    cd speechbrain
  3. Install dependencies:
    pip install -r requirements.txt
    pip install -e .
  4. Run the training script:
    cd recipes/WSJ0Mix/separation
    python train.py hparams/sepformer.yaml --data_folder=your_data_folder

Note: to train a three-speaker model, set the number of speakers (num_spks) to 3 in the YAML file.
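
If num_spks is defined at the top level of the recipe's YAML (the usual SpeechBrain convention, though worth verifying in your checkout), you can also override it from the command line instead of editing the file:

python train.py hparams/sepformer.yaml --data_folder=your_data_folder --num_spks=3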

Troubleshooting

If you encounter any issues while setting up or running the code, consider the following troubleshooting tips:

  • Check that your audio files are in the correct format and sample rate (8 kHz, mono); a quick way to verify this is shown after this list.
  • Ensure all dependencies are properly installed.
  • Refer to the SpeechBrain tutorials for common errors and solutions.
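
You can inspect a file's metadata with torchaudio to verify both the sample rate and the channel count (the path is a placeholder):

import torchaudio

info = torchaudio.info("my_mixture.wav")    # placeholder path
print(info.sample_rate, info.num_channels)  # expect 8000 and 1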

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

With the help of the SepFormer model and SpeechBrain toolkit, audio source separation is no longer a complex endeavor. You can now effortlessly enjoy clear, distinct audio outputs. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
