How to Perform Audio Source Separation Using SepFormer and SpeechBrain

Feb 23, 2024 | Educational

Separating individual sound sources from a mixed audio file can seem like an intricate puzzle. SepFormer, as implemented in the SpeechBrain toolkit, makes the task approachable: a pretrained model and a few lines of Python are enough to get started. This guide walks through the setup and usage of SepFormer trained on the WHAMR! dataset at 16 kHz. Let’s dive in!

What is SepFormer?

SepFormer is a Transformer-based model for speech separation. The version used in this guide was trained on WHAMR!, a dataset of two-speaker mixtures with added noise and reverberation, which makes it a good fit for separating speech recorded in noisy, reverberant environments.

Getting Started

Prerequisites

  • Python installed on your system.
  • Audio files sampled at 16 kHz, the rate this model was trained on.
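Before running the model, it can help to confirm that an input file really is sampled at 16 kHz. The following is a minimal sketch using Python's standard-library wave module, which reads PCM WAV headers only. The helper names wav_sample_rate and needs_resampling are hypothetical and are not part of SpeechBrain.

```python
import wave

EXPECTED_RATE = 16000  # sample rate (Hz) this guide assumes for the model


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's sample rate differs from expected_rate."""
    return wav_sample_rate(path) != expected_rate
```

If needs_resampling returns True for a file, resample it to 16 kHz with your audio tool of choice before separation.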

Installation Steps

To install the SpeechBrain framework, run:

pip install speechbrain

To get the most out of SepFormer, it is highly recommended to read through the tutorials on the SpeechBrain website.

Performing Source Separation

Now that you have SpeechBrain installed, it’s time to perform audio source separation on your own audio files. Below is the process:

from speechbrain.inference.separation import SepformerSeparation as separator
import torchaudio

# Download the pretrained model and cache it in savedir
model = separator.from_hparams(
    source="speechbrain/sepformer-whamr16k",
    savedir="pretrained_models/sepformer-whamr16k",
)

# Change path to your custom file
est_sources = model.separate_file(path="speechbrain/sepformer-whamr16k/test_mixture16k.wav")

# est_sources has shape [batch, time, sources]; save each estimated source at 16 kHz
torchaudio.save("source1_hat.wav", est_sources[:, :, 0].detach().cpu(), 16000)
torchaudio.save("source2_hat.wav", est_sources[:, :, 1].detach().cpu(), 16000)
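The two torchaudio.save calls above index the last dimension of est_sources, one index per separated source. As a hedged sketch, the same step can be written as a loop that names the output files after the input mixture. The helpers output_paths and save_sources and the file naming scheme are hypothetical, and the code assumes est_sources has shape [batch, time, sources].

```python
from pathlib import Path


def output_paths(mixture_path, n_sources, out_dir="."):
    """Build one output path per estimated source (hypothetical naming scheme)."""
    stem = Path(mixture_path).stem
    return [Path(out_dir) / f"{stem}_source{i + 1}_hat.wav" for i in range(n_sources)]


def save_sources(est_sources, mixture_path, sample_rate=16000, out_dir="."):
    """Save every source in est_sources, assumed shape [batch, time, sources]."""
    import torchaudio  # imported here so output_paths works without torchaudio

    paths = output_paths(mixture_path, est_sources.shape[-1], out_dir)
    for i, path in enumerate(paths):
        # Slice out source i, move it to the CPU, and write it as a WAV file
        torchaudio.save(str(path), est_sources[:, :, i].detach().cpu(), sample_rate)
    return paths
```

Calling save_sources(est_sources, "test_mixture16k.wav") would then write one file per separated source into the current directory.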

Understanding the Code: An Analogy

Imagine you are a diligent librarian in charge of a massive library filled with books mixed together. Your task is to separate two distinct genres—fiction and non-fiction—into their respective sections. In this analogy:

  • The separator represents you, the librarian, equipped with the tools to separate the genres.
  • The model is your ability to identify which books belong where, having been trained through experience.
  • The separate_file function is the action you take to pull out the books from the mixed pile and sort them into two separate stacks—just like isolating audio sources!

Inference on GPU

If you wish to speed up the process, you can perform inference on a GPU by adding the following option when calling from_hparams:

run_opts={"device":"cuda"}
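The device setting is passed to from_hparams as a dictionary through the run_opts argument. On machines that may or may not have a GPU, one option is to choose the device at runtime. This is a minimal sketch; pick_run_opts is a hypothetical helper, not a SpeechBrain function, and it falls back to the CPU when PyTorch or CUDA is unavailable.

```python
def pick_run_opts(prefer_gpu=True):
    """Return a run_opts dictionary, using CUDA only when it is available."""
    try:
        import torch

        has_cuda = torch.cuda.is_available()
    except ImportError:
        # PyTorch is not installed, so only the CPU can be used
        has_cuda = False
    device = "cuda" if (prefer_gpu and has_cuda) else "cpu"
    return {"device": device}


# Assumed usage with the model loaded earlier in this guide:
# model = separator.from_hparams(
#     source="speechbrain/sepformer-whamr16k",
#     savedir="pretrained_models/sepformer-whamr16k",
#     run_opts=pick_run_opts(),
# )
```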

Training from Scratch

If you would like to train the model yourself, here’s how to do it:

  1. Clone the SpeechBrain repository:
     git clone https://github.com/speechbrain/speechbrain
  2. Navigate into the directory and install the requirements:
     cd speechbrain
     pip install -r requirements.txt
     pip install -e .
  3. Run the training script:
     cd recipes/WHAMandWHAMR/separation
     python train.py hparams/sepformer-whamr.yaml --data_folder=your_data_folder --sample_rate=16000

Troubleshooting Tips

When working with audio source separation, you may encounter some hiccups along the way. Here are a few troubleshooting ideas:

  • If the separated audio sounds distorted or poorly separated, confirm that your input files are sampled at 16 kHz.
  • If your model fails to load, check that the source identifier and the savedir path passed to from_hparams are correct.

Conclusion

With SepFormer and SpeechBrain, you have the tools to effectively separate audio sources from mixed files, paving the way for cleaner audio processing tasks.
