How to Perform Audio Source Separation Using SepFormer

Feb 21, 2024 | Educational

In the field of audio processing, separating multiple sources from a mixed audio signal can create clearer, more usable recordings. With the aid of the SepFormer model from SpeechBrain, implementing audio source separation has never been easier. This guide will walk you through the steps to set up and utilize SepFormer, providing you with all the information you need to succeed.

Understanding the Basics

Imagine you’re in a bustling café filled with conversations and clinking cups. You want to listen to your friend’s voice while tuning out the noise around you. This is essentially what audio source separation does: it allows us to isolate specific sounds from a mix of many.

The SepFormer model uses a transformer-based architecture to distinguish different audio sources, much like your brain filters out background noise when you are engaged in conversation. So grab your headphones and let’s dive into the process of using SepFormer with SpeechBrain!

Installation of SpeechBrain

To get started, you will need to install the SpeechBrain library. Open your terminal and execute the following command:

pip install speechbrain

To learn more about the library and explore its functionalities further, check out the SpeechBrain website.
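
To confirm the installation succeeded, a quick (and purely optional) sanity check is to import the library and print the installed version:

import importlib.metadata

import speechbrain  # raises ImportError if the installation failed

print("SpeechBrain version:", importlib.metadata.version("speechbrain"))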

Performing Source Separation

Now that you have SpeechBrain installed, it’s time to perform source separation on your audio files. Use the following Python script as a guide:

from speechbrain.inference.separation import SepformerSeparation as separator
import torchaudio

# Download and load the pretrained SepFormer model trained on WHAMR!
model = separator.from_hparams(source="speechbrain/sepformer-whamr", savedir="pretrained_models/sepformer-whamr")

# Change the path for your own custom audio file
# est_sources has shape [batch, time, n_sources]
est_sources = model.separate_file(path="speechbrain/sepformer-wsj02mix/test_mixture.wav")

# Save each estimated source as its own 8 kHz wav file
torchaudio.save("source1_hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("source2_hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)

This script separates the audio sources and saves them as individual files named “source1_hat.wav” and “source2_hat.wav”. Make sure your input audio is mono (single channel) and sampled at 8 kHz; if it is not, resample it before proceeding.
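
If your recording does not meet that requirement, a small preprocessing step with torchaudio can downmix and resample it first. This is a minimal sketch; the file names are placeholders for your own paths:

import torchaudio

# Load your original recording (any sample rate, possibly stereo)
waveform, sample_rate = torchaudio.load("my_recording.wav")

# Downmix to mono if the file has more than one channel
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# Resample to the 8 kHz rate expected by this SepFormer model
if sample_rate != 8000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 8000)

torchaudio.save("my_recording_8k.wav", waveform, 8000)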

Running on GPU

If you want to speed up the process and have access to a GPU, simply pass run_opts={"device": "cuda"} when loading the model:

model = separator.from_hparams(source="speechbrain/sepformer-whamr", savedir="pretrained_models/sepformer-whamr", run_opts={"device": "cuda"})
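
If you are unsure whether a GPU is available on a given machine, you can select the device at run time and fall back to the CPU. This is a small convenience sketch rather than a required part of the workflow:

import torch
from speechbrain.inference.separation import SepformerSeparation as separator

# Use CUDA when a GPU is present, otherwise run on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = separator.from_hparams(
    source="speechbrain/sepformer-whamr",
    savedir="pretrained_models/sepformer-whamr",
    run_opts={"device": device},
)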

Training Your Own Model

If you’re interested in training your own model from scratch, follow these steps:

  • Clone the SpeechBrain repository:
    git clone https://github.com/speechbrain/speechbrain
  • Install the necessary dependencies:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .
  • Run the training script with your data:
    cd recipes/WHAMandWHAMR/separation
    python train.py hparams/sepformer-whamr.yaml --data_folder=YOUR_DATA_FOLDER --rir_path=YOUR_ROOM_IMPULSE_SAVE_PATH

Training results and additional resources are linked from the SepFormer model card on Hugging Face.

Troubleshooting Common Issues

If you encounter any issues, consider the following troubleshooting tips:

  • Ensure that the input audio file is mono and sampled at 8 kHz (see the snippet after this list).
  • Check if the SpeechBrain library is correctly installed, and that you have all necessary dependencies.
  • When running on GPU, confirm that your CUDA environment is correctly configured.
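
As a quick check for the first point, torchaudio can report a file’s sample rate and channel count without fully loading it. The file name below is a placeholder:

import torchaudio

# Inspect the audio metadata before running separation
info = torchaudio.info("my_recording.wav")
print("Sample rate:", info.sample_rate)   # should be 8000 for this model
print("Channels:", info.num_channels)     # should be 1 (mono)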

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Utilizing SepFormer for audio source separation can significantly enhance your audio processing capabilities. By following this guide, you can perform source separation efficiently. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
