Understanding the Asteroid Model: JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k

Sep 25, 2021 | Educational


Have you ever wondered how we can separate audio sources from a mixed signal, such as a conversation where multiple voices overlap? This post walks you through JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k, a Conv-TasNet model for audio source separation built with the Asteroid framework. Buckle up as we dive into the world of audio processing!

Getting Started with the Asteroid Model

This model was trained by Joris Cosentino using the librimix recipe from Asteroid. It targets the sep_noisy task of the Libri2Mix dataset: separating two speakers from a mixture that also contains background noise.
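As a rough illustration of how such a model is typically used, here is a minimal sketch built on Asteroid's BaseModel.from_pretrained and separate calls. The Hugging Face identifier, the helper name separate_file, and the input file name are assumptions made for this example, not details confirmed by the model's author.

```python
# ASSUMPTION: the model is published under this Hugging Face identifier.
MODEL_ID = "JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k"


def separate_file(wav_path, model_id=MODEL_ID):
    """Hypothetical helper: run the pretrained model on one WAV file."""
    # Imported lazily so the sketch can be read without Asteroid installed.
    from asteroid.models import BaseModel

    # Downloads (or loads from cache) the pretrained weights.
    model = BaseModel.from_pretrained(model_id)

    # resample=True asks Asteroid to resample the file to the model's
    # sample rate (8 kHz here) before separating it.
    model.separate(wav_path, resample=True)
    return model


if __name__ == "__main__":
    # "mixture.wav" is a placeholder for your own recording.
    separate_file("mixture.wav")
```

Asteroid is expected to write the estimated sources next to the input file (for example mixture_est1.wav and mixture_est2.wav); treat the exact file names as an assumption and check your installed version.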

Training Configuration

Here’s a concise breakdown of the training configuration that was used:

data:
    n_src: 2
    sample_rate: 8000
    segment: 3
    task: sep_noisy
    train_dir: data/wav8k/min/train-360
    valid_dir: data/wav8k/min/dev
filterbank:
    kernel_size: 16
    n_filters: 512
    stride: 8
masknet:
    bn_chan: 128
    hid_chan: 512
    mask_act: relu
    n_blocks: 8
    n_repeats: 3
    skip_chan: 128
optim:
    lr: 0.001
    optimizer: adam
    weight_decay: 0.0
training:
    batch_size: 24
    early_stop: True
    epochs: 200
    half_lr: True
    num_workers: 4
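To make the masknet numbers more concrete, the sketch below estimates how many convolutional blocks the separator stacks and how much audio each output frame can "see" (its receptive field). It assumes the usual Conv-TasNet layout, where dilation doubles in each block and resets at every repeat, and a depthwise kernel size of 3. Neither of these appears in the configuration above, so treat the result as an estimate.

```python
# Values copied from the configuration above.
config = {
    "data": {"sample_rate": 8000},
    "filterbank": {"kernel_size": 16, "stride": 8},
    "masknet": {"n_blocks": 8, "n_repeats": 3},
}

# ASSUMPTION: not part of the config shown; 3 is the usual Conv-TasNet choice.
CONV_KERNEL_SIZE = 3


def total_conv_blocks(masknet):
    """Number of 1-D conv blocks: n_blocks per repeat, n_repeats repeats."""
    return masknet["n_blocks"] * masknet["n_repeats"]


def receptive_field_frames(masknet, conv_kernel_size=CONV_KERNEL_SIZE):
    """Receptive field in encoder frames, assuming dilation 2**i in block i."""
    per_repeat = sum(
        (conv_kernel_size - 1) * 2 ** i for i in range(masknet["n_blocks"])
    )
    return 1 + masknet["n_repeats"] * per_repeat


def frames_to_samples(frames, filterbank):
    """Convert a span of encoder frames back to raw audio samples."""
    return (frames - 1) * filterbank["stride"] + filterbank["kernel_size"]


blocks = total_conv_blocks(config["masknet"])
rf_frames = receptive_field_frames(config["masknet"])
rf_samples = frames_to_samples(rf_frames, config["filterbank"])
rf_seconds = rf_samples / config["data"]["sample_rate"]

print(f"conv blocks: {blocks}")
print(f"receptive field: {rf_frames} frames, about {rf_seconds:.2f} s of audio")
```

Under these assumptions, each output frame draws on roughly half of a 3-second training segment.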

To simplify this technical configuration, let’s use an analogy. Imagine that you are baking a cake (the audio model) using specific ingredients (the training parameters). Each ingredient plays a role:

n_src is the number of distinct flavors (sound sources) you want to pull apart from the mix; here, two speakers.
sample_rate is how finely the audio is sampled: 8000 samples per second.
batch_size is like the number of cakes you bake at the same time: 24 training segments per step.
epochs is the maximum number of full passes over the training data (up to 200, with early stopping enabled).

Together, these parameters create the ideal conditions for the model to learn and adapt effectively to the task at hand.
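To see what these ingredients mean in tensor terms, here is a small back-of-the-envelope sketch of one training batch. It assumes the encoder is an unpadded strided 1-D convolution and that tensors are stored as 32-bit floats; both are assumptions for illustration and are not stated in the configuration.

```python
# Values copied from the training configuration.
BATCH_SIZE = 24
SAMPLE_RATE = 8000      # samples per second
SEGMENT_SECONDS = 3     # length of each training segment
KERNEL_SIZE = 16        # encoder filter length, in samples
STRIDE = 8              # encoder hop, in samples
N_FILTERS = 512         # encoder output channels

BYTES_PER_FLOAT32 = 4   # ASSUMPTION: float32 tensors


def tensor_bytes(shape):
    """Memory needed to store one float32 tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_FLOAT32


samples_per_segment = SAMPLE_RATE * SEGMENT_SECONDS

# ASSUMPTION: unpadded strided convolution, so frames = (T - K) // S + 1.
n_frames = (samples_per_segment - KERNEL_SIZE) // STRIDE + 1

input_shape = (BATCH_SIZE, samples_per_segment)
encoded_shape = (BATCH_SIZE, N_FILTERS, n_frames)

print("input batch:", input_shape, tensor_bytes(input_shape), "bytes")
print("encoded batch:", encoded_shape, tensor_bytes(encoded_shape), "bytes")
```

The raw input batch is small, but a single encoded activation is already far larger, and training keeps many such activations in memory at once. This is why batch_size is usually the first setting to lower when memory runs out.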

Evaluating Performance

The results below were measured on the Libri2Mix min test set after training, so they reflect how the model performs on data it did not see while learning. All values are in dB except STOI, which is an intelligibility score between 0 and 1:

SI-SDR: 9.944
SI-SDR Improvement: 11.939
SDR: 10.701
SDR Improvement: 12.481
SIR: 22.633
SAR: 11.132
STOI: 0.852

Troubleshooting Common Issues

If you run into issues while utilizing this audio model, here are some troubleshooting tips:

Ensure you have the correct dataset paths set in your training configuration.
Check that your input audio is sampled at 8000 Hz, the rate the model was trained on; resample recordings made at other rates before separating them.
Verify that you have enough memory available, especially if your batch size is large.
If training is too slow or runs out of memory, reduce the batch size first; lowering the number of epochs shortens training but may cost separation quality.
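The sample-rate tip can be checked before any model is loaded. Here is a minimal sketch using Python's standard wave module; the helper name and the file name are hypothetical, and the check only covers uncompressed PCM WAV files.

```python
import wave

EXPECTED_RATE = 8000  # Hz, from the training configuration


def check_wav_sample_rate(path, expected_rate=EXPECTED_RATE):
    """Return (matches, actual_rate) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        actual_rate = wav_file.getframerate()
    return actual_rate == expected_rate, actual_rate


if __name__ == "__main__":
    # "mixture.wav" is a placeholder for your own recording.
    ok, rate = check_wav_sample_rate("mixture.wav")
    if ok:
        print("Sample rate matches the model.")
    else:
        print(f"File is {rate} Hz; resample it to {EXPECTED_RATE} Hz first.")
```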

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Licensing Details

The ConvTasNet_Libri2Mix_sepnoisy_8k model is a derivative of the LibriSpeech ASR corpus, used under CC BY 4.0, and of the WSJ0 Hipster Ambient Mixtures dataset by Whisper.ai, used under CC BY-NC 4.0 (research only).

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
