How to Build AI-Powered Real-Time Audio Applications with Diart

Jul 9, 2022 | Data Science

Are you ready to dive into the world of audio applications powered by artificial intelligence? If so, Diart is the Python framework you need! This framework enables the creation of real-time audio applications that can distinguish between different speakers during conversations, a process known as speaker diarization. Let’s get started!

Quick Overview of Diart

Diart is designed to streamline the development of AI-driven audio applications. Here’s what you can do with it:

  • Recognize different speakers during conversations
  • Create custom AI pipelines
  • Benchmark and tune your models
  • Serve applications on the web using WebSockets

The real-time speaker diarization pipeline combines segmentation and embedding models to improve accuracy as conversations unfold, much like a skilled bartender who knows how to mix drinks better as the party goes on!
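
To make the idea of "getting better as the conversation unfolds" more concrete, here is a toy sketch of incremental clustering over speaker embeddings. It is an illustration only, not Diart's implementation: the class name, the distance threshold, and the two-dimensional embeddings are all made up for the example, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class ToyIncrementalClustering:
    """Hypothetical toy class for illustration; not part of Diart."""

    def __init__(self, max_distance=0.5):
        self.max_distance = max_distance
        self.centroids = []  # one running sum of embeddings per speaker

    def assign(self, embedding):
        """Return a speaker index for this embedding, creating one if needed."""
        best, best_dist = None, None
        for idx, centroid in enumerate(self.centroids):
            dist = cosine_distance(embedding, centroid)
            if best_dist is None or dist < best_dist:
                best, best_dist = idx, dist
        if best is not None and best_dist <= self.max_distance:
            # Known speaker: fold the new embedding into the running sum,
            # so the centroid reflects everything heard so far.
            self.centroids[best] = [
                c + e for c, e in zip(self.centroids[best], embedding)
            ]
            return best
        # No centroid is close enough: treat this as a new speaker.
        self.centroids.append(list(embedding))
        return len(self.centroids) - 1


clusterer = ToyIncrementalClustering(max_distance=0.5)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = [clusterer.assign(e) for e in stream]
print(labels)  # [0, 0, 1, 1, 0]
```

Each centroid accumulates more evidence with every chunk, which is one intuition for why speaker assignments can become more stable over time.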

Installation Steps

To get Diart up and running, follow these steps:

  1. Ensure you have the following dependencies installed:
    • ffmpeg < 4.4
    • portaudio == 19.6.X
    • libsndfile >= 1.2.2
  2. Set up the environment:
    conda env create -f diart/environment.yml
    conda activate diart
  3. Install the package:
    pip install diart
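
After installing, a quick sanity check can save some debugging later. The sketch below only uses the Python standard library. The helper name is hypothetical, and it checks just two things: whether an ffmpeg executable is on the PATH and whether the diart package can be found. It does not verify versions, portaudio, or libsndfile.

```python
import importlib.util
import shutil


def check_diart_setup():
    """Hypothetical helper: report which prerequisites are visible from here."""
    return {
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "diart importable": importlib.util.find_spec("diart") is not None,
    }


report = check_diart_setup()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated conda environment so the result reflects the interpreter you will actually use.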

Streaming Audio

Once installed, you can stream audio in two primary ways: from the command line or within Python scripts.

From the Command Line

To stream a recorded audio conversation, run:

diart.stream /path/to/audio.wav

For live audio using your microphone:

diart.stream microphone

From Python

Here’s how to stream from an audio source and write the results to disk:

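from diart import SpeakerDiarization
from diart.sources import MicrophoneAudioSource
from diart.inference import StreamingInference
from diart.sinks import RTTMWriter

# Default diarization pipeline, fed by the local microphone
pipeline = SpeakerDiarization()
mic = MicrophoneAudioSource()
# do_plot=True displays the predictions in a live plot while streaming
inference = StreamingInference(pipeline, mic, do_plot=True)
# Write predictions to an RTTM file as they are produced
inference.attach_observers(RTTMWriter(mic.uri, "/output/file.rttm"))
# Run the stream and collect the prediction
prediction = inference()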
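
Once results are on disk, the RTTM file can be inspected with a few lines of plain Python. The sketch below sums speaking time per speaker. It assumes the usual RTTM layout, where each SPEAKER line carries the duration in the fifth field and the speaker label in the eighth. The function name and the sample lines are invented for the example.

```python
from collections import defaultdict


def speaking_time(rttm_text):
    """Sum the duration (in seconds) of all SPEAKER turns, per speaker label."""
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration, speaker = float(fields[4]), fields[7]
        totals[speaker] += duration
    return dict(totals)


# Made-up sample in RTTM format; a real file would come from the RTTM writer.
sample = """\
SPEAKER mic 1 0.000 2.500 <NA> <NA> speaker0 <NA> <NA>
SPEAKER mic 1 2.500 1.000 <NA> <NA> speaker1 <NA> <NA>
SPEAKER mic 1 3.500 0.500 <NA> <NA> speaker0 <NA> <NA>
"""

totals = speaking_time(sample)
print(totals)  # {'speaker0': 3.0, 'speaker1': 1.0}
```

To use it on a real file, read the file contents into a string and pass that string to `speaking_time`.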

Models

Diart supports multiple models, including pretrained models from Hugging Face. You can choose which segmentation and embedding models to use directly from the command line.

Tuning Hyper-Parameters

Tuning allows you to refine your model’s performance. Here’s how to start tuning from the command line:

diart.tune /wav/dir --reference /rttm/dir --output /output/dir

Or from Python:

from diart.optim import Optimizer

optimizer = Optimizer("/wav/dir", "/rttm/dir", "/output/dir")
optimizer(num_iter=100)
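
Tuning needs a reference annotation for every audio file, so it can help to check the two directories before starting a long run. The sketch below assumes that references are matched to audio by file name (for example `meeting1.wav` with `meeting1.rttm`). That naming convention and the helper name are assumptions for illustration, so adjust them to your own layout.

```python
import tempfile
from pathlib import Path


def missing_references(wav_dir, rttm_dir):
    """Return the stems of .wav files that have no same-named .rttm file."""
    wav_stems = {p.stem for p in Path(wav_dir).glob("*.wav")}
    rttm_stems = {p.stem for p in Path(rttm_dir).glob("*.rttm")}
    return sorted(wav_stems - rttm_stems)


# Demo on throwaway directories with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    wav_dir = Path(tmp) / "wav"
    rttm_dir = Path(tmp) / "rttm"
    wav_dir.mkdir()
    rttm_dir.mkdir()
    for name in ("meeting1.wav", "meeting2.wav"):
        (wav_dir / name).touch()
    (rttm_dir / "meeting1.rttm").touch()
    missing = missing_references(wav_dir, rttm_dir)

print(missing)  # ['meeting2']
```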

Building Custom Pipelines

Diart gives you the flexibility to create your own custom pipelines. Use the building blocks provided by the framework to design solutions tailored to your needs!
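
As a rough picture of what "building blocks" means in a streaming setting, the plain-Python sketch below chains two small steps: cutting a signal into overlapping windows, and making a decision per window. It does not use Diart's block API. The function names, window sizes, and the energy threshold are all invented for illustration.

```python
def sliding_windows(samples, window, step):
    """Yield overlapping windows of `window` samples, advancing by `step`."""
    for start in range(0, len(samples) - window + 1, step):
        yield samples[start:start + window]


def energy(chunk):
    """Mean squared amplitude of a chunk."""
    return sum(x * x for x in chunk) / len(chunk)


def toy_pipeline(samples, window=4, step=2, threshold=0.1):
    """Chain the two blocks: window the signal, then flag the loud windows."""
    return [
        energy(chunk) > threshold
        for chunk in sliding_windows(samples, window, step)
    ]


signal = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0]
activity = toy_pipeline(signal)
print(activity)  # [False, True, True, True]
```

A real pipeline would replace the energy check with model-based steps, but the shape is similar: small, composable stages applied to each incoming chunk.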

WebSockets

Diart also supports serving pipelines through WebSockets. Here’s how to start a server from the command line:

diart.serve --host 0.0.0.0 --port 7007

Or set up a server with custom parameters via Python:

from diart import SpeakerDiarization
from diart.sources import WebSocketAudioSource
from diart.inference import StreamingInference

pipeline = SpeakerDiarization()
# Receive audio over a WebSocket on localhost:7007 at the pipeline's sample rate
source = WebSocketAudioSource(pipeline.config.sample_rate, "localhost", 7007)
inference = StreamingInference(pipeline, source)
# Send each new prediction back to the client in RTTM format
inference.attach_hooks(lambda ann_wav: source.send(ann_wav[0].to_rttm()))
prediction = inference()
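
On the client side, audio has to be serialized before it goes over the socket. The sketch below shows one possible encoding: 32-bit little-endian floats wrapped in base64 text. That this matches what the server expects is an assumption you should verify against the Diart version you run. The function names are hypothetical, and the actual WebSocket connection is left out so the example stays standard-library only.

```python
import base64
import struct


def encode_chunk(samples):
    """Pack samples as little-endian float32 and wrap them in base64 text."""
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return base64.b64encode(raw).decode("utf-8")


def decode_chunk(message):
    """Inverse of encode_chunk: base64 text back to a list of floats."""
    raw = base64.b64decode(message.encode("utf-8"))
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Values chosen so they are exactly representable as float32.
chunk = [0.0, 0.5, -0.25, 1.0]
message = encode_chunk(chunk)
print(decode_chunk(message) == chunk)  # True
```

A client would send one such message per audio chunk and read back the RTTM text that the server hook produces.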

Troubleshooting

If you run into any issues, consider the following:

  • Ensure all dependencies are correctly installed.
  • Check your microphone permissions.
  • Verify that your audio files are in the correct format.

Conclusion

Diart is a powerful framework for creating audio applications that leverage AI. Whether you want to build your own pipelines or use pretrained models, the steps outlined above will get you moving in the right direction.
