Are you ready to dive into the world of audio applications powered by artificial intelligence? If so, Diart is the Python framework you need! It enables real-time audio applications that can distinguish between different speakers during a conversation, a process known as speaker diarization. Let’s get started!
Quick Overview of Diart
Diart is designed to streamline the development of AI-driven audio applications. Here’s what you can do with it:
- Recognize different speakers during conversations
- Create custom AI pipelines
- Benchmark and tune your models
- Serve applications on the web using WebSockets
The real-time speaker diarization pipeline combines segmentation and embedding models to improve accuracy as conversations unfold, much like a skilled bartender who knows how to mix drinks better as the party goes on!
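To build a rough intuition for that matching step (a toy sketch only, not diart's actual implementation; all names here are hypothetical): the segmentation model detects who is speaking when, the embedding model produces a vector per detected speaker, and comparing each new embedding against running speaker centroids is what keeps speaker identities consistent as the conversation unfolds.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def assign_speaker(embedding, centroids, threshold=0.5):
    """Match an embedding to the closest known speaker centroid,
    or register a new speaker if nothing is similar enough."""
    best, best_sim = None, threshold
    for speaker, centroid in centroids.items():
        sim = cosine(embedding, centroid)
        if sim > best_sim:
            best, best_sim = speaker, sim
    if best is None:
        best = f"speaker{len(centroids)}"
        centroids[best] = list(embedding)
    return best

centroids = {}
print(assign_speaker([1.0, 0.0], centroids))  # speaker0 (new)
print(assign_speaker([0.9, 0.1], centroids))  # speaker0 (similar enough)
print(assign_speaker([0.0, 1.0], centroids))  # speaker1 (new)
```

Real systems also update centroids with each new observation and handle overlapping speech, but the core idea is this similarity-based assignment.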
Installation Steps
To get Diart up and running, follow these steps:
- Ensure you have the following dependencies installed:
  - ffmpeg 4.4
  - portaudio == 19.6.X
  - libsndfile = 1.2.2
- Set up the environment:

```shell
conda env create -f diart/environment.yml
conda activate diart
```

- Install the package:

```shell
pip install diart
```
Streaming Audio
Once installed, you can stream audio in two primary ways— via command line or within Python scripts.
From the Command Line
To stream a recorded audio conversation, run:
```shell
diart.stream path/to/audio.wav
```
For live audio using your microphone:
```shell
diart.stream microphone
```
From Python
Here’s how to stream an audio source from Python and write the results to disk:
```python
from diart import SpeakerDiarization
from diart.sources import MicrophoneAudioSource
from diart.inference import StreamingInference
from diart.sinks import RTTMWriter

# Build the diarization pipeline and attach it to the microphone
pipeline = SpeakerDiarization()
mic = MicrophoneAudioSource()
inference = StreamingInference(pipeline, mic, do_plot=True)

# Write predictions to an RTTM file as they arrive
inference.attach_observers(RTTMWriter(mic.uri, "output/file.rttm"))

# Run the pipeline until the audio source is exhausted
prediction = inference()
```
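The RTTM file produced above is a plain-text format where each line describes one speech segment. As an illustration (the helper below is hypothetical, not part of diart), this is how a standard RTTM `SPEAKER` line is laid out:

```python
def rttm_line(uri, onset, duration, speaker):
    """Format one speech segment as a standard RTTM SPEAKER line:
    segment type, file id, channel, onset and duration in seconds,
    then the speaker label (unused fields are <NA>)."""
    return (f"SPEAKER {uri} 1 {onset:.3f} {duration:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>")

print(rttm_line("meeting", 0.5, 2.25, "speaker0"))
# SPEAKER meeting 1 0.500 2.250 <NA> <NA> speaker0 <NA> <NA>
```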
Models
Diart supports multiple models, including pretrained models from Hugging Face. You can use commands to specify segmentation and embedding models directly.
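For instance, `diart.stream` accepts `--segmentation` and `--embedding` flags that take Hugging Face model identifiers. The pyannote models below are a common choice, shown here purely as an illustration (gated models may require a Hugging Face access token):

```shell
# Choose specific segmentation and embedding models when streaming
diart.stream path/to/audio.wav \
  --segmentation pyannote/segmentation \
  --embedding pyannote/embedding
```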
Tuning Hyper-Parameters
Tuning allows you to refine your model’s performance. Here’s how to start tuning from the command line:
```shell
diart.tune /wav/dir --reference /rttm/dir --output /output/dir
```
Or from Python:
```python
from diart.optim import Optimizer

# Optimize hyper-parameters over a directory of audio files
# with matching reference RTTM annotations
optimizer = Optimizer("/wav/dir", "/rttm/dir", "/output/dir")
optimizer(num_iter=100)
```
Building Custom Pipelines
Diart gives you the flexibility to create your own custom pipelines. Use the building blocks provided by the framework to design solutions tailored to your needs!
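This article doesn't show diart's block API itself, so here is a stand-in sketch of the general pattern such pipelines follow: independent processing blocks chained together, with observers notified of each result. All names below are hypothetical, for illustration only:

```python
class Pipeline:
    """Chain processing blocks and notify observers of each result."""

    def __init__(self, *blocks):
        self.blocks = blocks
        self.observers = []

    def attach_observer(self, observer):
        self.observers.append(observer)

    def __call__(self, chunk):
        # Feed the chunk through each block in order
        for block in self.blocks:
            chunk = block(chunk)
        # Notify every observer of the final result
        for observer in self.observers:
            observer(chunk)
        return chunk

# Hypothetical blocks: normalize an audio chunk, then label it
def normalize(xs):
    peak = max(abs(x) for x in xs)
    return [x / peak for x in xs]

def label(xs):
    return ("loud" if max(xs) > 0.9 else "quiet", xs)

pipeline = Pipeline(normalize, label)
pipeline.attach_observer(lambda result: print(result[0]))  # prints "loud"
tag, _ = pipeline([0.2, 0.4, 0.1])
```

In diart, the blocks would be components such as segmentation and embedding models, and the observers would be sinks like the RTTM writer shown earlier.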
WebSockets
Diart also supports serving pipelines through WebSockets. Here’s how to start a server from the command line:
```shell
diart.serve --host 0.0.0.0 --port 7007
```
Or set up a server with custom parameters via Python:
```python
from diart import SpeakerDiarization
from diart.sources import WebSocketAudioSource
from diart.inference import StreamingInference

# Receive audio over a WebSocket instead of a local microphone
pipeline = SpeakerDiarization()
source = WebSocketAudioSource(pipeline.config.sample_rate, "localhost", 7007)
inference = StreamingInference(pipeline, source)

# Send each prediction back to the client in RTTM format
inference.attach_hooks(lambda ann_wav: source.send(ann_wav[0].to_rttm()))
prediction = inference()
```
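Once a server is running, it can be fed audio with diart’s built-in client, for example by streaming your local microphone to it (the address and port below mirror the server example and are illustrative):

```shell
# Stream microphone audio to a running diart server
diart.client microphone --host localhost --port 7007
```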
Troubleshooting
If you run into any issues, consider the following:
- Ensure all dependencies are correctly installed.
- Check your microphone permissions.
- Verify that your audio files are in the correct format.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
Diart is a powerful framework for creating audio applications that leverage AI. Whether you’re looking to build your own pipelines or utilize pre-trained models, the steps outlined above will get you moving in the right direction.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

