Getting Started with the Whisper-Tiny Model in Unity Sentis

Apr 14, 2024 | Educational

Welcome to the exciting world of speech-to-text conversion using the Whisper-Tiny model with Unity Sentis! In this guide, we’ll walk through integrating this model into a Unity 2023 project. Let’s dive in!

Prerequisites

  • Unity 2023 installed on your machine.
  • Basic understanding of how to navigate Unity’s interface.
  • A 16kHz mono audio file ready for transcription.

How to Use the Whisper-Tiny Model

Follow these steps to get started with the Whisper-Tiny model:

  1. Open a new scene in Unity 2023.
  2. Import the package com.unity.sentis version 1.4.0-pre.3 from the package manager.
  3. Attach the RunWhisper.cs script to the Main Camera in your scene.
  4. Place your *.sentis files and vocab.json in the Assets/StreamingAssets folder.
  5. Add your 16kHz mono audio file (up to 30 seconds long) to your project and drag it onto the audioClip field in the Inspector.
  6. IMPORTANT: Ensure your audio is 16kHz. In the clip’s audio import settings, enable Force To Mono and set the Load Type to Decompress On Load.
  7. Optionally, if you have a 44kHz or 22kHz audio file, you can convert it to 16kHz using the conversion model (or resample it in code, as sketched just below this list).
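
If you’d rather not use the conversion model, a quick-and-dirty alternative is to resample in code. The helper below is entirely my own sketch (not part of the sample): it averages the channels down to mono and linearly interpolates to 16kHz, with no anti-aliasing filter, so treat it as a fallback rather than a reference implementation.

```csharp
using UnityEngine;

// Sketch only: downmix an AudioClip to mono and linearly resample it to 16kHz.
// A proper resampler (or the conversion model mentioned above) will sound cleaner.
public static class AudioResampler
{
    public static float[] ResampleTo16k(AudioClip clip)
    {
        // Read the clip (interleaved if stereo) and average the channels down to mono.
        float[] raw = new float[clip.samples * clip.channels];
        clip.GetData(raw, 0);

        float[] mono = new float[clip.samples];
        for (int i = 0; i < clip.samples; i++)
        {
            float sum = 0f;
            for (int c = 0; c < clip.channels; c++) sum += raw[i * clip.channels + c];
            mono[i] = sum / clip.channels;
        }

        // Linear interpolation from the source rate down to 16kHz.
        const int targetRate = 16000;
        int targetLength = Mathf.RoundToInt((float)mono.Length * targetRate / clip.frequency);
        float[] output = new float[targetLength];
        for (int i = 0; i < targetLength; i++)
        {
            float srcPos = (float)i * clip.frequency / targetRate;
            int i0 = Mathf.Min((int)srcPos, mono.Length - 1);
            int i1 = Mathf.Min(i0 + 1, mono.Length - 1);
            output[i] = Mathf.Lerp(mono[i0], mono[i1], srcPos - i0);
        }
        return output;
    }
}
```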

When you hit the play button, the transcription of your audio will be displayed in the console window!
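
Under the hood, RunWhisper.cs follows the standard Sentis pattern: load a serialized model from StreamingAssets, wrap your data in a tensor, execute a worker, and read the output back. The snippet below is a simplified sketch of that pattern against the Sentis 1.4 API, not the full script; the class name and the AudioEncoder.sentis file name are placeholders of my own, and the real workflow goes further, decoding tokens step by step and turning them into text with vocab.json.

```csharp
using UnityEngine;
using Unity.Sentis;

// Simplified sketch of the Sentis pattern RunWhisper.cs builds on (Sentis 1.4 API).
// File and class names are placeholders -- match them to the *.sentis files in StreamingAssets.
public class WhisperSetupSketch : MonoBehaviour
{
    public AudioClip audioClip;                 // 16kHz mono clip, up to 30 seconds

    const int maxSamples = 30 * 16000;          // Whisper operates on 30-second windows
    IWorker worker;

    void Start()
    {
        // Load a serialized model from StreamingAssets and create a worker for it.
        Model model = ModelLoader.Load(Application.streamingAssetsPath + "/AudioEncoder.sentis");
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);

        // Copy the clip samples into a buffer padded (or truncated) to exactly 30 seconds.
        float[] clipData = new float[audioClip.samples];
        audioClip.GetData(clipData, 0);
        float[] samples = new float[maxSamples];
        System.Array.Copy(clipData, samples, Mathf.Min(clipData.Length, maxSamples));

        // Wrap the samples in a (1, N) tensor, run the model, and peek at the output.
        using TensorFloat input = new TensorFloat(new TensorShape(1, maxSamples), samples);
        worker.Execute(input);
        TensorFloat output = worker.PeekOutput() as TensorFloat;
        Debug.Log("Output shape: " + output.shape);

        // The full RunWhisper.cs continues from here: it runs the decoder token by token
        // and converts the resulting IDs to text using vocab.json.
    }

    void OnDestroy() => worker?.Dispose();
}
```

If your target platform lacks compute-shader support, swapping BackendType.GPUCompute for BackendType.CPU keeps the same code working, just more slowly.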

Understanding the Language Tokens

The Whisper-Tiny decoder is primed with four special tokens, two of which you’ll set yourself to customize the output:

  • One token identifies the language of the input audio.
  • Another token selects the task: transcribe directly in that language, or translate the result into English.

Details of these tokens, along with a description of the added_tokens.json file, can be found in the Whisper-Tiny model documentation.
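
To make the idea concrete, here is a small, hedged sketch of how a decoder prompt is typically assembled from these tokens. The numeric IDs are the usual multilingual Whisper values rather than anything defined in this guide, so verify them against the added_tokens.json that ships with the model before relying on them.

```csharp
// Sketch: priming the decoder prompt with the special tokens described above.
// The IDs are the standard multilingual Whisper values; verify them against added_tokens.json.
public static class WhisperPromptTokens
{
    public const int StartOfTranscript = 50258; // <|startoftranscript|>
    public const int LanguageEnglish   = 50259; // <|en|> -- swap for another language token as needed
    public const int TaskTranslate     = 50358; // <|translate|>: output translated into English
    public const int TaskTranscribe    = 50359; // <|transcribe|>: output in the spoken language
    public const int NoTimestamps      = 50363; // <|notimestamps|>

    // Example: transcribe English audio as English text, without timestamps.
    public static int[] EnglishTranscription() =>
        new[] { StartOfTranscript, LanguageEnglish, TaskTranscribe, NoTimestamps };
}
```

Swapping TaskTranscribe for TaskTranslate is all it takes to have non-English audio come out as English text.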

Troubleshooting Tips

If you encounter issues while using the Whisper-Tiny model, consider the following troubleshooting steps:

  • Ensure that your audio file really is 16kHz and mono (the quick check after this list prints the clip’s imported format).
  • Double-check that all necessary files (like *.sentis and vocab.json) are correctly placed in the Assets/StreamingAssets directory.
  • If the transcription does not appear, confirm that RunWhisper.cs is correctly attached to the Main Camera.
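
For the first point, a throwaway component like the one below (the class name is my own, not part of the sample) prints the clip’s imported format so you can rule out sample-rate and channel problems straight away:

```csharp
using UnityEngine;

// Quick sanity check: log the imported format of the clip assigned in the Inspector.
public class AudioFormatCheck : MonoBehaviour
{
    public AudioClip audioClip;

    void Start()
    {
        Debug.Log($"Frequency: {audioClip.frequency} Hz (expected 16000), " +
                  $"Channels: {audioClip.channels} (expected 1), " +
                  $"Length: {audioClip.length:F1} s (expected <= 30)");
    }
}
```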

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

By following this guide, you’ll harness the power of the Whisper-Tiny model to convert audio to text seamlessly. Remember, this process can be likened to a translator at work: it listens to your audio and painstakingly crafts the best transcription, ensuring nothing gets lost in translation!

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
