Are you ready to unlock the full potential of Automatic Speech Recognition (ASR) with the power of OpenAI’s Whisper model? In this guide, we will walk you through running a Whisper model that has been fine-tuned, via PEFT adapters, for judicial contexts in the Portuguese language. Get ready to dive into the world of ASR, where your audio inputs are transformed into transcribed text seamlessly!
Prerequisites: Setting Up Your Environment
Before we embark on this journey, it’s essential to have the required libraries in place. You can set them up with a series of simple commands:
!pip install transformers
!pip install einops accelerate bitsandbytes
!pip install sentence_transformers
!pip install pydub
!pip install git+https://github.com/huggingface/peft.git
Loading and Configuring the Model
Now that we have all dependencies installed, let’s load and configure our Whisper model for fine-tuning. Think of this step as preparing the canvas before painting; the cleaner and more detailed your setup is, the better the results will be.
from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, BitsAndBytesConfig
# Step 1: Define your task and language
task = "transcribe"
language = "portuguese"
# Step 2: Configure 8-bit quantization and point to the PEFT adapter
quant_config = BitsAndBytesConfig(load_in_8bit=True)
peft_model_id = "rhaymison/legal-whisper-portuguese-peft"
peft_config = PeftConfig.from_pretrained(peft_model_id)
# Step 3: Load the quantized base model, then attach the fine-tuned adapter weights
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path,
    quantization_config=quant_config,
    device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)
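Before moving on, you can optionally confirm that the adapter attached correctly. This is a minimal sanity check, not part of the original recipe; print_trainable_parameters comes from the PEFT library:
# Optional: put the model in inference mode and confirm the LoRA adapter is attached
model.eval()
model.print_trainable_parameters()  # reports how few parameters the adapter adds on top of the base model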
Explaining the Code: An Analogy
Let’s break down the above code with an analogy. Imagine you’re a chef (the model) preparing a gourmet dish (transcribing audio). First, you gather your ingredients (dependencies) and get your kitchen organized (loading and configuring the model). You need to set the right ambiance (task and language) before you can begin cooking (transcribing audio). As you carefully select each ingredient per the recipe outlined (the various configuration steps), you ensure your dish will be a culinary masterpiece!
Loading the Processor and Preparing Audio
Once your model is set up, it’s time to configure the processor to handle input audio as well as prepare your audio file for processing. This step is essential to ensure your model can ‘understand’ the audio coming in.
from transformers import WhisperProcessor
# Load the audio processor
processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)
# Convert the audio to the 16 kHz sample rate Whisper expects, then save it
from pydub import AudioSegment
audio = AudioSegment.from_wav("content/audio.wav")
audio = audio.set_frame_rate(16000)
audio.export("content/z.wav", format="wav")  # export to the same path the pipeline will read
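If you want to double-check the conversion before transcribing, here is a quick optional sketch using pydub’s own attributes (a convenience check, not a required step):
# Optional: reload the exported file and verify the sample rate
check = AudioSegment.from_wav("content/z.wav")
print(check.frame_rate, check.channels)  # expect 16000 and your original channel count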
Creating the Pipeline
The next crucial step is to create a pipeline for automatic speech recognition. You can think of this as setting up the machinery in a factory; each piece must function together to produce the desired output efficiently.
import torch
from transformers import pipeline
# Check whether a GPU is available (informational: the quantized model was already
# placed on devices by device_map="auto", so we don't pass `device` to the pipeline)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(f"Running on {device}")
# Define the ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,      # cap on tokens generated per chunk
    chunk_length_s=30,       # Whisper processes audio in 30-second windows
    batch_size=16,           # chunks transcribed in parallel; lower this if memory is tight
    return_timestamps=True,
    torch_dtype=torch.float16
)
Performing Transcription
Finally, it’s time for the moment of truth! You can now transcribe your audio content into text. This is where all your efforts come together as the model listens and understands the audio it receives.
transcription = pipe("content/z.wav", generate_kwargs={"language": "portuguese"})
print(transcription)
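The pipeline returns a dictionary: a "text" key with the full transcript, plus a "chunks" list of timestamped segments because we set return_timestamps=True. You can pull either out directly:
# Print the plain transcript
print(transcription["text"])
# Print each timestamped segment
for chunk in transcription["chunks"]:
    start, end = chunk["timestamp"]
    print(f"[{start} - {end}] {chunk['text']}")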
Troubleshooting Tips
If you encounter any issues while following these steps, here are some troubleshooting ideas:
- Ensure all libraries are correctly installed. Missing dependencies can lead to import errors.
- Check that your audio file path is correct and the audio format is supported.
- If the model performs poorly, try switching between 4-bit and 8-bit quantization configurations to see which yields better results (see the sketch after this list).
- Monitor your GPU and CPU usage; heavy processing could slow down your system.
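As an example of the 4-bit alternative mentioned above, here is a minimal sketch of a BitsAndBytesConfig you could swap in for quant_config when loading the base model; the specific settings are illustrative, not the configuration used in this guide:
import torch
from transformers import BitsAndBytesConfig
# Illustrative 4-bit (NF4) quantization config to try in place of quant_config
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)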
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
