Welcome to the world of natural language processing (NLP), where artificial intelligence meets human language! Today, we will explore how to use the Portuguese BERT base cased model, fine-tuned on the SQuAD v1.1 dataset, for question answering.
Introduction
This model is designed to answer questions based on a given context. Developed by the Deep Learning Brasil group and built on BERTimbau Base, it delivers strong performance in understanding and processing Brazilian Portuguese.
Understanding the Code: A Postman Analogy
Imagine you are a post office worker (our program) who needs to deliver a letter (the answer) to the right address (the question). To do this efficiently, you need:
- A mail carrier (the BERT model) to carry each letter to its destination.
- Address labels (the tokenizer) to work out where each letter should go.
- A well-organized warehouse (the context) filled with accurate information from which to retrieve the needed letters (answers).
Now, let’s break down the code step by step:
from transformers import pipeline
context = r"A pandemia de COVID-19, também conhecida como pandemia de coronavírus..."
model_name = "pierreguillou/bert-base-cased-squad-v1.1-portuguese"
nlp = pipeline("question-answering", model=model_name)
question = "Quando começou a pandemia de Covid-19 no mundo?"
result = nlp(question=question, context=context)
print(f"Answer: {result['answer']}, score: {round(result['score'], 4)}")
Here, the code loads the model, sets up a context (our warehouse filled with information), and lets you ask questions, retrieving precise answers from that context.
How to Use the Model
To use this model, you have two options:
- Using the Pipeline: a straightforward setup for getting answers from a question and a context.
- Using the Auto Classes: more control over the tokenizer and model, suitable for advanced users who want to customize.
Using the Pipeline
from transformers import pipeline
model_name = "pierreguillou/bert-base-cased-squad-v1.1-portuguese"
nlp = pipeline("question-answering", model=model_name)
question = "Onde foi descoberta a Covid-19?"
context = "A pandemia de COVID-19 envolve muitos detalhes..."
result = nlp(question=question, context=context)
print(f"Answer: {result['answer']}, score: {round(result['score'], 4)}")
Using the Auto Classes
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
tokenizer = AutoTokenizer.from_pretrained("pierreguillou/bert-base-cased-squad-v1.1-portuguese")
model = AutoModelForQuestionAnswering.from_pretrained("pierreguillou/bert-base-cased-squad-v1.1-portuguese")
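With the Auto classes you run the forward pass yourself: tokenize the question and context together, call the model, and pick the answer span from its start and end logits. The snippet below is a minimal sketch of that span-selection step only, shown with toy tokens and hand-written logit values so it runs without downloading the model; with the real model you would take `outputs.start_logits` and `outputs.end_logits` instead.

```python
# Sketch of answer-span selection from start/end logits, as produced by
# AutoModelForQuestionAnswering. The tokens and logits below are toy values
# standing in for real model output, so the example runs offline.

def pick_answer_span(tokens, start_logits, end_logits):
    """Return the token span with the highest start logit and the highest
    end logit at or after it (greedy argmax, the simplest decoding rule)."""
    start = max(range(len(start_logits)), key=lambda i: start_logits[i])
    end = max(range(start, len(end_logits)), key=lambda i: end_logits[i])
    return " ".join(tokens[start:end + 1])

tokens = ["A", "pandemia", "começou", "em", "dezembro", "de", "2019", "."]
start_logits = [0.1, 0.2, 0.1, 0.3, 4.5, 0.2, 0.4, 0.0]  # peak at "dezembro"
end_logits   = [0.0, 0.1, 0.2, 0.1, 0.3, 0.5, 5.0, 0.1]  # peak at "2019"

print(pick_answer_span(tokens, start_logits, end_logits))  # dezembro de 2019
```

In real use, the production decoders in transformers also mask out spans that fall in the question or exceed a maximum answer length; the greedy rule above is only the core idea.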
Performance Metrics
The model reports the following evaluation metrics:
- F1 Score: 82.50
- Exact Match: 70.49
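These numbers follow the standard SQuAD evaluation: Exact Match is the share of predictions that equal a reference answer after normalization, and F1 measures token overlap between prediction and reference. Here is a minimal sketch of both metrics, using only lowercasing and whitespace tokenization (the official SQuAD script additionally strips punctuation and articles):

```python
# Simplified SQuAD-style metrics: exact match and token-level F1.
from collections import Counter

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.lower().strip() == reference.lower().strip())

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("dezembro de 2019", "Dezembro de 2019"))           # 1.0
print(round(token_f1("em dezembro de 2019", "dezembro de 2019"), 2))  # 0.86
```

This makes the two reported numbers concrete: a prediction can miss Exact Match yet still score high F1 when it overlaps the reference answer.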
Troubleshooting Tips
If you run into issues, here are a few troubleshooting ideas:
- Ensure that your environment has the required libraries installed, such as transformers.
- Check for typos in the model name and dataset paths.
- Review the context and questions to make sure they are formatted correctly.
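A quick way to check the first item is to probe for the required packages before loading the pipeline. The snippet below checks for transformers and torch (torch is assumed here because the pipeline runs on the PyTorch backend by default):

```python
# Probe the current environment for the packages the pipeline needs.
import importlib.util

def is_installed(package):
    """True if the top-level package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None

for pkg in ("transformers", "torch"):
    status = "ok" if is_installed(pkg) else "missing - run: pip install " + pkg
    print(f"{pkg}: {status}")
```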
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

