Welcome to the world of natural language processing (NLP), where artificial intelligence meets human language! Today, we will explore how to use the Portuguese BERT base cased model, fine-tuned on the SQuAD v1.1 dataset, for question answering.
Introduction
This model is designed to answer questions based on a given context. Developed by the Deep Learning Brasil group and built on BERTimbau Base, it delivers strong performance in understanding and processing Brazilian Portuguese.
Understanding the Code: A Postman Analogy
Imagine you are a post office worker (our program) who needs to deliver a letter (the answer) to the right address (the question). To do this efficiently, you need:
- A mail carrier (the BERT model) to carry each letter to its destination.
- Address labels (the tokenizer) to work out where each letter should go.
- A well-organized warehouse (the context) filled with accurate information from which to retrieve the needed letters (answers).
Now, let’s break down the code step by step:
from transformers import pipeline
context = r"A pandemia de COVID-19, também conhecida como pandemia de coronavírus..."
model_name = "pierreguillou/bert-base-cased-squad-v1.1-portuguese"
nlp = pipeline("question-answering", model=model_name)
question = "Quando começou a pandemia de Covid-19 no mundo?"
result = nlp(question=question, context=context)
print(f"Answer: {result['answer']}, score: {round(result['score'], 4)}")
Here, the code loads the model, sets up a context (our warehouse filled with information), and lets you ask questions, retrieving precise answers from that context.
How to Use the Model
To use this model, you have two options:
- Using the Pipeline: a straightforward setup for getting answers from a question and a context.
- Using the Auto Classes: more control over the tokenizer and model, suitable for advanced users who want to customize.
Using the Pipeline
from transformers import pipeline
model_name = "pierreguillou/bert-base-cased-squad-v1.1-portuguese"
nlp = pipeline("question-answering", model=model_name)
question = "Onde foi descoberta a Covid-19?"
context = "A pandemia de COVID-19 envolve muitos detalhes..."
result = nlp(question=question, context=context)
print(f"Answer: {result['answer']}, score: {round(result['score'], 4)}")
Using the Auto Classes
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
tokenizer = AutoTokenizer.from_pretrained("pierreguillou/bert-base-cased-squad-v1.1-portuguese")
model = AutoModelForQuestionAnswering.from_pretrained("pierreguillou/bert-base-cased-squad-v1.1-portuguese")
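With the Auto classes you run the forward pass yourself: tokenize the question and context together, call the model, and pick the answer span from its start and end logits. The snippet below is a minimal sketch of that span-selection step only, shown with toy tokens and hand-written logit values so it runs without downloading the model; with the real model you would take `outputs.start_logits` and `outputs.end_logits` instead.

```python
# Sketch of answer-span selection from start/end logits, as produced by
# AutoModelForQuestionAnswering. The tokens and logits below are toy values
# standing in for real model output, so the example runs offline.

def pick_answer_span(tokens, start_logits, end_logits):
    """Return the token span with the highest start logit and the highest
    end logit at or after it (greedy argmax, the simplest decoding rule)."""
    start = max(range(len(start_logits)), key=lambda i: start_logits[i])
    end = max(range(start, len(end_logits)), key=lambda i: end_logits[i])
    return " ".join(tokens[start:end + 1])

tokens = ["A", "pandemia", "começou", "em", "dezembro", "de", "2019", "."]
start_logits = [0.1, 0.2, 0.1, 0.3, 4.5, 0.2, 0.4, 0.0]  # peak at "dezembro"
end_logits   = [0.0, 0.1, 0.2, 0.1, 0.3, 0.5, 5.0, 0.1]  # peak at "2019"

print(pick_answer_span(tokens, start_logits, end_logits))  # dezembro de 2019
```

In real use, the production decoders in transformers also mask out spans that fall in the question or exceed a maximum answer length; the greedy rule above is only the core idea.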
Performance Metrics
The model reports the following evaluation metrics:
- F1 Score: 82.50
- Exact Match: 70.49
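These numbers follow the standard SQuAD evaluation: Exact Match is the share of predictions that equal a reference answer after normalization, and F1 measures token overlap between prediction and reference. Here is a minimal sketch of both metrics, using only lowercasing and whitespace tokenization (the official SQuAD script additionally strips punctuation and articles):

```python
# Simplified SQuAD-style metrics: exact match and token-level F1.
from collections import Counter

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.lower().strip() == reference.lower().strip())

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("dezembro de 2019", "Dezembro de 2019"))           # 1.0
print(round(token_f1("em dezembro de 2019", "dezembro de 2019"), 2))  # 0.86
```

This makes the two reported numbers concrete: a prediction can miss Exact Match yet still score high F1 when it overlaps the reference answer.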
Troubleshooting Tips
If you run into issues, here are a few troubleshooting ideas:
- Ensure that your environment has the required libraries installed, such as transformers.
- Check for typos in the model name and dataset paths.
- Review the context and questions to make sure they are formatted correctly.
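A quick way to check the first item is to probe for the required packages before loading the pipeline. The snippet below checks for transformers and torch (torch is assumed here because the pipeline runs on the PyTorch backend by default):

```python
# Probe the current environment for the packages the pipeline needs.
import importlib.util

def is_installed(package):
    """True if the top-level package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None

for pkg in ("transformers", "torch"):
    status = "ok" if is_installed(pkg) else "missing - run: pip install " + pkg
    print(f"{pkg}: {status}")
```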
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

