BERTimbau is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. It is available in two sizes, Base and Large, allowing flexibility based on your project's requirements.
Available Models
BERTimbau offers two versions you can leverage for your applications:
- neuralmind/bert-base-portuguese-cased: BERT-Base, 12 layers, 110M parameters
- neuralmind/bert-large-portuguese-cased: BERT-Large, 24 layers, 335M parameters
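As a rough rule of thumb (an assumption here, not an official figure), fp32 weights take about 4 bytes per parameter, so you can estimate each size's load-time footprint before choosing:

```python
def weights_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the model weights in MiB (fp32 by default)."""
    return num_params * bytes_per_param / 2**20

# Parameter counts from the model list above.
base_mb = weights_size_mb(110_000_000)   # roughly 420 MiB
large_mb = weights_size_mb(335_000_000)  # roughly 1.3 GiB
```

Actual memory use at inference time will be higher (activations, tokenizer, framework overhead), so treat this as a lower bound.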
How to Use BERTimbau Base
Using BERTimbau is straightforward. Follow these steps to get started:
1. Loading the Model
To load the model and tokenizer, use the following code:
from transformers import AutoModelForPreTraining, AutoTokenizer

model = AutoModelForPreTraining.from_pretrained("neuralmind/bert-base-portuguese-cased")
tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-base-portuguese-cased", do_lower_case=False)
2. Performing Masked Language Modeling
Next, you can utilize BERTimbau to predict masked words in a sentence:
from transformers import pipeline
pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)
results = pipe("Tinha uma [MASK] no meio do caminho.")
The pipeline returns the top candidate tokens for the masked position, each with a confidence score.
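Each entry in `results` is a dict containing, among other keys, `token_str` and `score`. A small helper can pull out the top candidates; the sample below only mimics the pipeline's output shape, with made-up scores rather than real model predictions:

```python
def top_candidates(results, k=3):
    """Return the k highest-scoring predicted tokens for the [MASK] slot."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["token_str"] for r in ranked[:k]]

# Illustrative sample in the pipeline's output shape (scores are invented).
sample = [
    {"token_str": "pedra", "score": 0.40},
    {"token_str": "árvore", "score": 0.10},
    {"token_str": "casa", "score": 0.05},
]
print(top_candidates(sample, k=2))  # → ['pedra', 'árvore']
```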
3. Extracting BERT Embeddings
To extract contextual embeddings, load the bare encoder with AutoModel. (Note: the AutoModelForPreTraining variant used above returns vocabulary logits as its first output, not hidden states, so it is not suitable here.)
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("neuralmind/bert-base-portuguese-cased")
input_ids = tokenizer.encode("Tinha uma pedra no meio do caminho.", return_tensors="pt")
with torch.no_grad():
    outs = model(input_ids)
    encoded = outs[0][0, 1:-1]  # Ignore [CLS] and [SEP] special tokens
The result is a tensor of shape (sequence_length - 2, 768): one 768-dimensional contextual embedding per token, with the special tokens removed.
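A common next step, relevant to the Sentence Textual Similarity task mentioned earlier, is to average the token embeddings into a single sentence vector and compare sentences by cosine similarity. Here is a minimal, dependency-free sketch, with plain Python lists standing in for the tensors:

```python
import math

def mean_pool(token_vecs):
    """Average a list of per-token embedding vectors into one sentence vector."""
    dim = len(token_vecs[0])
    n = len(token_vecs)
    return [sum(vec[i] for vec in token_vecs) / n for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

sentence_vec = mean_pool([[1.0, 2.0], [3.0, 4.0]])  # → [2.0, 3.0]
```

With real BERT output, you would call `mean_pool` on `encoded.tolist()` for each sentence and compare the resulting vectors.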
Troubleshooting Tips
As you work with BERTimbau Base, you may encounter some common issues. Here are a few troubleshooting tips:
- Model Not Found: Ensure you are using the correct model name when loading.
- Tokenization Issues: Double-check that you are using the appropriate tokenizer that corresponds to the model.
- Insufficient Memory: If the model is too large to load, consider using the Base model instead of the Large.
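The last two tips can be automated: try the Large checkpoint first and fall back to Base when loading fails. The helper below is a generic sketch — `load_fn` merely stands in for `AutoModel.from_pretrained` (or any loader you prefer), so nothing here is a transformers API guarantee:

```python
def load_with_fallback(load_fn, names):
    """Try each checkpoint name in order; return (name, model) for the first success."""
    last_err = None
    for name in names:
        try:
            return name, load_fn(name)
        except (OSError, RuntimeError, MemoryError) as err:
            last_err = err  # e.g. checkpoint not found, or not enough memory
    raise last_err

# Usage with the real library (not executed here):
# name, model = load_with_fallback(
#     AutoModel.from_pretrained,
#     ["neuralmind/bert-large-portuguese-cased", "neuralmind/bert-base-portuguese-cased"],
# )
```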
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
BERTimbau is a valuable tool for anyone looking to apply advanced NLP to Brazilian Portuguese. With strong results on masked language modeling, embedding extraction, and downstream tasks, it is a reliable ally in the world of language processing.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

