In this article, we will explore how to use AutoTrain models for text classification in Brazilian Portuguese, focusing on a model that distinguishes toxic from non-toxic tweets. Whether you are a novice data scientist or an experienced developer, this guide will lead you through the process with clarity.
Getting Started
Before diving into the code, ensure you have the following prerequisites:
- Python installed (preferably version 3.6 or higher)
- The Transformers library installed
- An API key from Hugging Face for accessing the model
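Before proceeding, it can help to confirm the first two prerequisites programmatically. The short check below is a convenience sketch, not part of the model's official setup:

```python
import importlib.util
import sys

# Confirm the Python version prerequisite
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"

# Confirm the Transformers library is importable
if importlib.util.find_spec("transformers") is None:
    print("Transformers is missing -- install it with: pip install transformers")
else:
    print("Environment looks good")
```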
Understanding the Model
The model we will be using is based on BERT (Bidirectional Encoder Representations from Transformers). It comes pre-trained and fine-tuned for our specific task, which is binary classification. Imagine this like having a skilled librarian who not only knows where every book is but can also categorize them as ‘fiction’ or ‘non-fiction’ with great accuracy!
Here’s a quick overview of the model specifications:
- Model ID: 2489276793
- Base Model: bert-base-multilingual-cased
- Parameters: 109 Million
- Model Size: 416MB
- CO2 Emissions: 4.4298 grams
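As a quick sanity check, the listed size is consistent with 109 million float32 parameters at 4 bytes each. This is a rough back-of-envelope estimate that ignores non-parameter overhead in the checkpoint:

```python
params = 109_000_000          # parameter count from the spec above
bytes_per_param = 4           # float32 weights take 4 bytes each

size_mib = params * bytes_per_param / 1024**2
print(f"{size_mib:.0f} MiB")  # ~416, matching the listed model size
```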
Validation Metrics
The model was tested and yielded the following results:
- Loss: 0.432
- Accuracy: 0.80
- Precision: 0.823
- Recall: 0.704
- AUC: 0.891
- F1 Score: 0.759
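These numbers are internally consistent: the F1 score is the harmonic mean of precision and recall, which we can verify directly from the reported values:

```python
precision, recall = 0.823, 0.704

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.759, matching the reported F1 score
```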
Getting the Model to Work
Now, let’s see how to use this model in practice. You can choose between cURL and Python for this process.
Using cURL
Use the following command to make a POST request:
$ curl -X POST -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoTrain"}' https://api-inference.huggingface.co/models/alexandreteles/autotrain-told_br_binary_sm-2489276793
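If you would rather issue the same request from Python without extra dependencies, here is a standard-library sketch of the cURL call above. `YOUR_API_KEY` remains a placeholder for your own Hugging Face token:

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/alexandreteles/autotrain-told_br_binary_sm-2489276793"

def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def classify(text: str, api_key: str):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# result = classify("I love AutoTrain", "YOUR_API_KEY")
```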
Using Python
If you prefer using Python, here’s how to load the model and tokenizer:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
# (same repository as the cURL endpoint above)
model = AutoModelForSequenceClassification.from_pretrained('alexandreteles/autotrain-told_br_binary_sm-2489276793')
tokenizer = AutoTokenizer.from_pretrained('alexandreteles/autotrain-told_br_binary_sm-2489276793')

# Tokenize the input text and run it through the model
inputs = tokenizer("I love AutoTrain", return_tensors='pt')
outputs = model(**inputs)
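The forward pass above returns raw logits in `outputs.logits`, not probabilities. To read them as class probabilities, apply a softmax. The pure-Python sketch below uses hypothetical logit values for illustration; in practice you would pass `outputs.logits[0].tolist()`, and the label order (which index means "toxic") should be confirmed via `model.config.id2label`:

```python
import math

def softmax(logits):
    """Convert raw model logits into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input; real values come from outputs.logits
logits = [1.2, -0.8]
probs = softmax(logits)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
```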
Troubleshooting Common Issues
If you encounter issues while working with the model, consider the following troubleshooting tips:
- Ensure your API key is valid and has the necessary permissions.
- Verify that you have installed the Transformers library correctly.
- If Python throws an error, check that your code syntax matches the examples provided.
- Make sure your internet connection is stable, as model loading requires external access.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
With the insights and instructions provided above, you should now be able to successfully classify texts as toxic or non-toxic using AutoTrain models. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

