In this article, we will explore how to use AutoTrain models for text classification in Brazilian Portuguese, focusing on a model that distinguishes toxic from non-toxic tweets. Whether you are a novice data scientist or an experienced developer, this guide will lead you through the process with clarity.
Getting Started
Before diving into the code, ensure you have the following prerequisites:
- Python installed (preferably version 3.6 or higher)
- The Transformers library installed
- An API key from Hugging Face for accessing the model
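Before proceeding, it can help to confirm the first two prerequisites programmatically. The short check below is a convenience sketch, not part of the model's official setup:

```python
import importlib.util
import sys

# Confirm the Python version prerequisite
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"

# Confirm the Transformers library is importable
if importlib.util.find_spec("transformers") is None:
    print("Transformers is missing -- install it with: pip install transformers")
else:
    print("Environment looks good")
```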
Understanding the Model
The model we will be using is based on BERT (Bidirectional Encoder Representations from Transformers). It comes pre-trained and fine-tuned for our specific task, which is binary classification. Imagine this like having a skilled librarian who not only knows where every book is but can also categorize them as ‘fiction’ or ‘non-fiction’ with great accuracy!
Here’s a quick overview of the model specifications:
- Model ID: 2489276793
- Base Model: bert-base-multilingual-cased
- Parameters: 109 Million
- Model Size: 416MB
- CO2 Emissions: 4.4298 grams
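As a quick sanity check, the listed size is consistent with 109 million float32 parameters at 4 bytes each. This is a rough back-of-envelope estimate that ignores non-parameter overhead in the checkpoint:

```python
params = 109_000_000          # parameter count from the spec above
bytes_per_param = 4           # float32 weights take 4 bytes each

size_mib = params * bytes_per_param / 1024**2
print(f"{size_mib:.0f} MiB")  # ~416, matching the listed model size
```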
Validation Metrics
The model was tested and yielded the following results:
- Loss: 0.432
- Accuracy: 0.80
- Precision: 0.823
- Recall: 0.704
- AUC: 0.891
- F1 Score: 0.759
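These numbers are internally consistent: the F1 score is the harmonic mean of precision and recall, which we can verify directly from the reported values:

```python
precision, recall = 0.823, 0.704

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.759, matching the reported F1 score
```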
Getting the Model to Work
Now, let’s see how to use this model in practice. You can choose between cURL and Python for this process.
Using cURL
Use the following command to make a POST request:
$ curl -X POST -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoTrain"}' https://api-inference.huggingface.co/models/alexandreteles/autotrain-told_br_binary_sm-2489276793
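If you would rather issue the same request from Python without extra dependencies, here is a standard-library sketch of the cURL call above. `YOUR_API_KEY` remains a placeholder for your own Hugging Face token:

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/alexandreteles/autotrain-told_br_binary_sm-2489276793"

def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def classify(text: str, api_key: str):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# result = classify("I love AutoTrain", "YOUR_API_KEY")
```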
Using Python
If you prefer using Python, here’s how to load the model and tokenizer:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
# (same repository as the cURL endpoint above)
model = AutoModelForSequenceClassification.from_pretrained('alexandreteles/autotrain-told_br_binary_sm-2489276793')
tokenizer = AutoTokenizer.from_pretrained('alexandreteles/autotrain-told_br_binary_sm-2489276793')

# Tokenize the input text and run it through the model
inputs = tokenizer("I love AutoTrain", return_tensors='pt')
outputs = model(**inputs)
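The forward pass above returns raw logits in `outputs.logits`, not probabilities. To read them as class probabilities, apply a softmax. The pure-Python sketch below uses hypothetical logit values for illustration; in practice you would pass `outputs.logits[0].tolist()`, and the label order (which index means "toxic") should be confirmed via `model.config.id2label`:

```python
import math

def softmax(logits):
    """Convert raw model logits into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input; real values come from outputs.logits
logits = [1.2, -0.8]
probs = softmax(logits)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
```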
Troubleshooting Common Issues
If you encounter issues while working with the model, consider the following troubleshooting tips:
- Ensure your API key is valid and has the necessary permissions.
- Verify that you have installed the Transformers library correctly.
- If Python throws an error, check that your code syntax matches the examples provided.
- Make sure your internet connection is stable, as model loading requires external access.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
With the insights and instructions provided above, you should now be able to successfully classify texts as toxic or non-toxic using AutoTrain models. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

