How to Implement a Spanish Sentiment Analysis Classifier

May 28, 2024 | Educational

Sentiment analysis gauges the emotion expressed in text. A classifier built for Spanish lets applications with Spanish-speaking users interpret that sentiment directly. This post is a step-by-step guide to setting up a BERT-based Spanish sentiment analysis classifier that came out of a thesis project at the Universidad de Buenos Aires.

Overview

The project uses dccuchile/bert-base-spanish-wwm-uncased as its base model, fine-tuned to detect sentiment in Spanish text. Trained on a dataset of 11,500 tweets, it classifies each text as positive or negative.

Model Details

  • Base Model: dccuchile/bert-base-spanish-wwm-uncased
  • Hyperparameters:
    • dropout_rate = 0.1
    • num_classes = 2
    • max_length = 128
    • batch_size = 16
    • num_epochs = 5
    • learning_rate = 3e-5
  • Dataset: 11,500 Spanish tweets (Positive and Negative)
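
For anyone reproducing the fine-tuning, the settings above can be gathered into a single configuration object. This is only a sketch: the `FineTuneConfig` class and its `steps_per_epoch` helper are illustrative names, not part of the original project, and the step count assumes all 11,500 tweets were used for training, which the post does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in the post (the class itself is hypothetical)."""
    dropout_rate: float = 0.1
    num_classes: int = 2
    max_length: int = 128
    batch_size: int = 16
    num_epochs: int = 5
    learning_rate: float = 3e-5

    def steps_per_epoch(self, num_examples: int) -> int:
        # The last batch may be smaller than batch_size, so round up.
        return math.ceil(num_examples / self.batch_size)


config = FineTuneConfig()
# Assumption: the whole 11,500-tweet dataset is used for training.
steps = config.steps_per_epoch(11_500)
print(steps, steps * config.num_epochs)
```

With a held-out validation or test split the number of training steps would be smaller, so treat the printed figures as an upper bound.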

Performance Metrics

  • Accuracy: 86.47%
  • F1-Score: 86.47%
  • Precision: 86.46%
  • Recall: 86.51%
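
To show how these four numbers relate, here is a small sketch that computes them from a list of true and predicted labels. The labels are made up for illustration. The post does not say whether precision, recall, and F1 are macro-averaged, weighted, or reported for the positive class only, so the macro-averaging below is an assumption.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for labels 0/1."""
    per_class = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro average: unweighted mean of the two per-class scores.
    macro = [sum(vals) / 2 for vals in zip(*per_class)]
    return {"accuracy": accuracy, "precision": macro[0],
            "recall": macro[1], "f1": macro[2]}


# Toy data, not the project's evaluation set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```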

Usage

Installation

Begin by installing the required dependencies. Use the following command:

pip install transformers torch
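
To confirm the install worked before going further, a quick check like the following can help. It is a sketch that uses only the standard library; `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names=("transformers", "torch")):
    """Return the packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```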

Loading the Model

Next, load the model and tokenizer with the following code:

import torch
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained('VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis')
tokenizer = BertTokenizer.from_pretrained('VerificadoProfesional/SaBERT-Spanish-Sentiment-Analysis')

Predict Function

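Now, let’s define a function for making predictions. Think of it as a recipe: the text is the ingredient, and the sentiment prediction is the finished dish. The function tokenizes the text, runs it through the model, converts the logits to probabilities with softmax, and picks the most likely class. If that class is 1 but its probability falls below `threshold`, the prediction falls back to class 0.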

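def predict(model, tokenizer, text, threshold=0.5):
    # Tokenize, truncating to the same 128-token limit listed under Model Details
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=128)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    # Convert logits to per-class probabilities
    probabilities = torch.softmax(logits, dim=1).squeeze().tolist()
    predicted_class = torch.argmax(logits, dim=1).item()
    # Fall back to class 0 when class 1 wins with probability below the threshold
    if probabilities[predicted_class] < threshold and predicted_class == 1:
        predicted_class = 0
    return bool(predicted_class), probabilities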
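
The effect of `threshold` is easier to see on plain numbers. The sketch below reproduces the same decision rule without the model, using made-up logits. The helper names `softmax` and `apply_threshold` are illustrative only.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    shifted = [x - max(logits) for x in logits]
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def apply_threshold(logits, threshold=0.5):
    """Mirror the decision rule in predict() on a plain list of two logits."""
    probabilities = softmax(logits)
    predicted_class = probabilities.index(max(probabilities))
    if probabilities[predicted_class] < threshold and predicted_class == 1:
        predicted_class = 0
    return bool(predicted_class), probabilities


# Made-up logits where class 1 wins with a probability of about 0.60.
logits = [0.2, 0.6]
print(apply_threshold(logits, threshold=0.5))  # class 1 is kept
print(apply_threshold(logits, threshold=0.7))  # class 1 is demoted to class 0
```

With two classes the winning probability is always at least 0.5, so the default threshold never changes the result. Only values above 0.5 make the function more conservative about returning class 1.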

Making Predictions

Finally, you can use the `predict` function to analyze a piece of Spanish text:

text = "Your Spanish news text here"
predicted_label, probabilities = predict(model, tokenizer, text)
print(f"Text: {text}")
print(f"Predicted Class: {predicted_label}")
print(f"Probabilities: {probabilities}")
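
Because `predict` returns a bare boolean, a small helper can make the output easier to read. This is a sketch that assumes class index 1 means positive and index 0 means negative. The post does not state the mapping, so check it against the model's documentation. `format_prediction` is a hypothetical helper, and the numbers below are example values, not real model output.

```python
def format_prediction(predicted_label, probabilities):
    """Turn predict()'s (bool, [p0, p1]) output into a readable dict."""
    # Assumption: True / index 1 = positive, False / index 0 = negative.
    label = "positive" if predicted_label else "negative"
    confidence = probabilities[int(predicted_label)]
    return {"label": label, "confidence": round(confidence, 4)}


# Example values in the shape predict() returns; not real model output.
print(format_prediction(True, [0.12, 0.88]))
print(format_prediction(False, [0.73, 0.27]))
```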

License

This project is released under the Apache License 2.0 and credits the TASS Dataset.

Acknowledgments

A big thank you to DCC UChile for providing the base Spanish BERT model and to all contributors to the dataset used for training.

Troubleshooting

If you run into issues during installation, or your predictions aren’t what you expect, try the following:

  • Ensure your Python environment is set up correctly and that the required packages are installed without errors.
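  • Double-check the text you pass to the model; it may need preprocessing.
  • At inference time, try a different `threshold` value in `predict`. If you are fine-tuning the model yourself, adjust the hyperparameters to see whether different settings yield better results.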
  • Refer to model documentation for additional information on nuances in usage.
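
Since the model was trained on tweets, one plausible preprocessing step is to normalize tweet-specific noise before prediction. The post does not describe the preprocessing used during training, so the rules in this sketch (lowercasing, replacing URLs and mentions with placeholders, collapsing whitespace) are assumptions, and `clean_tweet` is a hypothetical helper.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Lowercase, replace URLs and @mentions with placeholders, tidy spaces."""
    text = text.lower()  # the base model is uncased
    text = URL_RE.sub("url", text)
    text = MENTION_RE.sub("@usuario", text)
    return SPACE_RE.sub(" ", text).strip()


print(clean_tweet("¡Me ENCANTA   esto! @Juan_99 mira https://example.com/x"))
```

Compare predictions with and without the cleaning step on a few examples before adopting it, since the placeholders may not match what the model saw in training.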
