How to Use the Racism Detection Model for Spanish Text

May 6, 2022 | Educational

Automatically identifying sensitive content can help moderators and researchers respond to harmful speech at scale. The Spanish Racism Detection Model scores Spanish text for potentially racist content using a regression approach. In this post, we’ll walk you through how to use it and offer some troubleshooting tips along the way.

Understanding the Model

This model is a fine-tuned version of BETO (Spanish BERT), trained on the Datathon Against Racism dataset. Its development is documented in the upcoming paper titled “Estimating Ground Truth in a Low-labelled Data Regime: A Study of Racism Detection in Spanish”. Here’s how it works:

Imagine you’re a librarian sorting through thousands of books, from poetry to political treatises. If you were asked to find every book that touches on a controversial topic like racism, you’d need a system to help you sift through them. Similarly, this model reads a text and returns a regression score. If you also supply a threshold, the score is turned into a label: ‘racist’ when the score is above the threshold, ‘non-racist’ otherwise.
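To make the thresholding step concrete, here is a minimal sketch that does not load the model at all. The helper name label_from_score and the scores are made up for illustration; only the comparison rule (strictly above the threshold means ‘racist’) comes from the pipeline code in this post.

```python
def label_from_score(score: float, threshold: float) -> str:
    """Map a regression score to a label (hypothetical helper, not part of transformers)."""
    # Same rule as the pipeline in this post: strictly above the threshold means 'racist'
    return "racist" if score > threshold else "non-racist"


# Made-up scores for illustration only; these are not real model outputs
example_scores = [0.12, 0.95, 0.90]
labels = [label_from_score(score, threshold=0.9) for score in example_scores]
print(labels)
```

Note that a score exactly equal to the threshold is labelled ‘non-racist’, because the comparison is strict.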

Getting Started with the Model

To begin using the model, follow these steps:

  1. Import the required classes from the transformers library.
  2. Define the regression pipeline and load the model and tokenizer.
  3. Pass your texts to the pipeline for evaluation.

Implementation Steps

Here’s how you can implement it:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers.pipelines import TextClassificationPipeline

class TextRegressionPipeline(TextClassificationPipeline):
    """Pipeline that returns a regression score and, optionally, a label."""

    def __init__(self, **kwargs):
        # Default threshold, used when none is passed at call time
        self.regression_threshold = kwargs.pop('regression_threshold', None)
        self.regression_threshold_call = None
        super().__init__(**kwargs)

    def __call__(self, *args, **kwargs):
        # Optional per-call threshold; it takes precedence over the default
        self.regression_threshold_call = kwargs.pop('regression_threshold', None)
        return super().__call__(*args, **kwargs)

    def postprocess(self, model_outputs, function_to_apply=None, return_all_scores=False):
        # The first logit is the regression score; convert it to a plain float
        score = float(model_outputs['logits'][0].numpy()[0])
        # Compare against None so that a threshold of 0 is still honoured
        threshold = self.regression_threshold_call
        if threshold is None:
            threshold = self.regression_threshold
        if threshold is not None:
            return {'label': 'racist' if score > threshold else 'non-racist', 'score': score}
        return {'score': score}

model_name = 'regression-w-m-vote-epoch-4'
tokenizer = AutoTokenizer.from_pretrained('dccuchile/bert-base-spanish-wwm-uncased')
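# Hub identifier in the form "<user>/<repository>"; check it against the model page if loading fails
full_model_path = f'MartinoMensio/racism-models-{model_name}'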
model = AutoModelForSequenceClassification.from_pretrained(full_model_path)
pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)

texts = [
    'y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!',
    'Es que los judíos controlan el mundo'
]

# Get the raw regression score for each text
print(pipe(texts))
# Pass a threshold to also get a 'racist' / 'non-racist' label
print(pipe(texts, regression_threshold=0.9))

Interpreting Output

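Without a threshold, the pipeline returns one regression score per text. Higher scores mean the model considers the text more likely to be racist. With a threshold, each result also carries a ‘racist’ or ‘non-racist’ label. The score is a model estimate rather than a verdict, so treat it as a starting point for human review. Used this way, the model can help initiate necessary conversations and guide interventions within communities.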
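As a rough sketch of how you might work with the results, the snippet below pairs each text with its result and sorts by score. The function summarize, the placeholder texts, and the scores are all assumptions made for illustration; only the result shape (a dictionary with a 'score' key and an optional 'label' key) follows the pipeline shown above.

```python
def summarize(texts, results):
    """Pair each text with its result and sort by score, highest first (hypothetical helper)."""
    rows = []
    for text, result in zip(texts, results):
        rows.append({
            "text": text,
            "score": float(result["score"]),
            # 'label' is only present when a threshold was supplied
            "label": result.get("label", "unlabelled"),
        })
    return sorted(rows, key=lambda row: row["score"], reverse=True)


# Placeholder texts and made-up results; these are not real model outputs
sample_texts = ["first text", "second text", "third text"]
sample_results = [
    {"label": "non-racist", "score": 0.31},
    {"label": "racist", "score": 0.97},
    {"label": "non-racist", "score": 0.55},
]

summary = summarize(sample_texts, sample_results)
for row in summary:
    print(f'{row["score"]:.2f}  {row["label"]:<11} {row["text"]}')
```

Sorting by score puts the texts that most need human review at the top of the list.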

Troubleshooting Tips

If you encounter issues while running the model, here are some common troubleshooting suggestions:

  • Check Your Environment: Make sure the required libraries are installed, in particular transformers and a backend it can run the model on, such as PyTorch.
  • Model Loading Errors: If the model fails to load, verify that full_model_path matches the model’s identifier on the Hugging Face Hub and that the download completed without errors.
  • Threshold Configuration: If the threshold seems to have no effect, check how regression_threshold is handled in both the __init__ and __call__ methods, and confirm that you are passing it as a keyword argument.
  • Text Format Issues: Pass the input as a Python string or a list of strings. An unescaped quote inside a string literal causes a syntax error, so escape it or use a different quote style.
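For the environment check, a small script like the following can report which packages are importable and which versions are installed. It is only a sketch: the function check_packages is a hypothetical helper, and the package list (transformers, torch, numpy) is an assumption based on the code in this post.

```python
from importlib import metadata, util


def check_packages(names):
    """Return {package: version, or None if not installed} (hypothetical helper)."""
    report = {}
    for name in names:
        if util.find_spec(name) is None:
            # The package cannot be imported in this environment
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata was found under this name
            report[name] = "unknown"
    return report


# Assumed package list, based on what the code in this post imports and calls
report = check_packages(["transformers", "torch", "numpy"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Any package reported as not installed can usually be added with pip before retrying.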


Conclusion

By following this guide, you should be ready to run the Spanish Racism Detection Model on your own texts. Understanding and combating racism with the help of technology matters, and every effort counts.
