How to Utilize a Fine-Tuned Spanish BERT Model for Racism Detection

May 6, 2022 | Educational

Artificial intelligence can help combat social problems such as racism, and language models are one practical tool for doing so. This post walks you through using a Spanish BERT model, fine-tuned on the Datathon Against Racism dataset, to score text for racist content.

Getting Started

To make it easy for you, here’s a breakdown of the steps involved in using this model:

Install necessary libraries.
Load the model and tokenizer.
Create a custom regression pipeline.
Analyze sample text inputs for racist content.

1. Installing Required Libraries

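Before starting, make sure you have Python, the Hugging Face Transformers library, and PyTorch installed. The code in this post calls .numpy() on the model's output tensors, so it assumes the PyTorch backend. You can install both libraries with the following command:

pip install transformers torch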
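
Before moving on, it can help to confirm that the packages are importable in the active environment. The code in this post relies on transformers and, because it converts model outputs with .numpy(), on PyTorch as well. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of Transformers.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # `transformers` and `torch` are the import names of the two packages.
    missing = missing_packages(["transformers", "torch"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, rerun the pip command in the same environment that runs your script.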

2. Load the Model and Tokenizer

Once the libraries are installed, load the model and tokenizer. The model you will use is regression-w-m-vote-epoch-4, hosted on the Hugging Face Hub under the MartinoMensio account. The tokenizer is loaded from the base Spanish BERT checkpoint, dccuchile/bert-base-spanish-wwm-uncased. Here’s how to do it:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Tokenizer of the base Spanish BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained('dccuchile/bert-base-spanish-wwm-uncased')

# Hub repository IDs have the form '<account>/<repository>'
model_name = 'regression-w-m-vote-epoch-4'
full_model_path = f'MartinoMensio/racism-models-{model_name}'
model = AutoModelForSequenceClassification.from_pretrained(full_model_path)

3. Creating a Custom Regression Pipeline

The model is a regressor: it outputs a single score per text rather than class probabilities. To handle that, you’ll create a custom regression pipeline based on the TextClassificationPipeline from Transformers. It returns the raw score and, if you specify a regression threshold, a racist/non-racist label as well.

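from transformers.pipelines import TextClassificationPipeline

class TextRegressionPipeline(TextClassificationPipeline):
    """TextClassificationPipeline variant for a single-output regression model."""

    def __init__(self, **kwargs):
        # Default threshold; None means "return the score only"
        self.regression_threshold = kwargs.pop('regression_threshold', None)
        self.regression_threshold_call = None
        super().__init__(**kwargs)

    def __call__(self, *args, **kwargs):
        # A threshold passed at call time overrides the default for that call
        self.regression_threshold_call = kwargs.pop('regression_threshold', None)
        return super().__call__(*args, **kwargs)

    def postprocess(self, model_outputs, **kwargs):
        # Extra postprocess arguments differ between Transformers versions and are ignored here
        score = float(model_outputs['logits'][0].numpy()[0])
        threshold = self.regression_threshold_call
        if threshold is None:
            threshold = self.regression_threshold

        if threshold is not None:
            return {'label': 'racist' if score > threshold else 'non-racist', 'score': score}
        return {'score': score}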
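
The decision logic inside postprocess can be tried in isolation, without downloading the model. The sketch below mirrors it in plain Python. `to_result` is a hypothetical helper name, and the scores passed to it are made-up values, not real model outputs. Unlike a plain truthiness check, it treats only `None` as "no threshold", so a threshold of 0 still produces a label.

```python
def to_result(score, threshold=None):
    """Mimic the pipeline's output for one raw regression score."""
    if threshold is None:
        # No threshold: report the score only
        return {"score": score}
    # Strictly greater than the threshold counts as racist
    label = "racist" if score > threshold else "non-racist"
    return {"label": label, "score": score}


# Made-up scores for illustration only
print(to_result(0.95))
print(to_result(0.95, threshold=0.9))
print(to_result(0.40, threshold=0.9))
```

Note that a score exactly equal to the threshold is labeled non-racist, because the comparison is strict.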

4. Analyzing Sample Text Inputs

Now that you have everything set up, you can analyze text inputs for racist sentiments. Here’s how you can do this:

pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)

texts = [
    "y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!",
    "Es que los judíos controlan el mundo"
]

# Without a threshold, the pipeline returns only the regression score
print(pipe(texts))

# With a threshold, each result also carries a racist/non-racist label
print(pipe(texts, regression_threshold=0.9))

Understanding the Model’s Output

When you run this pipeline without a threshold, each result is a dictionary containing only the model’s score. With a threshold, a text whose score is above it is labeled racist; otherwise it is labeled non-racist. Think of the threshold as a dial, much like a chef adjusting the salt: raise it to flag only the highest-scoring texts, or lower it to catch more borderline ones.
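
To get a feel for how the threshold changes the outcome, one option is to sweep several thresholds over a batch of scores and count how many texts would be flagged. This is a sketch with invented scores; `count_flagged` is an illustrative name, and real scores would come from the `score` field of the pipeline output.

```python
def count_flagged(scores, thresholds):
    """Map each threshold to the number of scores strictly above it."""
    return {t: sum(score > t for score in scores) for t in thresholds}


# Invented scores for illustration only (not produced by the model)
example_scores = [0.12, 0.48, 0.91, 0.97]

for threshold, flagged in count_flagged(example_scores, [0.5, 0.9, 0.95]).items():
    print(f"threshold={threshold}: {flagged} of {len(example_scores)} flagged")
```

Running a sweep like this on a sample of your own data is a cheap way to pick a threshold before labeling at scale.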

Troubleshooting Tips

If you encounter any issues while implementing this model or do not see the expected results, here are some troubleshooting ideas:

Ensure that transformers and PyTorch are installed in the environment you are running.
Check that the model and tokenizer load without errors, and that the model path matches the repository name on the Hugging Face Hub.
Review the texts you are analyzing and make sure each one is passed as a string.
Choose the regression threshold deliberately: a higher value labels fewer texts as racist, a lower value labels more.
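
The input check in the list above can be automated. Here is a minimal sketch, assuming the inputs are collected in a Python list; `find_invalid_texts` is a hypothetical helper, not part of Transformers.

```python
def find_invalid_texts(texts):
    """Return (index, reason) pairs for entries that are not non-empty strings."""
    problems = []
    for index, text in enumerate(texts):
        if not isinstance(text, str):
            problems.append((index, f"expected str, got {type(text).__name__}"))
        elif not text.strip():
            problems.append((index, "empty or whitespace-only string"))
    return problems


# Sample inputs, including deliberately invalid entries
samples = ["Es un texto de ejemplo", "", None, 42]
for index, reason in find_invalid_texts(samples):
    print(f"texts[{index}]: {reason}")
```

Running this before calling the pipeline makes it easier to tell a data problem from a model problem.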

