In today’s digital age, leveraging artificial intelligence to combat social issues such as racism is not just a worthwhile endeavor; it’s a necessity. This post will guide you through the process of using a fine-tuned Spanish BERT model that detects racist sentiments in text, specifically trained on the Datathon Against Racism dataset.
Getting Started
To make it easy for you, here’s a breakdown of the steps involved in using this model:
- Install necessary libraries.
- Load the model and tokenizer.
- Create a custom regression pipeline.
- Analyze sample text inputs for racist content.
1. Installing Required Libraries
Before starting, ensure you have Python installed along with the Hugging Face Transformers library and a PyTorch backend (the code below calls .numpy() on PyTorch tensors). You can install both with the following command:
pip install transformers torch
2. Load the Model and Tokenizer
Once the libraries are installed, you will need to load your model and tokenizer. The model you will use is MartinoMensio/racism-models-regression-w-m-vote-epoch-4, which pairs with the dccuchile/bert-base-spanish-wwm-uncased tokenizer. Here’s how to do it:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('dccuchile/bert-base-spanish-wwm-uncased')
model_name = 'regression-w-m-vote-epoch-4'
full_model_path = f'MartinoMensio/racism-models-{model_name}'
model = AutoModelForSequenceClassification.from_pretrained(full_model_path)
3. Creating a Custom Regression Pipeline
To process the text efficiently, you’ll create a custom regression pipeline based on the TextClassificationPipeline from Transformers. This pipeline will allow you to specify a regression threshold while analyzing texts.
from transformers.pipelines import TextClassificationPipeline

class TextRegressionPipeline(TextClassificationPipeline):
    def __init__(self, **kwargs):
        # The threshold can be fixed at construction time...
        self.regression_threshold = kwargs.pop('regression_threshold', None)
        super().__init__(**kwargs)

    def __call__(self, *args, **kwargs):
        # ...or overridden per call, as in pipe(texts, regression_threshold=0.9)
        self.regression_threshold_call = kwargs.pop('regression_threshold', None)
        return super().__call__(*args, **kwargs)

    def postprocess(self, model_outputs, function_to_apply=None, return_all_scores=False):
        score = model_outputs['logits'][0].numpy()[0]
        regression_threshold = self.regression_threshold_call or self.regression_threshold
        if regression_threshold is not None:
            return {'label': 'racist' if score > regression_threshold else 'non-racist', 'score': score}
        return {'score': score}
4. Analyzing Sample Text Inputs
Now that you have everything set up, you can analyze text inputs for racist sentiments. Here’s how you can do this:
pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)
texts = [
    # "and why, it's what has to be done with the menas [unaccompanied foreign minors] and with the adults too!!!! NO to illegal immigrants!!!!"
    "y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!",
    # "It's just that the Jews control the world"
    "Es que los judíos controlan el mundo"
]
# Get the raw regression scores
print(pipe(texts))
# Specify a threshold to label each text as racist/non-racist
print(pipe(texts, regression_threshold=0.9))
Understanding the Model’s Output
When you run this pipeline, the model assigns each text a regression score, much like a discerning chef tasting a dish before sending it out. If the score is above the specified threshold, the text is labeled racist; if it’s at or below the threshold, it’s marked non-racist. Because you choose the threshold, you control how strict the judgement is, dialing sensitivity up or down much as a chef decides whether to add salt or hold back.
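To make the threshold’s effect concrete, here is a small illustrative sketch; the score below is hypothetical, not real model output:

```python
def label(score, threshold):
    # Strictly greater than the threshold counts as racist
    return 'racist' if score > threshold else 'non-racist'

score = 0.85  # hypothetical regression score for one text
for threshold in (0.5, 0.9):
    print(threshold, label(score, threshold))
```

The same score of 0.85 is labeled racist at a threshold of 0.5 but non-racist at 0.9, so pick the threshold that matches how strict you need the classifier to be.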
Troubleshooting Tips
If you encounter any issues while implementing this model or do not see the expected results, here are some troubleshooting ideas:
- Ensure you’ve installed all relevant dependencies correctly.
- Check that the model and tokenizer load properly without errors.
- Review the texts you are analyzing. Ensure they are formatted correctly as strings.
- Make sure to set an appropriate regression threshold that aligns with your expectations.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

