Combating racism and promoting inclusivity have become a growing global priority. One approach is to use AI models that identify and classify racist sentiment in text. This article is a guide to using a fine-tuned Spanish BERT model designed specifically for racism detection.
Getting Started with the Model
The model we will be using is a fine-tuned version of BETO (Spanish BERT), trained on the Datathon Against Racism dataset. Several variants were trained, using different methods and different numbers of epochs; this guide uses the checkpoint named regression-w-m-vote-epoch-1.
Model Architecture and Usage
- The model is trained as a regressor: for each text it outputs a single continuous score, where a higher score indicates a higher estimated likelihood that the text is racist.
- Users can optionally specify a regression threshold, which sets how strict the classification is. Texts scoring above the threshold are labeled “racist”; all others are labeled “non-racist”.
- Without a threshold, the pipeline returns only the score for each text.
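The decision rule described in the list above can be sketched in a few lines of plain Python. This is only an illustration of the thresholding logic: the function name label_from_score is hypothetical, and the scores are placeholders rather than real model outputs.

```python
from typing import Optional


def label_from_score(score: float, threshold: Optional[float] = None) -> dict:
    """Turn a regression score into a result dictionary (illustrative helper)."""
    result = {"score": float(score)}
    # A label is attached only when a threshold was supplied.
    if threshold is not None:
        result["label"] = "racist" if score > threshold else "non-racist"
    return result


# Placeholder scores, not real model outputs.
print(label_from_score(0.95, threshold=0.9))
print(label_from_score(0.40, threshold=0.9))
print(label_from_score(0.40))
```

Note that the comparison is strict: a score exactly equal to the threshold is labeled “non-racist”.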
Step-by-Step Guide to Implement the Model
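The following script defines a custom regression pipeline, loads the tokenizer and the fine-tuned model, and scores two example texts, first without and then with a threshold: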
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers.pipelines import TextClassificationPipeline


class TextRegressionPipeline(TextClassificationPipeline):
    """Text classification pipeline that returns the model's raw regression score."""

    def __init__(self, **kwargs):
        # Optional default threshold, set once when the pipeline is created.
        self.regression_threshold = kwargs.pop("regression_threshold", None)
        self.regression_threshold_call = None
        super().__init__(**kwargs)

    def __call__(self, *args, **kwargs):
        # Optional threshold passed with an individual call.
        self.regression_threshold_call = kwargs.pop("regression_threshold", None)
        return super().__call__(*args, **kwargs)

    def postprocess(self, model_outputs, function_to_apply=None, return_all_scores=False, **kwargs):
        # The first logit is used as the regression score.
        score = float(model_outputs["logits"][0].numpy()[0])
        # The constructor threshold takes precedence over the per-call one.
        # Compare against None so that a threshold of 0 is still honored.
        regression_threshold = self.regression_threshold
        if regression_threshold is None:
            regression_threshold = self.regression_threshold_call
        if regression_threshold is None:
            return {"score": score}
        return {
            "label": "racist" if score > regression_threshold else "non-racist",
            "score": score,
        }


model_name = "regression-w-m-vote-epoch-1"
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")
# Hub IDs have the form "namespace/name", so the checkpoint name is joined with a hyphen.
full_model_path = f"MartinoMensio/racism-models-{model_name}"
model = AutoModelForSequenceClassification.from_pretrained(full_model_path)
pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)

texts = [
    "y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!",
    "Es que los judíos controlan el mundo",
]

print(pipe(texts))  # Output shows scores only
print(pipe(texts, regression_threshold=0.9))  # Output shows scores with labels
```
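Because the pipeline returns one dictionary per input text, in input order, the results can be paired back with the texts for manual review. The sketch below assumes that output shape. The helper name rank_by_score is hypothetical, and the texts and scores are made-up placeholders, not real model outputs.

```python
def rank_by_score(texts, results):
    """Pair each text with its result dict and sort by descending score."""
    paired = [{"text": text, **result} for text, result in zip(texts, results)]
    return sorted(paired, key=lambda item: item["score"], reverse=True)


# Placeholder results in the shape the pipeline returns; the scores are made up.
example_texts = ["texto A", "texto B", "texto C"]
example_results = [{"score": 0.2}, {"score": 0.9}, {"score": 0.5}]

for item in rank_by_score(example_texts, example_results):
    print(f'{item["score"]:.2f}  {item["text"]}')
```

Sorting by score puts the texts most likely to need attention first, which can be useful when no threshold has been chosen yet.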
Understanding the Code Through Analogy
Imagine our AI model as a diligent librarian trained to detect troublesome books that might promote unkind ideas. Just as a librarian uses a set of defined criteria to decide whether a book belongs on the “problematic shelf,” the model scores each text and compares that score with the regression threshold you provide. If a book (or text) scores higher than the threshold, it is marked for review (labeled “racist”); otherwise, it stays on the normal shelf (labeled “non-racist”). If you give no threshold, the librarian simply reports the score and leaves the decision to you.
Troubleshooting
When using the model, you may encounter certain issues. Here are some troubleshooting tips:
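- Ensure the required libraries, such as transformers and a backend like PyTorch, are properly installed in your Python environment.
- Check the model path if you encounter a loading error. Hugging Face Hub identifiers take the form “namespace/model-name”, so confirm that the full identifier you pass to from_pretrained points to an existing repository under the MartinoMensio account.
- Adjust the regression threshold appropriately. Raising the threshold labels fewer texts as “racist” (stricter evidence is required), while lowering it labels more; if the results look too strict or too lenient, change the threshold rather than the model.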
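One way to choose a threshold is to score a small set of texts you have labeled by hand and compare precision and recall at a few candidate values. The sketch below illustrates the idea under that assumption. The function name precision_recall is hypothetical, and the scores and labels are placeholders, not results from the model.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall when texts with score > threshold are flagged.

    labels holds True for texts that were hand-labeled as racist.
    """
    predicted = [score > threshold for score in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Placeholder validation data: made-up scores and hand-assigned labels.
scores = [0.95, 0.80, 0.60, 0.30, 0.10]
labels = [True, True, False, True, False]

for threshold in (0.2, 0.5, 0.9):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

A higher threshold generally trades recall for precision, so the right value depends on whether missed cases or false alarms are more costly in your application.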
If you face further difficulties, consider reaching out for help! For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

