This guide covers a fine-tuned version of the Spanish BERT model, known as BETO, trained on the *Datathon Against Racism* dataset to detect racist content in text. It walks you through using the model effectively, step by step.
Step-by-Step Guide to Implementation
- Step 1: Import Required Libraries

Before starting, import the necessary classes from the Hugging Face Transformers package. Note that the custom pipeline defined below subclasses `TextClassificationPipeline`, so that class must be imported as well:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
```
- Step 2: Create a Custom Regression Pipeline

Here’s where the magic begins! We need a pipeline tailored for regression tasks: the model emits a single score, and an optional threshold turns that score into a label describing the text's racist content.

```python
class TextRegressionPipeline(TextClassificationPipeline):
    """Text classification pipeline that returns a regression score,
    optionally thresholded into a binary label."""

    def __init__(self, **kwargs):
        # Default threshold, fixed at construction time
        self.regression_threshold = kwargs.pop("regression_threshold", None)
        super().__init__(**kwargs)

    def __call__(self, *args, **kwargs):
        # A per-call threshold overrides the constructor default
        self.regression_threshold_call = kwargs.pop("regression_threshold", None)
        return super().__call__(*args, **kwargs)

    def postprocess(self, model_outputs, function_to_apply=None, return_all_scores=False):
        outputs = model_outputs["logits"][0].numpy()
        score = outputs[0]
        regression_threshold = self.regression_threshold_call or self.regression_threshold
        if regression_threshold is not None:
            label = "racist" if score > regression_threshold else "non-racist"
            return {"label": label, "score": score}
        return {"score": score}
```
- Step 3: Load the Model and Initialize the Pipeline

Now, load the model and tokenizer, and initialize your text regression pipeline:

```python
model_name = "regression-w-m-vote-epoch-2"
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")
model = AutoModelForSequenceClassification.from_pretrained(f"MartinoMen/Models/{model_name}")
pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)
```
- Step 4: Classify Your Texts

Feed your texts into the pipeline for classification:

```python
texts = [
    "y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!",
    "Es que los judíos controlan el mundo"
]

# Just get the regression scores
print(pipe(texts))
```

You can also specify a threshold to categorize each text as either ‘racist’ or ‘non-racist’:

```python
print(pipe(texts, regression_threshold=0.9))  # Use a threshold for labels
```
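The thresholding step inside `postprocess` can be illustrated in isolation, without downloading the model. This is a minimal sketch (the function name `score_to_label` is my own, not part of the model's code or the Transformers API) of how a raw regression score becomes a label:

```python
def score_to_label(score, regression_threshold=None):
    """Mirror the pipeline's postprocess logic: return the bare score,
    or a binary label once a threshold is supplied."""
    if regression_threshold is not None:
        label = "racist" if score > regression_threshold else "non-racist"
        return {"label": label, "score": score}
    return {"score": score}

print(score_to_label(0.95, regression_threshold=0.9))  # → {'label': 'racist', 'score': 0.95}
print(score_to_label(0.42))                            # → {'score': 0.42}
```

Because the label depends entirely on where you place the threshold, it is worth trying a few values on known examples before settling on one.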
Understanding the Code Through an Analogy
Imagine you are a chef preparing a special dish. The model is like your well-trained sous-chef who has been taught to distinguish between fresh and spoiled ingredients based on specific qualities. In this case, instead of food quality, the model assesses the “quality” of the text input by determining if it harbors racist sentiments or not.
Your pipeline is like your kitchen setup—a custom environment where you can efficiently prepare, mix, and serve your dish (in this case, the classification results). The regression pipeline serves as your guide for deciding when to call certain ingredients (texts) good or bad, much like you’d decide whether the content is ‘racist’ or ‘non-racist’ based on the specified threshold.
Troubleshooting Tips
If you encounter any issues during the implementation, consider the following troubleshooting ideas:
- Ensure all necessary libraries are installed and up-to-date, especially the Hugging Face Transformers library.
- Make sure the correct model and tokenizer are loaded together; mismatched pairs are a common source of errors.
- Double-check the syntax of your code, particularly where you’ve defined classes or functions.
- If an error message appears related to model loading, ensure the path to your model is accurate.
- Check whether the input texts are formatted correctly before sending them to the pipeline.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
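The input-formatting check mentioned in the troubleshooting list can be sketched as a small pre-flight step. This is an illustrative helper (the name `prepare_texts` is my own, not part of the Transformers API) that filters out entries the pipeline would reject:

```python
def prepare_texts(texts):
    """Drop non-string and empty entries, and strip surrounding
    whitespace, before handing the list to the pipeline."""
    cleaned = []
    for t in texts:
        if not isinstance(t, str):
            continue  # the pipeline expects strings
        t = t.strip()
        if t:
            cleaned.append(t)
    return cleaned

print(prepare_texts(["  hola  ", "", None, "Es que los judíos controlan el mundo"]))
# → ['hola', 'Es que los judíos controlan el mundo']
```

Running a check like this before calling `pipe(texts)` makes failures easier to diagnose, since any remaining error points at the model or environment rather than the data.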
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

