In this article, we’ll guide you through the process of using state-of-the-art multilingual transformer models designed for identifying trolling, aggression, and cyberbullying. These models were developed and fine-tuned during the TRAC 2020 workshop, and they can significantly improve your text classification tasks, particularly in the realm of social media.
Getting Started with the Models
The models can be used with very little code. Before you begin, make sure the required libraries are installed: transformers and torch for the model itself, plus scipy and numpy, which the example code imports for post-processing. If you haven’t installed them yet, you can do so via pip:
pip install transformers torch scipy numpy
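To confirm the installation before going further, a small check along these lines can help. It is only a sketch: the `missing_packages` helper and the `REQUIRED` list are illustrative names, and the list is assumed from the imports used in the code later in this article.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list, based on the imports used by the example code in this article.
REQUIRED = ["transformers", "torch", "scipy", "numpy"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerun pip for those packages before loading a model.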
Step-by-Step Usage Instructions
Now, let’s break down the usage of the models into manageable steps:
- Import Required Libraries: Begin by importing the necessary modules from the libraries.
- Load Your Model: Choose between using the model from the databank or the Hugging Face repository.
- Tokenize Your Input: Prepare your text by processing it with the tokenizer.
- Run Inference: Get predictions and analyze the probabilities of the categories identified.
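The last step, turning the model’s raw scores into category probabilities, can be sketched without loading a model at all. The example below uses plain Python and made-up scores; `decode` and `example_logits` are illustrative names, and the label order is the Sub-task A order used in this article’s code.

```python
import math

# Label order for Sub-task A, as used in this article's code.
LABELS = ["OAG", "NAG", "CAG"]


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(scores, labels):
    """Return (best_label, {label: probability}) for a single input."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Hypothetical logits, made up for illustration (not real model output).
example_logits = [0.2, 2.1, -0.5]
label, probs = decode(example_logits, LABELS)
print(label, probs)
```

The label with the largest score always gets the largest probability, so softmax changes how the scores are read, not which category wins.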
Understanding the Code with an Analogy
An analogy may help make the flow of the code clearer. Think of loading the model like preparing a special recipe: you first gather the ingredients (libraries), then choose between a classic family recipe (the databank model) and a trendy new interpretation (the Hugging Face version). The tokenizer acts like a chef’s knife, chopping the text into manageable slices before they go into the oven (the model) to be baked into predictions.
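To make the “chef’s knife” step concrete, here is a toy version of the greedy longest-match splitting that WordPiece tokenizers, such as the one used by bert-base-multilingual-uncased, apply to each word. This is a simplified sketch: the vocabulary is made up, and a real tokenizer also handles punctuation, normalization, and a vocabulary of many thousands of pieces.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # nothing matched: the whole word maps to the unknown token
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then split each word into pieces."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"troll", "##ing", "is", "un", "##kind"}
print(tokenize("Trolling is unkind", TOY_VOCAB))
```

Each slice is later mapped to an integer id, which is what the model actually receives.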
Implementing the Code
Here’s a piece of code to implement the model:
from pathlib import Path

import numpy as np
import torch
from scipy.special import softmax
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Label sets for the three sub-tasks.
TASK_LABEL_IDS = {
    'Sub-task A': ['OAG', 'NAG', 'CAG'],
    'Sub-task B': ['GEN', 'NGEN'],
    'Sub-task C': ['OAG-GEN', 'OAG-NGEN', 'NAG-GEN', 'NAG-NGEN', 'CAG-GEN', 'CAG-NGEN']
}

model_version = "databank"  # or "hugging face"

if model_version == "databank":
    # Assumption: the downloaded archive is unzipped into ./databank_model with the
    # layout <lang>/<task>/output/<base_model>/model. Adjust the pattern if yours differs.
    model_root = Path("databank_model")
    model_path = next(model_root.glob('*/*/output/*/model'))
    lang, task, _, base_model, _ = model_path.relative_to(model_root).parts
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
else:
    lang, task, base_model = "ALL", 'Sub-task C', 'bert-base-multilingual-uncased'
    # Assumption: Hub repositories are named TRAC2020_<lang>_<task letter>_<base model>.
    # Confirm the exact name on the Hugging Face Hub before relying on it.
    base_model = f"socialmediaie/TRAC2020_{lang}_{task.split()[-1]}_{base_model}"
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(base_model)

model.eval()  # inference mode: disables dropout
task_labels = TASK_LABEL_IDS[task]

sentence = "This is a good cat and this is a bad dog."
processed_sentence = f"[CLS] {sentence}"
tokens = tokenizer.tokenize(processed_sentence)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokens)
tokens_tensor = torch.tensor([indexed_tokens])  # a batch containing one sequence

with torch.no_grad():
    # Recent transformers versions return an output object; read .logits from it.
    logits = model(tokens_tensor).logits

preds_probs = softmax(logits.detach().cpu().numpy(), axis=1)
preds = np.argmax(preds_probs, axis=1)
preds_labels = np.array(task_labels)[preds]
print(dict(zip(task_labels, preds_probs[0])), preds_labels)
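The Sub-task C labels pair an aggression label with a gender label, so one possible way to read the printed probabilities is to sum them per component. The sketch below is an illustration rather than part of the original models’ tooling: `marginalize` is a hypothetical helper and the probabilities are made up.

```python
from collections import defaultdict


def marginalize(joint_probs):
    """Sum joint 'AGG-GEN' label probabilities into per-component totals."""
    aggression = defaultdict(float)
    gender = defaultdict(float)
    for label, p in joint_probs.items():
        agg, gen = label.split("-")
        aggression[agg] += p
        gender[gen] += p
    return dict(aggression), dict(gender)


# Hypothetical Sub-task C probabilities, made up for illustration.
joint = {
    "OAG-GEN": 0.05, "OAG-NGEN": 0.10,
    "NAG-GEN": 0.05, "NAG-NGEN": 0.60,
    "CAG-GEN": 0.10, "CAG-NGEN": 0.10,
}
agg_probs, gen_probs = marginalize(joint)
print(agg_probs)
print(gen_probs)
```

Totals derived this way need not match what the dedicated Sub-task A or Sub-task B models would predict, so treat them as a convenience view of the Sub-task C output.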
Troubleshooting Common Issues
Even the most seasoned developers face hiccups along the way! Here are common issues you might encounter and how to resolve them:
- Model Not Found: Double-check the model version you’re trying to load. Ensure you have the correct paths and internet connection if pulling from a repository.
- Out of Memory Error: Large models need a lot of RAM or GPU memory. Run on a machine with more memory, process fewer or shorter sentences at a time, or switch to a smaller model.
- Tokenization Errors: Make sure your input is a plain text string and that the tokenized sentence does not exceed the model’s maximum input length.
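For the input-length issue, one simple guard is to cut the list of token ids before building the tensor. This is a minimal sketch: `truncate_ids` is a hypothetical helper, the ids are made up, and the default of 512 is an assumption based on the usual limit for BERT-base models (the loaded tokenizer’s `model_max_length` attribute reports the actual value).

```python
def truncate_ids(token_ids, max_length=512):
    """Keep at most max_length ids, cutting from the end so the leading [CLS] id survives."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    return token_ids[:max_length]


# Hypothetical ids: 101 stands in for [CLS], followed by 600 made-up token ids.
ids = [101] + list(range(1000, 1600))
short = truncate_ids(ids)
print(len(ids), "->", len(short))
```

Truncation discards the tail of long posts, so for very long inputs consider splitting the text into several pieces and classifying each one separately.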

