The digital age presents unique challenges, including various forms of online hostility. This blog will guide you through using multilingual transformer models specifically designed to detect trolling, aggression, and cyberbullying as outlined in the paper presented at TRAC 2020, authored by Sudhanshu Mishra, Shivangi Prasad, and Shubhanshu Mishra.
Understanding the Models
The joint fine-tuning approach trains multilingual transformer models to classify unseen text inputs. This is akin to training a dog to recognize different situations based on behavioral cues: just as a dog learns to tell playful barks from aggressive growls, the model learns to tell apart language that signals trolling, aggression, or bullying. The models predict labels for three sub-tasks. Sub-task A rates aggression (OAG: overtly aggressive, CAG: covertly aggressive, NAG: non-aggressive), Sub-task B flags gendered content (GEN: gendered, NGEN: non-gendered), and Sub-task C combines the two into joint labels such as NAG-GEN.
Getting Started
Here’s how you can utilize these models in your projects:
- First, ensure you have the necessary libraries installed:
  - transformers
  - torch
  - scipy
  - numpy
  - pandas
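Before running anything, it can help to confirm that these packages are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is an illustrative choice and not part of any of the libraries listed.

```python
from importlib.util import find_spec

# Packages imported by the usage snippet in this post.
REQUIRED = ["transformers", "torch", "scipy", "numpy", "pandas"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

If anything is reported missing, install it with `pip install` and run the check again.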
Usage Instructions
To use the models, follow the code snippet below:
```python
from transformers import AutoModel, AutoTokenizer, AutoModelForSequenceClassification
import torch
from pathlib import Path
from scipy.special import softmax
import numpy as np
import pandas as pd

# Label order for each sub-task; position i matches output logit i.
TASK_LABEL_IDS = {
    "Sub-task A": ["OAG", "NAG", "CAG"],
    "Sub-task B": ["GEN", "NGEN"],
    "Sub-task C": ["OAG-GEN", "OAG-NGEN", "NAG-GEN", "NAG-NGEN", "CAG-GEN", "CAG-NGEN"],
}

model_version = "databank"  # the other option loads from the Hugging Face hub

if model_version == "databank":
    # Ensure you have downloaded the required model file and unzipped it.
    # Assumption: "databank_model" stands in for the directory you unzipped into.
    databank_model = "databank_model"
    # Adjust the glob pattern to your extracted layout; the unpacking below
    # expects exactly five path components.
    model_path = next(Path(databank_model).glob("*.output/model"))
    lang, task, _, base_model, _ = model_path.parts
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
else:
    lang, task, base_model = "ALL", "Sub-task C", "bert-base-multilingual-uncased"
    # Assumed hub id pattern: socialmediaie/TRAC2020_<lang>_<task letter>_<base model>.
    # Verify the exact name on the hub before relying on it.
    base_model = f"socialmediaie/TRAC2020_{lang}_{task.split()[-1]}_{base_model}"
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(base_model)

# Set the model to eval mode for inference.
# To fine-tune further instead, switch back with model.train().
model.eval()

sentence = "This is a good cat and this is a bad dog."
# Prepend the [CLS] token by hand before tokenizing.
processed_sentence = f"{tokenizer.cls_token} {sentence}"
tokens = tokenizer.tokenize(processed_sentence)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokens)
tokens_tensor = torch.tensor([indexed_tokens])

with torch.no_grad():
    # Index 0 holds the logits whether the model returns a tuple or an output object.
    logits = model(tokens_tensor)[0]

# Convert logits to probabilities, then look up the most likely label.
preds = logits.detach().cpu().numpy()
preds_probs = softmax(preds, axis=1)
preds = np.argmax(preds_probs, axis=1)
preds_labels = np.array(TASK_LABEL_IDS[task])[preds]
print(dict(zip(TASK_LABEL_IDS[task], preds_probs[0])), preds_labels)
```
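The last few lines of the snippet (softmax, argmax, label lookup) can be isolated into a small helper, which makes the post-processing easier to reuse and to test without loading a model. The sketch below is dependency-free; the function names and the example logits are made up for illustration and do not come from the paper or the model files.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def label_predictions(logits, labels):
    """Map each row of logits to (best_label, {label: probability})."""
    results = []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((labels[best], dict(zip(labels, probs))))
    return results


# Sub-task C labels, in the same order as TASK_LABEL_IDS in the snippet above.
labels = ["OAG-GEN", "OAG-NGEN", "NAG-GEN", "NAG-NGEN", "CAG-GEN", "CAG-NGEN"]
# Made-up logits for a single sentence, for illustration only.
example_logits = [[-1.0, -1.5, 2.0, 0.5, -0.3, -1.1]]
best_label, probs = label_predictions(example_logits, labels)[0]
print(best_label)  # prints NAG-GEN, the label with the largest logit
```

With a real model, passing `logits.tolist()` as the first argument should produce one result per input sentence.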
Expected Output
Running the code snippet should yield an output similar to this:
({'CAG-GEN': 0.06762535, 'CAG-NGEN': 0.03244293, 'NAG-GEN': 0.6897794, 'NAG-NGEN': 0.15498641, 'OAG-GEN': 0.034373745, 'OAG-NGEN': 0.020792078}, array(['NAG-GEN'], dtype='<U8'))
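Each Sub-task C label pairs an aggression label with a gender label, so the joint probabilities in the output above can be summed into separate aggression and gender scores. The sketch below shows one way to read the joint output; it is not necessarily how the authors derive Sub-task A and B predictions, and `marginalize` is a hypothetical helper name.

```python
from collections import defaultdict


def marginalize(joint_probs):
    """Sum joint 'AGGRESSION-GENDER' probabilities into per-task totals."""
    aggression = defaultdict(float)
    gender = defaultdict(float)
    for joint_label, prob in joint_probs.items():
        agg_label, gen_label = joint_label.split("-")
        aggression[agg_label] += prob
        gender[gen_label] += prob
    return dict(aggression), dict(gender)


# Probabilities copied from the expected output above.
joint = {
    "CAG-GEN": 0.06762535, "CAG-NGEN": 0.03244293,
    "NAG-GEN": 0.6897794, "NAG-NGEN": 0.15498641,
    "OAG-GEN": 0.034373745, "OAG-NGEN": 0.020792078,
}
aggression, gender = marginalize(joint)
# Report the most likely label on each axis.
print(max(aggression, key=aggression.get), max(gender, key=gender.get))  # prints NAG GEN
```

For this example the two summed scores agree with the top joint label, NAG-GEN, but that need not hold for every input.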
Troubleshooting Common Issues
- If you encounter errors related to missing libraries, ensure all mentioned libraries are installed. Use pip install to install the necessary packages.
- In case of model retrieval errors, double-check the model path and ensure that the models have been downloaded and extracted correctly.
- For output-related discrepancies, note that these models are retrained for this upload, and evaluation metrics may vary slightly from those reported in the paper.
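For model retrieval errors, a quick way to check the extraction is to look for directories that contain a `config.json`, which is the file `from_pretrained` reads when loading a saved model from a local folder. The sketch below assumes the extracted archive follows that layout; `find_model_dirs` is a hypothetical helper, and `databank_model` is a placeholder for wherever the archive was unzipped.

```python
from pathlib import Path


def find_model_dirs(root):
    """Return directories under root that contain a config.json file."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.parent for p in root.rglob("config.json"))


if __name__ == "__main__":
    # "databank_model" is a placeholder for the extraction directory.
    candidates = find_model_dirs("databank_model")
    if not candidates:
        print("No model directories found; check the download and extraction.")
    for path in candidates:
        print(path)
```

Any directory it prints is a candidate to pass to `AutoModelForSequenceClassification.from_pretrained`.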
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

