How to Use Multilingual Joint Fine-tuning of Transformer Models to Identify Trolling, Aggression, and Cyberbullying

Sep 12, 2024 | Educational

Online hostility takes many forms. This post walks through using multilingual transformer models fine-tuned to detect trolling, aggression, and cyberbullying, as described in the paper presented at TRAC 2020 by Sudhanshu Mishra, Shivangi Prasad, and Shubhanshu Mishra.

Understanding the Models

The joint fine-tuning approach fine-tunes transformer models so that they can classify unseen text. This is akin to training a dog to recognize different situations from behavioral cues: just as a dog learns to tell playful barks from aggressive growls, the model learns to pick up on the features of language that signal trolling, aggression, or bullying. The label sets in the code below reflect this. Sub-task A covers aggression (OAG for overtly aggressive, CAG for covertly aggressive, NAG for non-aggressive), Sub-task B covers gendered content (GEN, NGEN), and Sub-task C pairs the two so that a single model predicts both at once.
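
To make the relationship between the label sets concrete, here is a minimal sketch that rebuilds the Sub-task C labels listed in the usage code by pairing each Sub-task A label with each Sub-task B label. The variable names are ours and are not taken from the paper.

```python
from itertools import product

# Label codes as they appear in TASK_LABEL_IDS in the usage code
aggression_labels = ["OAG", "NAG", "CAG"]  # Sub-task A
gender_labels = ["GEN", "NGEN"]            # Sub-task B

# Sub-task C pairs every aggression label with every gender label
joint_labels = [f"{a}-{g}" for a, g in product(aggression_labels, gender_labels)]
print(joint_labels)
```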

Getting Started

Before using these models in your projects, make sure the following libraries are installed:

  • transformers
  • torch
  • scipy
  • numpy
  • pandas
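
If you are unsure which of these are already present, a quick check like the following can help. It is a small sketch that uses only the standard library; the helper name `missing_packages` is ours, and it assumes each library is installed under the distribution name listed above.

```python
from importlib import metadata

REQUIRED = ["transformers", "torch", "scipy", "numpy", "pandas"]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```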

Usage Instructions

To use the models, follow the code snippet below:

python
from transformers import AutoModel, AutoTokenizer, AutoModelForSequenceClassification
import torch
from pathlib import Path
from scipy.special import softmax
import numpy as np
import pandas as pd

TASK_LABEL_IDS = {
    "Sub-task A": ["OAG", "NAG", "CAG"],
    "Sub-task B": ["GEN", "NGEN"],
    "Sub-task C": ["OAG-GEN", "OAG-NGEN", "NAG-GEN", "NAG-NGEN", "CAG-GEN", "CAG-NGEN"]
}

model_version = "databank"  # any other value loads from the Hugging Face Hub

if model_version == "databank":
    # Ensure you have downloaded the required model file
    # and unzipped it into the directory named by databank_model.
    databank_model = "databank_model"  # placeholder: path to the unzipped archive
    # Assumed layout, inferred from the unpacking below:
    #   <databank_model>/<lang>/<task>/output/<base_model>/model
    model_path = next(Path(databank_model).glob("*/*/output/*/model"))
    lang, task, _, base_model, _ = model_path.parts[-5:]
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
else:
    lang, task, base_model = "ALL", "Sub-task C", "bert-base-multilingual-uncased"
    # Assumed Hub naming: socialmediaie/TRAC2020_<lang>_<task letter>_<base model>
    base_model = f"socialmediaie/TRAC2020_{lang}_{task.split()[-1]}_{base_model}"
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(base_model)

# Set model in eval mode for inference
model.eval()

# Run inference on an example sentence
sentence = "This is a good cat and this is a bad dog."
processed_sentence = f"{tokenizer.cls_token} {sentence}"
tokens = tokenizer.tokenize(processed_sentence)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokens)
tokens_tensor = torch.tensor([indexed_tokens])

with torch.no_grad():
    # Index 0 holds the logits for both tuple and ModelOutput return types
    logits = model(tokens_tensor)[0]

preds = logits.detach().cpu().numpy()
preds_probs = softmax(preds, axis=1)
preds = np.argmax(preds_probs, axis=1)
preds_labels = np.array(TASK_LABEL_IDS[task])[preds]
print(dict(zip(TASK_LABEL_IDS[task], preds_probs[0])), preds_labels)

Expected Output

Running the code snippet should yield an output similar to this:

{'CAG-GEN': 0.06762535, 'CAG-NGEN': 0.03244293, 'NAG-GEN': 0.6897794, 'NAG-NGEN': 0.15498641, 'OAG-GEN': 0.034373745, 'OAG-NGEN': 0.020792078} array(['NAG-GEN'], dtype='<U8')
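
Because each Sub-task C label pairs an aggression label with a gender label, the joint probabilities can be summed to get a rough per-sub-task view. The sketch below does this for the numbers shown above. It is an illustration only and not necessarily how the authors derive Sub-task A or B predictions; the helper name `marginalize` is ours.

```python
from collections import defaultdict

# Joint Sub-task C probabilities copied from the expected output above
joint_probs = {
    "CAG-GEN": 0.06762535, "CAG-NGEN": 0.03244293,
    "NAG-GEN": 0.6897794, "NAG-NGEN": 0.15498641,
    "OAG-GEN": 0.034373745, "OAG-NGEN": 0.020792078,
}


def marginalize(joint):
    """Sum joint 'AGGRESSION-GENDER' probabilities into per-sub-task totals."""
    aggression = defaultdict(float)
    gender = defaultdict(float)
    for label, prob in joint.items():
        agg, gen = label.split("-")
        aggression[agg] += prob
        gender[gen] += prob
    return dict(aggression), dict(gender)


aggression_probs, gender_probs = marginalize(joint_probs)
print(max(aggression_probs, key=aggression_probs.get))  # NAG
print(max(gender_probs, key=gender_probs.get))          # GEN
```

For this example the summed view agrees with the joint prediction NAG-GEN: the aggression total is highest for NAG (about 0.84) and the gender total is highest for GEN (about 0.79).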

Troubleshooting Common Issues

  • If you encounter errors related to missing libraries, install them with pip, for example: pip install transformers torch scipy numpy pandas.
  • In case of model retrieval errors, double-check the model path and ensure that the models have been downloaded and extracted correctly.
  • For output-related discrepancies, note that these models were retrained for this upload, so evaluation metrics may differ slightly from those reported in the paper.
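
For the model path check in particular, a small helper can report what is missing before you call from_pretrained. This is a sketch under assumptions: it expects a model directory to contain config.json plus one of the common PyTorch weight file names, and both the helper name `check_model_dir` and the example directory are hypothetical.

```python
from pathlib import Path

# Assumption: a saved model directory holds config.json plus one weights file;
# the two names below are the common PyTorch weight file names.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def check_model_dir(model_dir):
    """Return a list of human-readable problems found in `model_dir`."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"{path} is not a directory"]
    problems = []
    if not (path / "config.json").is_file():
        problems.append("config.json is missing")
    if not any((path / name).is_file() for name in WEIGHT_FILES):
        problems.append("no weights file found (" + " or ".join(WEIGHT_FILES) + ")")
    return problems


# Hypothetical path: replace with the directory you extracted the model into
print(check_model_dir("databank_model"))
```

An empty list means the directory looks loadable; anything else points at what to re-download or re-extract.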


