How to Detect Sexism in Tweets using the EXIST Dataset

May 21, 2022 | Educational

Welcome to a guide on developing a model to detect sexism in tweets. We’ll walk through a deceptively simple approach: fine-tuning an existing model on the EXIST and MeTwo datasets to create an impactful tool that can help in the fight against online sexism. This process involves training a model to analyze tweets in Spanish and identify whether they contain sexist remarks.

Understanding the Model

At its core, this model uses a process similar to a detective following clues to solve a mystery; however, instead of physical evidence, you’ll be examining tweets for expressions of sexism. Imagine the model as your keen-eyed detective, scouring through social media for telling signs of machismo or discrimination hidden within the text.

  • The model has been fine-tuned from an existing hate-speech model.
  • It has been trained with a combined focus on the EXIST and MeTwo datasets, enhancing its capacity to recognize subtle sexist remarks.

Essential Tools and Setup

Before you can get started, ensure you have the following tools set up:

  • Python: The programming language used for model development.
  • Pip: To install required libraries.
  • Transformers library: For importing pre-trained models.

To install the required libraries, run:

pip install transformers torch

Training the Model

The training of the model can be thought of as preparing your detective for the job by providing specific training and context. In this case, we’ll adjust parameters and settings for optimal performance:

  • Learning Rate: Controls how quickly the model updates its weights (set to 5e-5).
  • Number of Epochs: The model will go through the dataset multiple times (set to 8).
  • Batch Size: Processing multiple samples at once (set to 32).
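To see what these settings mean in practice, here is a quick back-of-the-envelope calculation of the training schedule they imply. The dataset size below is a made-up figure for illustration only; the tutorial does not state the combined size of the EXIST and MeTwo training splits.

```python
# Hyperparameters from this tutorial.
learning_rate = 5e-5
num_epochs = 8
batch_size = 32

# Hypothetical training-set size, just to make the arithmetic concrete.
num_train_examples = 3200

# Optimizer steps per epoch (ceiling division covers a smaller final batch).
steps_per_epoch = -(-num_train_examples // batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)  # 100 steps per epoch, 800 steps total
```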

Training Procedure

You’ll employ the AdamW optimizer to adjust weights during training based on gradients computed from the data.

optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
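The AdamW update rule can be sketched in plain Python for a single scalar parameter, using the betas and epsilon above. The weight-decay value here is an assumption, since the tutorial does not state one; in practice you would use the optimizer provided by PyTorch rather than this sketch.

```python
import math

def adamw_step(theta, grad, m, v, t,
               lr=5e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    Returns the updated parameter and the new moment estimates.
    """
    m = beta1 * m + (1 - beta1) * grad       # first moment (running mean of grads)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (running mean of grad^2)
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the weights, not folded into the gradient.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# One update step starting from theta=1.0 with a positive gradient
# nudges the parameter downward by roughly the learning rate.
theta, m, v = adamw_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```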

Evaluating Model Performance

The evaluation metrics provide a sense of how our detective is performing on the job:

  • Loss: Measures how well the model is performing—lower loss indicates better performance.
  • Accuracy: In our evaluation, the model achieved an impressive accuracy of 83%.
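Accuracy is simply the fraction of tweets labeled correctly. A minimal sketch with made-up labels (0 = non-sexist, 1 = sexist), where 5 out of 6 correct predictions happens to land near the 83% reported above:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    assert len(y_true) == len(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy example: the third prediction is wrong, so 5/6 are correct.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8333...
```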

Using the Model

The following call checks a tweet for sexism:

pipeline_nlp("mujer al volante peligro!")

The model will return a label (either NON SEXIST or SEXISM) along with a score indicating its confidence in the classification. (The sample tweet translates to “woman at the wheel, danger!”)
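Under the hood, pipeline_nlp would typically be a Transformers text-classification pipeline, which returns a list of label/score dictionaries. A small helper, fed a hypothetical output here rather than a real model prediction, shows how to read that result:

```python
def interpret(result, threshold=0.5):
    """Turn a text-classification pipeline output into a readable verdict.

    `result` is a list like [{"label": ..., "score": ...}],
    with the highest-scoring label first.
    """
    top = result[0]
    if top["score"] < threshold:
        return f"uncertain ({top['label']}, score={top['score']:.2f})"
    return f"{top['label']} (score={top['score']:.2f})"

# Hypothetical pipeline output, for illustration only.
sample = [{"label": "SEXISM", "score": 0.91}]
print(interpret(sample))  # SEXISM (score=0.91)
```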

Troubleshooting Tips

Sometimes, things may not go as planned. Here are a few troubleshooting ideas:

  • Low Accuracy: Ensure that you’re using a rich dataset and that the training parameters (like learning rate and epochs) are appropriately set.
  • Installation Issues: Be certain that all dependencies are installed correctly, particularly the transformers library.
  • Model Not Responding: Check for coding errors and ensure that you’re using the latest versions of libraries like PyTorch.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Future Directions

Future improvements could include expanding the dataset through data collection techniques or applying data augmentation strategies to bolster the model further.

Conclusion

Leveraging models like the one we’ve discussed can help surface important social issues such as sexism on digital platforms. The dedication of contributors like medardodt and MariaIsabel showcases the collaborative spirit needed to tackle these complex challenges.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
