How to Use Twitter-roBERTa-base for Sentiment Analysis

May 31, 2023 | Educational

With the rise of social media, analyzing public sentiment has never been more crucial. Why? Because sometimes we need to understand how people feel about pressing issues, like the current spike in Covid cases! This article will guide you through setting up and using the Twitter-roBERTa-base model for sentiment analysis.

What is Twitter-roBERTa-base?

The Twitter-roBERTa-base model is a RoBERTa-base model pretrained on approximately 124 million tweets posted between January 2018 and December 2021, then fine-tuned for sentiment analysis on the TweetEval benchmark. For anyone interested in analyzing tweets, this model is a powerful tool.

Getting Started

To kick off, you’ll need to ensure you have Python and the Hugging Face Transformers library installed. Here’s how to set it up:

pip install transformers

Example Pipeline

Now, let’s dive into the code. Picture yourself as a chef following a recipe: the code is your ingredients, and the sentiment analysis is the finished dish. Here’s how you can do it:

from transformers import pipeline

model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"
sentiment_task = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)

# Analyze a tweet
result = sentiment_task("Covid cases are increasing fast!")
print(result)

When you run this code, you can expect an output similar to:

[{'label': 'Negative', 'score': 0.7236}]
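Each pipeline result is a list of dictionaries with label and score keys. If you want a plain verdict rather than the raw dictionary, a small helper can wrap it. Note that verdict and the 0.5 threshold below are our own illustrative choices, not part of the Transformers API:

```python
# Hypothetical helper (not part of transformers): map a pipeline
# result to a simple verdict, falling back when confidence is low.
def verdict(result, threshold=0.5):
    top = result[0]
    return top["label"].lower() if top["score"] >= threshold else "uncertain"

# A result shaped like the pipeline output shown above.
sample = [{"label": "Negative", "score": 0.7236}]
print(verdict(sample))        # negative
print(verdict(sample, 0.9))   # uncertain
```

Thresholding like this is useful when downstream logic should only act on confident predictions.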

Full Classification Example

Here’s a more comprehensive example that masks user mentions and URLs before classification, matching how the model’s training data was preprocessed. Consider it as preparing a gourmet dish where the extra ingredients need prepping before they go in the pot.

from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import numpy as np
from scipy.special import softmax

def preprocess(text):
    # Mask user mentions as "@user" and URLs as "http", mirroring
    # the preprocessing applied to the model's training data.
    new_text = []
    for t in text.split():
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return ' '.join(new_text)

MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Covid cases are increasing fast!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# Print labels and scores
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l}: {np.round(float(s), 4)}")

This will output the sentiment scores, allowing you to categorize public sentiment effectively.
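If you want to sanity-check the post-processing without downloading the model, the preprocessing and ranking steps above can be replayed on their own with dummy logits. The id2label map below is hardcoded to mirror the model’s configuration, and the logit values are made up for illustration:

```python
import numpy as np
from scipy.special import softmax

def preprocess(text):
    # Same masking as in the full example above.
    new_text = []
    for t in text.split():
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)

print(preprocess("@elonmusk Covid cases are increasing fast! https://example.com"))
# @user Covid cases are increasing fast! http

# Dummy logits standing in for model(**encoded_input); the real
# values come from the classification head.
id2label = {0: "negative", 1: "neutral", 2: "positive"}
logits = np.array([2.1, 0.3, -1.5])
scores = softmax(logits)

# Rank labels from most to least probable, as in the full example.
ranking = np.argsort(scores)[::-1]
for i in range(scores.shape[0]):
    print(f"{i+1}) {id2label[ranking[i]]}: {np.round(float(scores[ranking[i]]), 4)}")
```

Because softmax and argsort are plain array operations, this replay behaves identically to the post-processing in the full example, just without the model in the loop.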

Troubleshooting Ideas

  • If you encounter an error stating that a model cannot be found, check that the model path is spelled correctly and that you have internet connectivity the first time you run the code (the model is downloaded and cached on first use).
  • Should the output not meet expectations, verify that your input text is well-formed and run it through the preprocess function so mentions and URLs are normalized the same way the model saw during training.
  • For library-related issues, make sure the version of Transformers you’re using is up to date by running pip install --upgrade transformers.
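To check which version of Transformers is actually installed in your environment, the standard-library importlib.metadata module can report it without even importing the package:

```python
import importlib.metadata as metadata

try:
    # Reports the installed version of the transformers package.
    print("transformers", metadata.version("transformers"))
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```

Comparing this against the latest release on PyPI tells you whether the upgrade command above is needed.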

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

In a world where sentiment can sway opinions and influence decisions, tools like the Twitter-roBERTa-base model are invaluable. The journey of sentiment analysis is just a few lines of code away!

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
