Twitter Sentiment Analysis – Classical Approach vs. Deep Learning

Nov 14, 2020 | Data Science


Photo by Gaelle Marcela on Unsplash

Overview

This project aims to explore the world of Natural Language Processing (NLP) by building what is known as a Sentiment Analysis Model. This model analyses a given piece of text, predicting whether it expresses positive or negative sentiment.



To achieve this, we will leverage the sentiment140 dataset, a perfectly balanced collection of 1.6 million tweets. Its creators automated the annotation process by associating emoticons like šŸ™‚ with positive sentiment and šŸ™ with negative sentiment.

After a series of cleaning and data processing, and visualizing our data in a word cloud, we will build a Naive Bayesian model to classify positive and negative tweets. Following that, we’ll delve into a more advanced deep learning model using LSTM, which will involve different data cleaning and processing techniques, and we will explore concepts including Word Embeddings, Dropout, and more.

Table of Contents:

  1. Importing and Discovering the Dataset
  2. Cleaning and Processing the Data
    • Tokenization
    • Lemmatization
    • Cleaning the Data
  3. Visualizing the Data
  4. Naive Bayesian Model
    • Splitting the Data
    • Training the Model
    • Testing the Model
    • Asserting the Model
  5. Deep Learning Model – LSTM
    • Data Pre-processing
      • Word Embeddings
      • Global Vectors for Word Representation (GloVe)
      • Data Padding
    • Data Transformation
    • Building the Model
    • Training the Model
    • Investigating Possibilities to Improve the Model
      • Regularization – Dropout
      • Inspecting the Data – Unknown Words
    • Predicting on Custom Data
    • Inspecting Wrongly Predicted Data
  6. Bonus Section
  7. Extra Tip: Pickling
  8. Further Work

How to Build a Sentiment Analysis Model

To build your sentiment analysis model, follow these steps:

1. Importing and Discovering the Dataset

Start by importing your libraries (like pandas and numpy) and load the sentiment140 dataset. This step is crucial, as it sets the stage for the whole analysis.
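As a minimal sketch of this step: the sentiment140 CSV ships without a header row, and its labels encode negative tweets as 0 and positive tweets as 4. The column names and the tiny stand-in frame below are illustrative assumptions, not the dataset itself.

```python
import pandas as pd

# Column layout of the sentiment140 CSV (the file has no header row)
COLS = ["target", "id", "date", "flag", "user", "text"]

# With the real file downloaded locally, loading would look like:
# df = pd.read_csv("training.1600000.processed.noemoticon.csv",
#                  encoding="latin-1", names=COLS)

# A tiny stand-in frame with the same schema, for illustration only
df = pd.DataFrame(
    [[0, 1, "Mon Apr 06 2009", "NO_QUERY", "userA", "this is awful"],
     [4, 2, "Mon Apr 06 2009", "NO_QUERY", "userB", "this is great"]],
    columns=COLS,
)

# sentiment140 uses 0 = negative and 4 = positive;
# remap 4 -> 1 to get a clean binary label
df["target"] = df["target"].replace(4, 1)
print(df["target"].tolist())
```

Remapping the labels early keeps every later step (metrics, loss functions) working with a standard 0/1 target.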

2. Cleaning and Processing the Data

This step is akin to preparing ingredients before cooking a meal—the cleaner the ingredients (or data), the better the end product. Here’s how to approach it:

  • Tokenization: Split your text into individual words.
  • Lemmatization: Reduce words to their base form (e.g., turning ā€œrunningā€ into ā€œrunā€).
  • Cleaning the Data: Remove any unnecessary symbols or characters to ensure your data is uniform.
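The cleaning and tokenization steps above can be sketched with the standard library alone. In practice NLTK’s `word_tokenize` and `WordNetLemmatizer` are the usual tools for the tokenization and lemmatization stages; this stripped-down version only illustrates the cleaning pipeline (lowercasing, dropping URLs and mentions, keeping letters).

```python
import re

def clean_tweet(text: str) -> list[str]:
    """Minimal tweet-cleaning sketch: lowercase, strip URLs,
    @mentions, and non-letter characters, then tokenize on whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)   # drop URLs
    text = re.sub(r"@\w+", "", text)           # drop @mentions
    text = re.sub(r"[^a-z\s]", "", text)       # keep letters and spaces only
    return text.split()                         # naive tokenization

print(clean_tweet("@user Loving this!! http://t.co/x  :)"))
# ['loving', 'this']
```

Note that stripping ā€œ:)ā€ and ā€œ:(ā€ is intentional here: since the labels were derived from emoticons, leaving them in the text would let the model cheat.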

3. Visualizing the Data

Create visual representations, like word clouds, to understand your dataset better. This is like looking at the menu before ordering at a restaurant—it helps visualize your options!
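A word cloud is ultimately driven by word frequencies, which can be computed with a `Counter`. The toy corpus below is a stand-in for the cleaned tweets; the commented-out call shows how the `wordcloud` package would turn those frequencies into the image (assuming that package is installed).

```python
from collections import Counter

# Toy corpus standing in for the cleaned tweets
tweets = ["love this movie", "love the cast", "hate the ending"]

# Word frequencies are the raw material of a word cloud
freqs = Counter(word for tweet in tweets for word in tweet.split())
print(freqs.most_common(2))

# With the wordcloud package installed, the cloud itself is one call:
# from wordcloud import WordCloud
# cloud = WordCloud(width=800, height=400).generate_from_frequencies(freqs)
```

Plotting separate clouds for the positive and negative subsets makes it easy to eyeball which words characterize each class.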

4. Naive Bayesian Model

This stage involves building your first model:

  • Splitting the Data: Divide your dataset into training and testing sets to validate your model’s performance.
  • Training the Model: Use the training data to teach your model how to classify.
  • Testing the Model: Evaluate how well your model classifies sentiment based on the test data.
  • Asserting the Model: Confirm that your model’s predictions match your expectations.

5. Deep Learning Model – LSTM

Deep learning brings a new level of sophistication. Imagine your model is now a professional chef mastering a complex recipe:

  • Data Pre-processing: Repeat cleaning steps as needed and ensure your data is ready for advanced modeling.
  • Building the Model: Define your LSTM architecture.
  • Training the Model: Feed your LSTM with the prepared data.
  • Investigating Improvements: Explore optimizations such as Dropout, which helps prevent your model from overfitting.
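The steps above can be sketched in Keras. The vocabulary size, sequence length, and layer sizes here are assumed hyperparameters, not values from the original notebook; the `Embedding` layer is where pretrained 100-dimensional GloVe vectors would be loaded via its `weights=` argument.

```python
from tensorflow.keras.layers import Dense, Dropout, Embedding, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import pad_sequences

VOCAB_SIZE = 10_000   # assumed vocabulary size after tokenization
MAX_LEN = 40          # assumed padded tweet length
EMBED_DIM = 100       # matches 100-dimensional GloVe vectors

# Data padding: pad variable-length token-id sequences to a fixed length
seqs = pad_sequences([[3, 7, 12], [5]], maxlen=MAX_LEN)

# A minimal LSTM sentiment classifier sketch
model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM),   # pretrained GloVe can go here
    LSTM(128),
    Dropout(0.5),                       # regularization against overfitting
    Dense(1, activation="sigmoid"),     # positive vs. negative output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Training then amounts to `model.fit(seqs, labels, ...)` on the padded sequences; the Dropout layer randomly zeroes activations during training, which is the regularization technique revisited in the troubleshooting notes below.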

Troubleshooting

While building your model, you may encounter some issues. Here are some troubleshooting tips:

  • If your model’s accuracy is low, check if your data is well-balanced and clean.
  • Examine your hyperparameters if the model performs inconsistently.
  • Look out for overfitting—if your training accuracy is high, but testing accuracy is low, consider techniques like regularization or Dropout.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

In this blog, we explored the fascinating journey of building a Sentiment Analysis Model. From classical approaches using Naive Bayesian models to the complexities of LSTM in deep learning, each step deepens our understanding of NLP.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Continue reading the whole notebook here. You can also find this notebook, and give it an upvote, on Kaggle.
