Unlocking the Predictive Power of Sentiment Analysis with CrudeBERT

Oct 28, 2024 | Educational


In the dynamic world of oil prices, understanding the nuances of market sentiment can provide a powerful edge. This post is a practical guide to using CrudeBERT, a deep learning sentiment analysis model, to classify the sentiment of crude oil news headlines as an input for predicting WTI crude oil price movements.

What is CrudeBERT?

CrudeBERT is a pre-trained natural language processing (NLP) model fine-tuned for detecting the sentiment of news headlines related to crude oil. Originating from a master’s thesis, it enhances traditional financial sentiment analysis tools by introducing domain-specific adaptations that recognize the unique factors influencing crude oil prices.

Why Use Sentiment Analysis for Oil Prices?

Price Movements: Oil prices are heavily influenced by supply and demand changes, often conveyed through news headlines.
Domain Adaptation Importance: General financial sentiment models might miss critical insights relevant to crude oil due to the lack of specificity in their training data.
Rich Content: News headlines condense market-moving events into short, timely text, which makes them a practical input for predictive models.
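
To make the link between headline sentiment and price movements more concrete, the sketch below turns a day's classified headlines into a single net sentiment score, (positive - negative) / total. This scoring rule and the function name daily_net_sentiment are illustrative assumptions, not part of CrudeBERT; they only show one common way to turn per-headline labels into a daily feature that a price model could consume. The dates and labels are made up.

```python
from collections import defaultdict


def daily_net_sentiment(records):
    """Aggregate (date, label) pairs into a net sentiment score per date.

    Hypothetical scoring rule: (positive - negative) / total headlines,
    so every score lies in [-1, 1] and neutral headlines only dilute it.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for date, label in records:
        counts[date][label] += 1  # an unknown label raises KeyError on purpose

    scores = {}
    for date, c in sorted(counts.items()):
        total = c["positive"] + c["negative"] + c["neutral"]
        scores[date] = (c["positive"] - c["negative"]) / total
    return scores


# Illustrative input: one classified headline per tuple
records = [
    ("day-1", "positive"),
    ("day-1", "negative"),
    ("day-1", "negative"),
    ("day-2", "neutral"),
    ("day-2", "positive"),
]
print(daily_net_sentiment(records))
```

Normalizing by the headline count keeps busy news days from dominating quiet ones; whether that suits a given price model is a design choice to test, not a given.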

Guide to Using CrudeBERT

To get started with CrudeBERT, follow these simple steps:

Step 1: Download Necessary Files

First things first, you need to download two essential files from Hugging Face:

crude_bert_config.json
crude_bert_model.bin
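
Before moving on, it can help to confirm that both files actually landed in the folder where your notebook will run. A minimal sketch using only the standard library; the helper name missing_crudebert_files is hypothetical, and it assumes the two file names listed above.

```python
from pathlib import Path

# File names as listed in Step 1
REQUIRED_FILES = ("crude_bert_config.json", "crude_bert_model.bin")


def missing_crudebert_files(folder="."):
    """Return the required CrudeBERT files that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_crudebert_files()  # defaults to the current working directory
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Both CrudeBERT files found.")
```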

Step 2: Set Up Your Jupyter Notebook

Create a Jupyter Notebook in the same folder where you downloaded the files and include the following code:

import torch
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
import numpy as np
import pandas as pd

# List of example headlines
headlines = [
    "Major Explosion, Fire at Oil Refinery in Southeast Philadelphia",
    "PETROLEOS confirms Gulf of Mexico oil platform accident",
    "CASUALTIES FEARED AT OIL ACCIDENT NEAR IRANS BORDER",
    "EIA Chief expects Global Oil Demand Growth 1 M BD to 2011",
    "Turkey Jan-Oct Crude Imports +98.5% To 57.9M MT",
    "China’s crude oil imports up 78.30% in February 2019",
    "Russia Energy Agency: Sees Oil Output put Flat In 2005",
    "Malaysia Oil Production Steady This Year At 700,000 BD",
    "ExxonMobil:Nigerian Oil Output Unaffected By Union Threat",
    "Yukos July Oil Output Flat On Mo, 1.73M BD - Prime-Tass",
    "2nd UPDATE: Mexico’s Oil Output Unaffected By Hurricane",
    "UPDATE: Ecuador July Oil Exports Flat On Mo At 337,000 BD",
    "China February Crude Imports -16.0% On Year",
    "Turkey May Crude Imports down 11.0% On Year",
    "Japan June Crude Oil Imports decrease 10.9% On Yr",
    "Iran’s Feb Oil Exports +20.9% On Mo at 1.56M BD - Official",
    "Apache announces large petroleum discovery in Philadelphia",
    "Turkey finds oil near Syria, Iraq border"
]
example_headlines = pd.DataFrame(headlines, columns=['Headline'])
config_path = './crude_bert_config.json'
model_path = './crude_bert_model.bin'

# Load the configuration
config = AutoConfig.from_pretrained(config_path)

# Create the model from the configuration
model = AutoModelForSequenceClassification.from_config(config)

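# Load the model's state dictionary; map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine
state_dict = torch.load(model_path, map_location='cpu')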

# Drop the position_ids buffer if present; recent transformers versions no longer expect it as a state-dict key
state_dict.pop('bert.embeddings.position_ids', None)

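# Load the adjusted state dictionary into the model; strict=False tolerates mismatched keys,
# so print them to confirm that nothing important was skipped
load_result = model.load_state_dict(state_dict, strict=False)
print('Missing keys:', load_result.missing_keys)
print('Unexpected keys:', load_result.unexpected_keys)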

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Define the prediction function: classify each headline and collect the results in a DataFrame
def predict_to_df(texts, model, tokenizer):
    model.eval()  # disable dropout for deterministic inference
    class_names = ['positive', 'negative', 'neutral']  # label order of the model's three outputs
    data = []
    for text in texts:
        inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=64)
        with torch.no_grad():
            logits = model(**inputs).logits
        softmax_scores = torch.nn.functional.softmax(logits, dim=-1)
        pred_label_id = torch.argmax(softmax_scores, dim=-1).item()
        data.append([text, class_names[pred_label_id]])
    return pd.DataFrame(data, columns=['Headline', 'Classification'])

# Apply classification
result_df = predict_to_df(example_headlines['Headline'].tolist(), model, tokenizer)
result_df
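
In predict_to_df, each label comes from a softmax over the model's three logits followed by an argmax, with indices 0, 1 and 2 mapped to positive, negative and neutral as in the class_names list. The pure-Python sketch below reproduces that decision rule without PyTorch so you can see how a set of logits becomes a label and a confidence. The logit values are made up, and the label order is copied from the code above rather than verified against the model's configuration.

```python
import math

# Label order copied from class_names in predict_to_df
CLASS_NAMES = ["positive", "negative", "neutral"]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Illustrative logits only; real values come from outputs.logits
label, prob = label_from_logits([-1.2, 2.3, 0.4])
print(label, round(prob, 3))
```

Because softmax is monotonic, the argmax over probabilities always equals the argmax over the raw logits; the softmax step only matters if you also want to report a confidence score alongside each label.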

Step 3: Execute the Notebook

Run the cells in your Jupyter Notebook. This will classify the provided headlines into positive, negative, or neutral sentiments based on the model’s predictions.

Troubleshooting Tips

If you encounter any issues while following the steps, try these troubleshooting suggestions:

Ensure that all file paths are correct and that the necessary files are in the same directory as your Jupyter Notebook.
Check for any missing libraries and install them using pip (e.g., pip install transformers torch pandas).
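If you receive errors related to model loading, check that the keys in the state dictionary match the model's parameter names by comparing state_dict.keys() with model.state_dict().keys(); a transformers version mismatch, for example, can leave an extra bert.embeddings.position_ids key in the checkpoint.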
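
To see exactly which keys disagree, you can diff the checkpoint's keys against the keys the freshly built model expects. The helper below works on any two collections of key names, so in the notebook you could pass it state_dict.keys() and model.state_dict().keys(). The function name compare_keys and the sample key names are hypothetical, chosen for illustration. PyTorch's load_state_dict(..., strict=False) also returns missing_keys and unexpected_keys, which carry the same information after loading.

```python
def compare_keys(checkpoint_keys, model_keys, ignore=("bert.embeddings.position_ids",)):
    """Report keys present on only one side, skipping any key listed in `ignore`."""
    ckpt = set(checkpoint_keys) - set(ignore)
    expected = set(model_keys) - set(ignore)
    return {
        "missing_from_checkpoint": sorted(expected - ckpt),
        "unexpected_in_checkpoint": sorted(ckpt - expected),
    }


# Hypothetical key names for illustration
checkpoint = [
    "bert.embeddings.word_embeddings.weight",
    "bert.embeddings.position_ids",
    "classifier.weight",
    "classifier.bias",
]
model = [
    "bert.embeddings.word_embeddings.weight",
    "bert.pooler.dense.weight",
    "classifier.weight",
    "classifier.bias",
]
report = compare_keys(checkpoint, model)
print(report)
```

An empty list on both sides means the checkpoint and the model line up; anything listed under missing_from_checkpoint will keep its random initialization when loaded with strict=False.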

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

CrudeBERT presents a promising advancement in understanding crude oil price movements through sentiment analysis. By further refining and applying this technology, we can better navigate the complexities of oil market dynamics.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
