How to Analyze Comment Authorship Patterns on Korean News Articles with a BERT Classifier

Apr 26, 2022 | Educational

In the age of digital communication, understanding who writes comments on news articles, and from what political standpoint, can reveal insights into public sentiment. This post looks at a binary classifier built to identify the ideological leaning (liberal vs. conservative) of user-generated comments on Korean news articles. It is based on BERT (Bidirectional Encoder Representations from Transformers) and offers a view into how opinions take shape in online discussion.

Overview of the BERT Model

This classifier is fine-tuned from ETRI’s KorBERT and was trained on approximately 37,000 user comments sourced from NAVER’s news portal. One caveat: the dataset was collected in 2019, so the model may lack context for more recent political discussions.

How to Use the BERT Classifier

To use this classifier, you will need the modified BertTokenizer class found in the KorBertTokenizer.py file, which must be importable from your project. Here’s a straightforward guide to implementing the model in your own projects:

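from KorBertTokenizer import KorBertTokenizer  # custom tokenizer class from KorBertTokenizer.py
from transformers import BertForSequenceClassification
import torch

# Hugging Face Hub repository ID, in the form 'namespace/name'
MODEL_ID = 'conviette/korPolBERT'

tokenizer = KorBertTokenizer.from_pretrained(MODEL_ID)
model = BertForSequenceClassification.from_pretrained(MODEL_ID)

def classify(text):
    # Pad the comment to a fixed length of 70 tokens and return PyTorch tensors
    inputs = tokenizer(text, padding='max_length', max_length=70, return_tensors='pt')
    with torch.no_grad():  # inference only, so no gradients are needed
        logits = model(**inputs).logits
    # Take the highest-scoring class and map its index to a label name
    predicted_class_id = logits.argmax().item()
    return model.config.id2label[predicted_class_id]

input_strings = [
    # Roughly: "The left is ruining the country's economy and security"
    '좌파가 나라 경제 안보 말아먹는다',
    # Roughly: "Did the right-wing diehards sell the country out to Japan?"
    '수꼴들은 나라 일본한테 팔아먹었냐'
]

for input_string in input_strings:
    print('===\nInput text: {}\nClassification result: {}\n==='.format(input_string, classify(input_string)))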
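
The classify function above returns only the top label. If you also want to know how confident the model is, one option is to turn the raw logits into probabilities with a softmax and treat low-confidence predictions as undecided. The following is a minimal, pure-Python sketch of that idea. The label names in ID2LABEL and the 0.7 threshold are assumptions made for illustration; the real mapping lives in model.config.id2label and may differ.

```python
import math

# Hypothetical label mapping, for illustration only. The actual mapping is
# stored in model.config.id2label and may use other names or a different order.
ID2LABEL = {0: "liberal", 1: "conservative"}


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps math.exp from overflowing on large logits.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, threshold=0.7):
    """Return (label, probability); the label is 'uncertain' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = ID2LABEL[best] if probs[best] >= threshold else "uncertain"
    return label, probs[best]
```

With the real model, the input would be the two logits for a single comment, for example `model(**inputs).logits[0].tolist()`. The threshold is a tuning choice: a higher value labels fewer comments but makes fewer confident mistakes.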

The Analogy: A BERT Classifier as a Political Debate Moderator

Imagine a political debate in which each comment is a candidate’s argument. The BERT classifier works much like a seasoned moderator, listening to each statement (a user comment) and categorizing it by its underlying ideology. Just as a moderator sifts through the noise to tell whether a candidate leans left or right, the model analyzes a comment’s wording and context to decide whether it reflects liberal or conservative views, picking up on cues that may be less obvious to a casual observer.

Model Performance

The reported evaluation metrics for this classifier are:

Accuracy: 0.8322
F1-Score: 0.8322
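
If you want to run a similar evaluation on your own labeled comments, both metrics can be computed with a few lines of plain Python. The sketch below uses macro-averaged F1 and a made-up set of six predictions; the source does not say which F1 averaging was used for the figures above, so treat the averaging choice and the toy data as assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy data, invented for illustration only
y_true = ["liberal", "liberal", "conservative", "conservative", "conservative", "liberal"]
y_pred = ["liberal", "conservative", "conservative", "conservative", "liberal", "liberal"]

print("accuracy:", round(accuracy(y_true, y_pred), 4))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

On a roughly balanced two-class test set where both classes are predicted about equally well, accuracy and F1 tend to land close together, as they do in this toy example.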

For those eager for more technical details, check out our paper for the W-NUT workshop at EMNLP 2019, titled “The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media.”

Troubleshooting Guide

As with all models, you might encounter some hurdles while using the classifier. Here are a few troubleshooting tips to help you out:

Ensure you have installed the required packages, including transformers and torch.
Double-check the path to KorBertTokenizer.py to make sure it’s correctly linked in your project.
If the model is not producing expected results, consider reviewing your input strings for any formatting errors.
For the most reliable results, classify comments from the same period as the training data (2019); accuracy may drop on newer political topics.
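
The first three checks above can be partly automated before loading the model. The helpers below are a small sketch using only the standard library; the function names are hypothetical, and the whitespace cleanup is just one guess at what a "formatting error" in an input string might look like.

```python
import importlib.util
from pathlib import Path


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def tokenizer_file_present(directory="."):
    """Check that KorBertTokenizer.py exists in the given directory."""
    return (Path(directory) / "KorBertTokenizer.py").is_file()


def clean_comment(text):
    """Collapse runs of whitespace (including newlines) and trim both ends."""
    return " ".join(text.split())


if __name__ == "__main__":
    print("Missing packages:", missing_packages(["transformers", "torch"]))
    print("Tokenizer file found:", tokenizer_file_present())
```

Passing each comment through clean_comment before calling classify removes stray newlines and repeated spaces that often come along when comments are copied from a web page.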

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations. This BERT-based classifier is just one of the many tools in our arsenal to analyze and understand the complex patterns embedded in societal discourse.
