Understanding Formality Prediction in English Sentences Using AI

Sep 11, 2023 | Educational

In the realm of natural language processing, predicting the formality of a sentence can significantly enhance communication tools, improving the way we write and interact in different contexts. This article walks through building a model that distinguishes formal from informal English sentences using the pre-trained roberta-base model.

What is the Model?

This model uses the roberta-base architecture, fine-tuned on two prominent datasets: the GYAFC dataset from Rao and Tetreault (2018) and the online formality corpus from Pavlick and Tetreault (2016). The goal is to accurately predict whether a given English sentence is formal or informal.
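A sketch of how such a classifier could be queried with the Hugging Face transformers library. The checkpoint name and the class-index mapping below are assumptions, not confirmed by the article; check the actual model card and `model.config.id2label` before relying on them.

```python
def formality_label(formal_prob, threshold=0.5):
    """Map a formal-class probability to a human-readable label."""
    return "formal" if formal_prob >= threshold else "informal"

def predict_formality(texts, model_name="s-nlp/roberta-base-formality-ranker"):
    """Score sentences with a roberta-base formality classifier.

    The checkpoint name above is an assumption; substitute the real model
    identifier. Imports are kept local so the helper above remains usable
    without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    # Assumes class index 1 is "formal"; verify via model.config.id2label.
    return [formality_label(row[1].item()) for row in probs]
```

For example, `predict_formality(["I would appreciate your assistance.", "gimme a sec"])` would return one label per input sentence.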

Model Training and Data Augmentation

Training manipulates the text so that the model does not over-rely on punctuation and capitalization. To achieve this, several data augmentation techniques were applied:

  • Changing text to upper or lower case
  • Removing all punctuation
  • Adding a period at the end of each sentence

These adjustments help the model focus on more substantial features beyond just punctuation and casing.
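The adjustments above can be sketched as small text transformations in Python. These function names are illustrative, not the authors' actual code:

```python
import random
import string

def random_case(text, rng=random):
    """Flip the whole sentence to upper or lower case at random."""
    return text.upper() if rng.random() < 0.5 else text.lower()

def strip_punctuation(text):
    """Remove all ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))

def add_period(text):
    """Ensure the sentence ends with a period."""
    return text if text.endswith(".") else text + "."
```

Applying these transformations to copies of the training sentences yields variants like "HOW ARE YOU" alongside "how are you.", forcing the model to rely on word choice rather than surface cues.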

Model Performance

After training, the model’s performance was evaluated on held-out test data, producing strong results:

| Dataset | ROC AUC | Precision | Recall | F-score | Accuracy | Spearman |
|---|---|---|---|---|---|---|
| GYAFC | 0.9779 | 0.90 | 0.91 | 0.90 | 0.9087 | 0.8233 |
| GYAFC normalized (lowercase + remove punct.) | 0.9234 | 0.85 | 0.81 | 0.82 | 0.8218 | 0.7294 |

P&T subsets (per-domain scores): news 0.4003, answers 0.7500, blog 0.7334, email 0.7606
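To make the reported metrics concrete, here is a minimal precision/recall/F-score calculation over binary labels (1 = formal, 0 = informal), independent of any particular evaluation library:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F-score for binary labels (1 = formal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

The table's scores were presumably computed with standard tooling over the full test sets; this sketch only illustrates the definitions behind the numbers.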

Understanding the Process: A Culinary Analogy

Imagine you’re a chef in a kitchen, where each recipe represents a dataset. The roberta-base is like your culinary technique. Depending on the ingredients you use (data), the way you mix (model parameters), and the cooking methods (data augmentation), the final dish (predicted formality) can vary in taste (accuracy).

The recipes (datasets) you choose, like the GYAFC and online formality corpus, determine what flavors (formality levels) can be achieved. By ensuring you balance flavors (features), you avoid overcooking one aspect (relying too much on punctuation and casing) and end up with a delicious outcome (a well-performing model).

Troubleshooting Tips

If you encounter issues while implementing the formality prediction model, consider the following tips:

  • Ensure that the datasets you’re using are correctly formatted and accessible.
  • Review your augmentation techniques; they should add diversity without altering the meaning of the text.
  • If model performance is lacking, consider adjusting parameters within the roberta-base model or examining the dataset for imbalances.
  • Look for updates to the datasets or the model that may improve performance.
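As a quick way to act on the imbalance tip above, one can tally the class distribution of the training labels. The helper name is hypothetical:

```python
from collections import Counter

def label_distribution(labels):
    """Return each label's share of the dataset, e.g. to spot class imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

If one class dominates (say, more than roughly 70% of examples), consider resampling or class-weighted loss before blaming the model architecture.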

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
