How to Use mBART ERRnews for Text Summarization

Dec 10, 2022 | Educational

Welcome to the world of Natural Language Processing! In this guide, we will walk you through using the mBART ERRnews model, which is designed to summarize Estonian news stories. The model has been fine-tuned on the ERRnews dataset and can condense lengthy texts into brief summaries. So let’s get started!

What is mBART ERRnews?

The mBART ERRnews model is the pretrained mBART-large-cc25 model fine-tuned on the ERRnews Estonian news story dataset. Its principal objective is to generate concise, relevant summaries from a given input text.

How to Set Up Your Environment

Before you start coding, you need to ensure that you have the required libraries installed. Follow these steps:

  • Ensure you have Python and PyTorch installed on your machine.
  • Install the Transformers library with pip install transformers (a quick check that your setup works follows below).
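
If you want to confirm that everything is in place before moving on, here is a minimal check (just a sketch that prints the installed versions and whether a GPU is visible):

import torch
import transformers

# Print installed versions and GPU availability to confirm the environment.
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())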

Using the mBART ERRnews Model

Now, let’s write some code to utilize the mBART ERRnews model for summarizing text. Below is a step-by-step explanation of how to do this:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pre-trained tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("TalTechNLP/mBART-ERRnews")
model = AutoModelForSeq2SeqLM.from_pretrained("TalTechNLP/mBART-ERRnews")

# Input text you want to summarize
text = "Riigikogu rahanduskomisjon võttis esmaspäeval maha riigieelarvesse esitatud investeeringuettepanekutest siseministeeriumi investeeringud koolidele ja lasteaedadele..."

# Tokenize the input text (truncate anything longer than the model's input limit)
inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)

# Generate the summary
summary_ids = model.generate(inputs["input_ids"])
summary = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids]

print(summary)
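
The generate call above uses default settings. As a rough sketch (the specific values here are illustrative assumptions, not settings from the model card), you can steer the output with standard generation arguments such as num_beams and max_length:

# Illustrative sketch: beam search with an explicit cap on summary length.
# The values (4 beams, 64 tokens) are assumptions; tune them for your texts.
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,          # keep several candidate summaries in parallel
    max_length=64,        # cap the summary length in tokens
    early_stopping=True,  # stop once all beams have finished
)
beam_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(beam_summary)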

Code Explanation with an Analogy

Imagine you own a large library filled with books (this represents your input text). You want to create a condensed version of your favorite book so you don’t have to read it all again, but still remember the key points.

  • The tokenizer acts like a librarian, organizing the book into chapters (splitting the text into tokens) so that the relevant information is easy to extract.
  • The model is like the book editor, who takes the organized chapters and summarizes them into a brief synopsis.
  • The inputs contain the chapters, while the summary gives you the final edited version, highlighting the essential elements of the original thick book. (The short check below shows what those "chapters" look like in practice.)
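
To make the analogy concrete, you can peek at what the tokenizer actually produces for the text and inputs defined earlier (purely an inspection step; nothing here changes the summary):

# Look at the subword tokens ("chapters") for the start of the article.
print(tokenizer.tokenize(text[:100]))

# The tensor of token ids that is actually fed to the model.
print(inputs["input_ids"].shape)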

Training Data

The model was fine-tuned on the ERRnews dataset, which consists of 10,420 Estonian news story transcripts paired with summaries. This dataset is what equips the model to summarize Estonian news effectively.
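
If you would like to inspect the training data yourself and it is published on the Hugging Face Hub, a sketch with the datasets library would look like the following (the dataset identifier below is an assumption; check the model card for the actual one):

from datasets import load_dataset

# Assumption: the dataset id below is illustrative; verify the real id on the Hub.
dataset = load_dataset("TalTechNLP/ERRnews")
print(dataset)                     # available splits and their sizes
print(dataset["train"][0].keys())  # field names of a single example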

Evaluation Results

The model’s performance is evaluated with ROUGE metrics, which measure the overlap between generated summaries and reference summaries. Here are the evaluation results:

Dataset   ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-L-SUM
ERRnews   19.2      6.7       16.1      17.4
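
If you want to compute the same kind of scores for your own outputs, here is a minimal sketch using the Hugging Face evaluate library (you may need to install the evaluate and rouge_score packages; the reference summary below is a placeholder):

import evaluate

# Compare the generated summary against a reference (gold) summary.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=[summary[0]],                         # summary produced earlier
    references=["Your reference summary goes here."]  # placeholder reference
)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum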

Troubleshooting Tips

While working with such powerful models, you might encounter some challenges. Here are some troubleshooting ideas:

  • If the model does not load, check that your internet connection is stable, since the weights are fetched from the Hugging Face Hub on first use.
  • In case of memory errors, reduce the max_length parameter when tokenizing the input (see the sketch after this list).
  • If the summarization does not meet your expectations, experiment with different texts or provide cleaner input to get better results.
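
For the memory tip above, here is a short sketch of tokenizing with explicit truncation and a smaller cap (the value 512 is an illustrative assumption):

# Truncate long inputs to a smaller cap to reduce memory use.
inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs["input_ids"])
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))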

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Using the mBART ERRnews model can streamline your summarization tasks and help you extract essential information rapidly from news stories. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
