How to Summarize Open-Domain Code-Switched Conversations Using Gupshup

Sep 13, 2024 | Educational

Gupshup, presented at EMNLP 2021, is a dataset and a set of models for summarizing code-switched conversations, in particular Hinglish (Hindi-English) dialogues, in English. In this guide, we’ll walk you through the steps to get started, run inference, and troubleshoot issues you might encounter along the way.

Getting Started with Gupshup

To get started with Gupshup, you’ll need to follow a few easy steps:

Request the Dataset: Start by requesting the Gupshup data via this Google form. The dataset covers two tasks: summarizing Hinglish dialogues in English (h2e) and summarizing English dialogues in English (e2e).
Clone the Repository: Clone the Gupshup repository using:
```
git clone https://github.com/midas-research/gupshup.git
```
Create a Python Virtual Environment: Set up a virtual environment to manage your dependencies. You can follow the guidelines here.
Install Required Packages: Navigate into the cloned directory and install the necessary packages by running:
```
pip install -r requirements.txt
```
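After installing, it can be handy to confirm that everything listed in requirements.txt actually landed in the active environment. The following is a minimal sketch, not part of the Gupshup repository: the helper names `parse_requirement_names`, `find_missing`, and `report` are hypothetical, and the parsing assumes simple `name==version` style lines (pip options and URL requirements are skipped rather than resolved).

```python
import re
from importlib import metadata


def parse_requirement_names(text):
    """Extract bare package names from requirements.txt-style text."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        # Skip blanks, pip options (-r, -e, ...), and direct URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def find_missing(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def report(path="requirements.txt"):
    """Print which packages from the given requirements file are missing."""
    with open(path, encoding="utf-8") as handle:
        required = parse_requirement_names(handle.read())
    missing = find_missing(required)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Calling `report()` from inside the cloned directory prints any package that pip did not install. Version pins are not compared here, only presence.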

Understanding the Model Selection

Gupshup provides a variety of models to choose from. Here’s an analogy that simplifies the selection process:

Think of the models as different types of chefs in a restaurant. Each chef specializes in creating unique dishes from a set of ingredients (data). You choose a chef based on the kind of meal you want:

Hinglish Dialogues to English Summary (h2e):
- mBART: midas/gupshup_h2e_mbart
- PEGASUS: midas/gupshup_h2e_pegasus

English Dialogues to English Summary (e2e):
- mBART: midas/gupshup_e2e_mbart
- PEGASUS: midas/gupshup_e2e_pegasus

Select your chef based on the type of dialogues you are working with!
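The selection above boils down to a lookup on two choices: the task (h2e or e2e) and the architecture (mBART or PEGASUS). Here is a small sketch of that lookup. The `select_model` helper is hypothetical, and the identifiers are written in `midas/<name>` form on the assumption that they are Hugging Face Hub IDs under the `midas` organization.

```python
# Assumed Hugging Face Hub identifiers for the four models listed in this guide.
MODEL_IDS = {
    ("h2e", "mbart"): "midas/gupshup_h2e_mbart",
    ("h2e", "pegasus"): "midas/gupshup_h2e_pegasus",
    ("e2e", "mbart"): "midas/gupshup_e2e_mbart",
    ("e2e", "pegasus"): "midas/gupshup_e2e_pegasus",
}


def select_model(task, architecture):
    """Map a task ("h2e" or "e2e") and an architecture name to a model ID."""
    key = (task.lower(), architecture.lower())
    if key not in MODEL_IDS:
        valid = ", ".join(f"{t}/{a}" for t, a in sorted(MODEL_IDS))
        raise ValueError(f"Unknown combination {key}; expected one of: {valid}")
    return MODEL_IDS[key]


print(select_model("h2e", "mBART"))
```

Failing early with a `ValueError` on an unknown combination is a design choice for the sketch: a typo in the model name otherwise only surfaces later, when the model download fails.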

Running Inference

Once you’ve set up the environment and selected your model, it’s time to run inference:

For Hinglish to English dialogue summarization, run the evaluation script with the following command:

```
python run_eval.py --model_name midas/gupshup_h2e_mbart --input_path data/h2e/test.source --save_path generated_summary.txt --reference_path data/h2e/test.target --score_path scores.txt --bs 8
```

For the English to English dialogue summarization, run:

```
python run_eval.py --model_name midas/gupshup_e2e_pegasus --input_path data/e2e/test.source --save_path generated_summary.txt --reference_path data/e2e/test.target --score_path scores.txt --bs 8
```
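The two commands differ only in the model name and the task directory, so if you plan to evaluate several models it may be easier to assemble the command programmatically. The sketch below uses only the flags shown in the commands above. The `build_eval_command` helper is hypothetical, and the `midas/...` model ID assumes the models are hosted under the `midas` organization on the Hugging Face Hub.

```python
import shlex
import sys


def build_eval_command(model_name, task, batch_size=8,
                       save_path="generated_summary.txt",
                       score_path="scores.txt"):
    """Build the run_eval.py argument list for the h2e or e2e test split."""
    if task not in ("h2e", "e2e"):
        raise ValueError(f"task must be 'h2e' or 'e2e', got {task!r}")
    return [
        sys.executable, "run_eval.py",  # same interpreter as the active venv
        "--model_name", model_name,
        "--input_path", f"data/{task}/test.source",
        "--save_path", save_path,
        "--reference_path", f"data/{task}/test.target",
        "--score_path", score_path,
        "--bs", str(batch_size),
    ]


command = build_eval_command("midas/gupshup_h2e_pegasus", "h2e")
print(shlex.join(command))
```

To actually launch the run, pass the list to `subprocess.run(command, check=True)` from the directory that contains run_eval.py. Giving each model its own `save_path` and `score_path` keeps one run from overwriting another.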

Troubleshooting Common Issues

If you encounter any challenges while working with Gupshup, here’s what to consider:

Dataset Issues: Ensure you have downloaded the Gupshup dataset correctly and supplied the correct file paths to the script.
Dependency Errors: Make sure all required packages are installed. Rerun the installation command if needed.
Model Loading Issues: Confirm that the model names in your scripts and commands match the Hugging Face model names exactly.
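For dataset issues in particular, a quick sanity check before a long evaluation run can save time. The sketch below is an illustration, not part of the Gupshup repository: `check_split` is a hypothetical helper, and it assumes that the `.source` and `.target` files hold one dialogue and one reference summary per line, so the two files should have the same number of lines.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_split(source_path, target_path):
    """Return a list of problems found; an empty list means the pair looks fine."""
    problems = []
    for path in (source_path, target_path):
        if not Path(path).is_file():
            problems.append(f"missing file: {path}")
    if problems:
        return problems
    n_source = count_lines(source_path)
    n_target = count_lines(target_path)
    if n_source == 0:
        problems.append(f"empty file: {source_path}")
    if n_source != n_target:
        problems.append(
            f"line count mismatch: {n_source} dialogues vs {n_target} summaries"
        )
    return problems


for issue in check_split("data/h2e/test.source", "data/h2e/test.target"):
    print(issue)
```

If nothing is printed, both files exist and line up. A "missing file" message usually means the dataset was unpacked somewhere other than the path passed to `--input_path` or `--reference_path`.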

References

If you found the resources in this guide useful, please consider citing the paper: GupShup: Summarizing Open-Domain Code-Switched Conversations.
