How to Use Gupshup for Summarizing Open-Domain Code-Switched Conversations

Sep 11, 2024 | Educational

Welcome to the world of Gupshup! If you're interested in summarizing conversations that switch between languages (such as Hinglish, a mix of Hindi and English), you've stumbled upon the right tool. This blog walks you through the setup process, running the code, and troubleshooting tips.

Getting Started: Downloading the Dataset

Before we jump into using Gupshup, you need access to the dataset. The Gupshup dataset is designed for two main tasks: summarizing Hinglish dialogues into English (h2e) and summarizing English dialogues into English (e2e). Here’s how to get it:

Request the Gupshup data through the Google Form for Gupshup Data.
Once you receive the data, you will find files with the .source and .target extensions for each task.

Remember, the .source files contain the dialogues, while the .target files contain the summaries!
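A quick way to inspect the data is to pair each dialogue with its summary. This is a minimal sketch, assuming each file holds one example per line and that line i of a .source file corresponds to line i of the matching .target file (a common convention for seq2seq data; confirm it against the files you receive). The helper names load_pairs and _read_lines are hypothetical, not part of the Gupshup repository.

```python
def _read_lines(path):
    """Read a UTF-8 text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_pairs(source_path, target_path):
    """Return (dialogue, summary) tuples from two line-aligned files.

    Assumes one example per line in both files, aligned by line number.
    """
    sources = _read_lines(source_path)
    targets = _read_lines(target_path)
    # Different line counts mean the files cannot be aligned line by line.
    if len(sources) != len(targets):
        raise ValueError(
            f"{source_path} has {len(sources)} lines but "
            f"{target_path} has {len(targets)}"
        )
    return list(zip(sources, targets))
```

For example, load_pairs("data/h2e/test.source", "data/h2e/test.target")[0] would return the first Hinglish dialogue and its English summary, if your copy of the data uses that layout.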

Setting Up Your Environment

To start using Gupshup, you need to set up your Python environment:

Clone the Gupshup repository and change into its directory:

git clone https://github.com/midas-research/gupshup.git
cd gupshup

Create a Python virtual environment and activate it. Refer to the Python venv documentation for guidance.
Install the required packages:

pip install -r requirements.txt
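Before running anything, it can help to confirm that you are using the virtual environment's interpreter and that key packages import. A minimal sketch: the venv test relies on sys.prefix differing from sys.base_prefix inside a venv, and the package names torch and transformers are assumptions (typical for Hugging Face models), so replace them with the entries in requirements.txt.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when running inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    # Assumed package names; check requirements.txt for the real list.
    missing = missing_packages(["torch", "transformers"])
    print("Missing packages:", missing or "none")
```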

Running the Evaluation Script

The heart of Gupshup lies in its run_eval.py script. This script generates summaries for the dialogues in a .source file and scores them against the reference summaries in the matching .target file. Here's how to use it:

Basic command structure:

python run_eval.py --model_name [Hugging Face model name] --input_path [path to source file] --save_path [path to save summaries] --reference_path [path to target file] --score_path [path to save scores] --bs [batch size]
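If you would rather launch evaluations from Python (for example, to loop over several batch sizes), the flags above can be assembled into an argument list for subprocess. This sketch uses only the flags from the command structure; the helper names build_eval_command and run_eval are hypothetical, and it assumes run_eval.py sits in the current working directory.

```python
import subprocess
import sys


def build_eval_command(model_name, input_path, save_path,
                       reference_path, score_path, bs=8):
    """Return the argv list for run_eval.py using the documented flags."""
    return [
        sys.executable, "run_eval.py",
        "--model_name", model_name,
        "--input_path", input_path,
        "--save_path", save_path,
        "--reference_path", reference_path,
        "--score_path", score_path,
        "--bs", str(bs),
    ]


def run_eval(**kwargs):
    """Run the script; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_eval_command(**kwargs), check=True)
```

Using sys.executable instead of the bare string "python" makes the subprocess use the same interpreter, and therefore the same virtual environment, as the calling script.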

Examples:

To generate English summaries from Hinglish dialogues using the mBART model:

python run_eval.py \
    --model_name midas/gupshup_h2e_mbart \
    --input_path data/h2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/h2e/test.target \
    --score_path scores.txt \
    --bs 8

To generate English summaries from English dialogues using the Pegasus model:

python run_eval.py \
    --model_name midas/gupshup_e2e_pegasus \
    --input_path data/e2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/e2e/test.target \
    --score_path scores.txt \
    --bs 8
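The --reference_path and --score_path flags mean the script compares each generated summary with its reference and saves the scores. This guide does not say which metric is used. ROUGE is the usual choice for summarization, so, as an assumption, here is a simplified ROUGE-1 F1 (unigram overlap on lowercased whitespace tokens, no stemming) to illustrate the idea. It is not the script's implementation, and its numbers will not match a full ROUGE package.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two strings (a simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_rouge1(candidates, references):
    """Average rouge1_f1 over aligned lists of generated and reference summaries."""
    if len(candidates) != len(references):
        raise ValueError("candidate and reference counts differ")
    if not candidates:
        return 0.0
    scores = [rouge1_f1(c, r) for c, r in zip(candidates, references)]
    return sum(scores) / len(scores)
```

To score real output, read generated_summary.txt and the .target file line by line and pass the two lists to mean_rouge1.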

Understanding the Code: An Analogy

Think of Gupshup's run_eval.py script as a chef in a restaurant. The chef takes the ingredients (your dialogues) and turns them into a dish (the summaries). Just as a chef picks the seasoning that suits each dish, the model you choose determines how the dialogue is summarized: the examples above use mBART for Hinglish dialogues and Pegasus for English ones.

Passing your data to the script is like sending these ingredients to the kitchen. The chef, working with the specified model (such as mBART or Pegasus), processes them and plates the result: summaries saved to the file you gave as --save_path and scores saved to the file you gave as --score_path.

Troubleshooting Tips

If you encounter any issues during the process, here are a few troubleshooting steps to consider:

Ensure you have the correct paths specified for your source and target files.
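If the script fails to run, double-check that your virtual environment is activated and that the packages from requirements.txt installed without errors.
Not seeing the expected output? Verify that the model matches the task (an h2e model for Hinglish input, an e2e model for English input) and that the other parameters are set correctly.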
Feel free to create an issue on the GitHub repository if problems persist.

Final Words

At fxis.ai, we believe that advancements like Gupshup are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

With this guide, you’re ready to navigate the world of Gupshup and start summarizing code-switched conversations like a pro!
