How to Summarize Open-Domain Code-Switched Conversations with Gupshup

Sep 10, 2024 | Educational

Welcome to our guide on using the Gupshup framework to summarize open-domain code-switched conversations, in particular Hinglish dialogues. This article walks you through setting up your environment, obtaining the required datasets, choosing a model, and troubleshooting common problems.

Dataset Acquisition

To get started with Gupshup, you will need access to the dataset, which covers two tasks: Hinglish dialogues to English summaries (h2e) and English dialogues to English summaries (e2e).

  • Request the Gupshup data using this Google form.
  • The dialogues have a .source extension (train.source) and the summaries have a .target extension (train.target).
  • In the scripts, pass the .source file as the input_path argument and the .target file as the reference_path argument.
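
Before running anything, it can help to confirm that a dialogue file and its summary file line up. The sketch below is only an illustration, not part of the Gupshup repository: it assumes each file stores one example per line, and the helper names count_lines and check_parallel are hypothetical.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a UTF-8 text file."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(source_path, target_path):
    """Return the number of dialogue/summary pairs, or raise if the files disagree.

    Assumption: one dialogue per line in the .source file and one summary
    per line in the .target file.
    """
    n_source = count_lines(source_path)
    n_target = count_lines(target_path)
    if n_source != n_target:
        raise ValueError(
            f"{source_path} has {n_source} lines but {target_path} has {n_target}"
        )
    return n_source
```

Calling check_parallel on your train.source and train.target files should return the number of pairs; a ValueError suggests a truncated or mismatched download.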

Model Selection

Several models for summarizing the dialogues are available on the Hugging Face model hub. You can either download the weights and pass their local path as the model_name argument, or pass the model's hub alias, in which case the weights are downloaded automatically.
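
The two options for model_name can be sketched as follows. This is a minimal illustration under assumptions, not the way run_eval.py necessarily loads models: resolve_model_name and load_summarizer are hypothetical helpers, and loading relies on the transformers Auto classes, which accept either a local directory or a hub alias.

```python
from pathlib import Path


def resolve_model_name(model_name):
    """Classify model_name as a local weights directory or a hub alias."""
    if Path(model_name).is_dir():
        return "local", str(Path(model_name).resolve())
    return "hub", model_name


def load_summarizer(model_name):
    """Load a tokenizer and seq2seq model (requires the transformers package)."""
    # Imported lazily so resolve_model_name works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    _, location = resolve_model_name(model_name)
    tokenizer = AutoTokenizer.from_pretrained(location)
    model = AutoModelForSeq2SeqLM.from_pretrained(location)
    return tokenizer, model
```

If a directory with the given name exists, it is treated as local weights; otherwise the string is passed through as a hub alias and the download happens on first use.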

1. Hinglish Dialogues to English Summary (h2e)

2. English Dialogues to English Summary (e2e)

Inference: Step-by-Step Guide

To get your summarization running, follow these simple steps:

  1. Clone the Gupshup repository and create a Python virtual environment. For guidance on setting up a virtual environment, refer to the official Python documentation.
  2. Navigate to the cloned repository and install the required packages:

    git clone https://github.com/midas-research/gupshup.git
    cd gupshup
    pip install -r requirements.txt

  3. Run the evaluation script with the arguments listed below. An analogy may help: think of the evaluation script as a chef in a restaurant who needs specific ingredients and instructions to prepare a particular dish (the summary). The chef (script) takes the following orders:

    • model_name: Type of dish (model) – e.g., mBART or PEGASUS.
    • input_path: The pantry where the chef finds the main ingredients (source file).
    • save_path: The container where the finished dish goes (location to save summaries).
    • reference_path: A recipe book for comparison (reference summary file).
    • score_path: Feedback form from the customer (location to save scores).
    • bs: Number of servings (batch size).
    • device: Kitchen appliances available to help cook (CUDA devices).

    For example, to summarize Hinglish dialogues into English using the mBART model, run:

    python run_eval.py \
      --model_name midas/gupshup_h2e_mbart \
      --input_path data/h2e/test.source \
      --save_path generated_summary.txt \
      --reference_path data/h2e/test.target \
      --score_path scores.txt \
      --bs 8

    Or for English dialogues using the PEGASUS model:

    python run_eval.py \
      --model_name midas/gupshup_e2e_pegasus \
      --input_path data/e2e/test.source \
      --save_path generated_summary.txt \
      --reference_path data/e2e/test.target \
      --score_path scores.txt \
      --bs 8
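
If you plan to run both configurations repeatedly, you may prefer to assemble the command from Python. The sketch below is one possible approach, not something shipped with Gupshup: build_eval_command is a hypothetical helper, it uses only the flags described in this guide, and the model alias and data paths in the example are assumed to match your setup.

```python
import shlex
import sys


def build_eval_command(model_name, input_path, save_path,
                       reference_path, score_path, bs=8):
    """Assemble the run_eval.py invocation as an argument list."""
    return [
        sys.executable, "run_eval.py",
        "--model_name", model_name,
        "--input_path", input_path,
        "--save_path", save_path,
        "--reference_path", reference_path,
        "--score_path", score_path,
        "--bs", str(bs),
    ]


if __name__ == "__main__":
    # Assumed alias and paths; adjust them to where your data actually lives.
    command = build_eval_command(
        model_name="midas/gupshup_h2e_mbart",
        input_path="data/h2e/test.source",
        save_path="generated_summary.txt",
        reference_path="data/h2e/test.target",
        score_path="scores.txt",
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(command))
```

To execute the command rather than print it, pass the list to subprocess.run(command, check=True) from inside the cloned repository. Using an argument list avoids shell quoting problems with paths that contain spaces.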

Troubleshooting

If you encounter any issues replicating the results or have questions along the way, consider the following troubleshooting tips:

  • Ensure you’ve downloaded the Gupshup dataset correctly and the file paths are accurate.
  • Check that the model weights are correctly referenced and downloaded either locally or via aliases.
  • Clear any caches or temporary files from previous attempts that may affect rerunning scripts.
  • Create an issue on GitHub if you’re still having difficulties.
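
The first and third tips can be partly automated with a small pre-flight check. This is a hedged sketch rather than an official tool: preflight is a hypothetical helper that looks only at the file paths you intend to pass to the script.

```python
from pathlib import Path


def preflight(input_path, reference_path, save_path, score_path):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    # Input files must exist and contain data.
    for label, path in (("input_path", input_path),
                        ("reference_path", reference_path)):
        candidate = Path(path)
        if not candidate.is_file():
            problems.append(f"{label}: {path} does not exist")
        elif candidate.stat().st_size == 0:
            problems.append(f"{label}: {path} is empty")
    # Output files need an existing parent directory; leftovers are flagged.
    for label, path in (("save_path", save_path),
                        ("score_path", score_path)):
        candidate = Path(path)
        if not candidate.parent.is_dir():
            problems.append(f"{label}: directory {candidate.parent} does not exist")
        elif candidate.exists():
            problems.append(f"{label}: {path} already exists from a previous run")
    return problems
```

Printing the returned list before each run gives a quick view of missing inputs or stale outputs; what to do about an existing output file is left to you.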

Final Thoughts

Now you’re all set to explore the world of summarizing code-switched conversations with Gupshup! Happy coding!
