How to Implement Transformer Question Generation on SQuAD

Jun 1, 2023 | Educational

Welcome to Question Generation (QG) with transformers on the SQuAD dataset! In this guide, we walk through the steps for recreating a question generation model based on the work of Ying-Hong Chan and Yao-Chung Fan, using three popular transformer models: BART, GPT2, and T5.

Understanding the Basics

Before we dive into the specifics, let’s understand what we’re working with. Imagine you’re hosting a trivia night with friends. You have information about various topics (like Harry Potter), and your friends need questions to answer. That is essentially what question generation does: it takes a passage of text, such as a summary of a book, and produces relevant questions that one might ask about that text.

Key Concepts and Input Format

To begin the process, we’ll adhere to a specific input format:

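C = [c_1, c_2, …, [HL], a_1, …, a_|A|, [HL], …, c_|C|]

Here, c_1 … c_|C| are the tokens of the context (the passage), and a_1 … a_|A| are the tokens of the answer span inside it. A pair of [HL] (highlight) tokens marks where the answer starts and ends, so the model knows which answer the generated question should target. For instance:

Harry Potter is a series of seven fantasy novels written by British author, [HL] J. K. Rowling [HL]. # Who wrote Harry Potter?

In this example, the text after the # is the question the model is expected to generate; it is not part of the input.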
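To make the format concrete, here is a minimal sketch of how the highlight tokens could be inserted at the token level. The function name `highlight_tokens` and the whitespace tokenization are assumptions made for illustration; a real pipeline would use the tokenizer of the chosen model.

```python
HL = "[HL]"  # highlight marker from the input format above


def highlight_tokens(context_tokens, answer_start, answer_len, hl=HL):
    """Return a new token list with `hl` placed before and after the answer span.

    `answer_start` is the index of the first answer token in `context_tokens`,
    and `answer_len` is the number of answer tokens.
    """
    end = answer_start + answer_len
    if answer_len <= 0 or answer_start < 0 or end > len(context_tokens):
        raise ValueError("answer span falls outside the context")
    return (
        context_tokens[:answer_start]
        + [hl]
        + context_tokens[answer_start:end]
        + [hl]
        + context_tokens[end:]
    )


# Whitespace tokenization is a simplification used only for this demo.
tokens = (
    "Harry Potter is a series of seven fantasy novels "
    "written by British author , J. K. Rowling ."
).split()
highlighted = highlight_tokens(tokens, answer_start=14, answer_len=3)
print(" ".join(highlighted))
```

The original token list is left untouched, so the same context can be reused with a different answer span to produce a different question.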

Dataset Configuration

There are two primary dataset settings to consider:

SQuAD Dataset

Train: 87,599 examples
Validation: 10,570 examples
For more information, refer to SQuAD: 100,000+ Questions for Machine Comprehension of Text

SQuAD NQG Dataset

Train: 75,722 examples
Dev: 10,570 examples
Test: 11,877 examples
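More details can be found in Learning to Ask: Neural Question Generation for Reading Comprehension. Note that the split sizes add up: the 75,722 train and 11,877 test examples together equal the 87,599 examples of the original SQuAD training set, which suggests the original training set was divided into train and test while the original validation set serves as dev.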
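Whichever split you use, each record has to be turned into a highlighted input and a target question. The sketch below assumes records in the flattened layout used by the Hugging Face `squad` dataset (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists); the helper name `squad_record_to_examples` is hypothetical, and the actual preprocessing in the original work may differ.

```python
def squad_record_to_examples(record, hl="[HL]"):
    """Turn one SQuAD-style record into highlighted source/target pairs."""
    context = record["context"]
    answers = record["answers"]
    examples = []
    seen = set()
    for text, start in zip(answers["text"], answers["answer_start"]):
        end = start + len(text)
        # Skip annotations whose character offset does not line up with the
        # context, and skip duplicate annotations of the same answer span.
        if context[start:end] != text or (start, text) in seen:
            continue
        seen.add((start, text))
        source = f"{context[:start]}{hl} {text} {hl}{context[end:]}"
        examples.append({"source": source, "target": record["question"]})
    return examples


context = (
    "Harry Potter is a series of seven fantasy novels "
    "written by British author, J. K. Rowling."
)
record = {
    "context": context,
    "question": "Who wrote Harry Potter?",
    "answers": {
        "text": ["J. K. Rowling"],
        "answer_start": [context.index("J. K. Rowling")],
    },
}
examples = squad_record_to_examples(record)
print(examples[0]["source"])
print(examples[0]["target"])
```

Checking that the offset really points at the answer text is a cheap guard against misaligned annotations, which would otherwise put the highlight around the wrong words.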

Available Models for Question Generation

There are several models that can be utilized for generating questions:

BART
GPT2
T5
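As a rough sketch of how one of these model families could be prepared for the highlight format, the code below loads a base checkpoint with the Hugging Face `transformers` library and registers `[HL]` as an extra token. The checkpoint choices (`facebook/bart-base`, `gpt2`, `t5-base`) and the helper name `load_for_qg` are assumptions for illustration; these are general-purpose pretrained weights, not the fine-tuned HLSQG models, so they still need to be trained on highlighted examples before they produce useful questions.

```python
# Assumed base checkpoints per family; "seq2seq" models have an encoder and a
# decoder, while "causal" models (GPT2) are decoder-only.
BASE_CHECKPOINTS = {
    "bart": ("facebook/bart-base", "seq2seq"),
    "gpt2": ("gpt2", "causal"),
    "t5": ("t5-base", "seq2seq"),
}


def load_for_qg(family, hl="[HL]"):
    """Load tokenizer and model for `family` and add the highlight token."""
    # Imported lazily so the mapping above can be inspected without
    # transformers installed.
    from transformers import (
        AutoModelForCausalLM,
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
    )

    checkpoint, kind = BASE_CHECKPOINTS[family]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Register [HL] so it is kept as a single token instead of being split.
    tokenizer.add_tokens([hl], special_tokens=True)

    model_cls = AutoModelForSeq2SeqLM if kind == "seq2seq" else AutoModelForCausalLM
    model = model_cls.from_pretrained(checkpoint)
    # The embedding matrix must grow to cover the newly added token.
    model.resize_token_embeddings(len(tokenizer))
    return tokenizer, model
```

Calling `load_for_qg("bart")` downloads the checkpoint on first use. The seq2seq/causal distinction matters later: BART and T5 read the highlighted context in the encoder and write the question in the decoder, while GPT2 has to continue the highlighted context as a single sequence.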

Experimental Results

The table below summarizes the scores for each model on SQuAD NQG, evaluated with the NQG Scorer. BART and T5 perform almost identically, while GPT2 trails both by roughly 3 to 5 points on every metric. Your own runs may produce somewhat different numbers.

Model         BLEU-1   BLEU-2   BLEU-3   BLEU-4   METEOR   ROUGE-L
BART-HLSQG    54.67    39.26    30.34    24.15    25.43    52.64
GPT2-HLSQG    49.31    33.95    25.41    19.69    22.29    48.82
T5-HLSQG      54.29    39.22    30.43    24.26    25.56    53.11
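To give a feel for what the BLEU columns measure, here is a small sketch of modified n-gram precision, the building block of BLEU. This is not the NQG Scorer and it will not reproduce the numbers above: full BLEU combines several n-gram orders and applies a brevity penalty over the whole corpus. The example sentences and whitespace tokenization are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it appears
    in the reference ("clipping"), so repeating a word does not inflate
    the score.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


generated = "who wrote harry potter ?".split()
reference = "who is the author of harry potter ?".split()
print(modified_precision(generated, reference, 1))  # 4 of 5 unigrams match: 0.8
print(modified_precision(generated, reference, 2))  # 2 of 4 bigrams match: 0.5
```

The drop from unigram to bigram precision mirrors the pattern in the table, where scores fall steadily from BLEU-1 to BLEU-4 as longer word sequences have to match.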

Troubleshooting

If you encounter any issues during implementation, consider the following solutions:

Ensure that your dataset paths are correctly set and accessible.
Check that you have installed all necessary libraries and packages as specified in the documentation.
Make sure your model’s parameters and configurations align with those mentioned in the provided examples.
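The first two checks are easy to automate before starting a long training run. The sketch below uses only the Python standard library; the file paths and package names in the demo are placeholders, not values taken from the original project.

```python
import importlib.util
from pathlib import Path


def preflight(dataset_paths, required_packages):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for path in dataset_paths:
        if not Path(path).is_file():
            problems.append(f"dataset file not found: {path}")
    for name in required_packages:
        # find_spec returns None when a top-level package cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not installed: {name}")
    return problems


# Placeholder paths and packages; replace them with your own setup.
issues = preflight(
    ["data/squad/train.json", "data/squad/dev.json"],
    ["torch", "transformers"],
)
for issue in issues:
    print(issue)
```

Collecting all problems in a list, instead of stopping at the first one, lets you fix missing files and missing packages in a single pass.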


