Welcome to the fascinating world of Question Generation (QG) using transformers on the SQuAD dataset! In this guide, we will walk you through the steps for recreating a question generation model based on the work by Ying-Hong Chan and Yao-Chung Fan, implemented with three popular transformer models: BART, GPT2, and T5.
Understanding the Basics
Before we dive into the specifics, let’s understand what we’re working with. Imagine you’re hosting a trivia night with friends. You have information about various topics (like Harry Potter), and you need questions for your friends to answer. Question generation automates this step: it takes in a passage of text, such as a summary of a book, and produces relevant questions that one might ask about that text.
Key Concepts and Input Format
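The model expects its input in a specific format, in which the answer is marked inside the context with highlight tokens:
- $C = [c_1, c_2, \dots, [\mathrm{HL}], a_1, \dots, a_{|A|}, [\mathrm{HL}], \dots, c_{|C|}]$
Here, $c_1, \dots, c_{|C|}$ are the tokens of the context (the passage), $a_1, \dots, a_{|A|}$ are the tokens of the answer span inside that context, and the two [HL] tokens mark where the answer starts and ends. The model then generates a question whose answer is the highlighted span. For instance: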
- Input: Harry Potter is a series of seven fantasy novels written by British author, [HL] J. K. Rowling [HL].
- Target question: Who wrote Harry Potter?
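The highlighted input can be built directly from a context string and the character offset of the answer, which is how SQuAD stores its answers. The sketch below assumes the literal string `[HL]` is the highlight token and that it is separated from the answer by single spaces, matching the example above; the function name `highlight_answer` is hypothetical and not taken from the original project.

```python
def highlight_answer(context: str, answer_start: int, answer_text: str,
                     hl_token: str = "[HL]") -> str:
    """Wrap the answer span of `context` with highlight tokens (HLSQG-style input).

    Assumption: `answer_start` is a character offset into `context`, as in SQuAD,
    and the highlight token is the plain string "[HL]".
    """
    answer_end = answer_start + len(answer_text)
    # Sanity check: the offset must actually point at the answer text.
    if context[answer_start:answer_end] != answer_text:
        raise ValueError("answer_start does not point at answer_text in context")
    before = context[:answer_start]
    after = context[answer_end:]
    return f"{before}{hl_token} {answer_text} {hl_token}{after}"


context = ("Harry Potter is a series of seven fantasy novels written by "
           "British author, J. K. Rowling.")
answer = "J. K. Rowling"
source = highlight_answer(context, context.index(answer), answer)
print(source)
# Harry Potter is a series of seven fantasy novels written by British author, [HL] J. K. Rowling [HL].
```

Checking the slice against `answer_text` catches misaligned offsets early, which is cheaper than discovering them after training.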
Dataset Configuration
There are two primary dataset settings to consider:
SQuAD Dataset
- Train: 87,599 examples
- Validation: 10,570 examples
- For more information, refer to SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD NQG Dataset
- Train: 75,722 examples
- Dev: 10,570 examples
- Test: 11,877 examples
- More details can be found in Learning to Ask: Neural Question Generation for Reading Comprehension
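Both settings start from SQuAD-style data, where each paragraph carries a context and a list of questions with character-offset answers. The sketch below walks the nested SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas` → `answers`) and yields (highlighted context, question) training pairs. It assumes `[HL]` as the highlight token and keeps only the first gold answer per question; the function name and the toy record are hypothetical, and the original project may preprocess the NQG split differently.

```python
from typing import Iterator, Tuple

HL = "[HL]"  # assumed highlight token, matching the input format above


def squad_to_hlsqg_pairs(squad: dict) -> Iterator[Tuple[str, str]]:
    """Yield (highlighted_context, question) pairs from a SQuAD v1.1-style dict.

    Assumption: only the first gold answer of each question is used.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answer = qa["answers"][0]
                start = answer["answer_start"]
                end = start + len(answer["text"])
                if context[start:end] != answer["text"]:
                    continue  # skip records whose offset does not match the answer
                source = f"{context[:start]}{HL} {answer['text']} {HL}{context[end:]}"
                yield source, qa["question"]


# A tiny, made-up record in the SQuAD v1.1 JSON layout.
toy_squad = {
    "data": [{
        "title": "Harry_Potter",
        "paragraphs": [{
            "context": "Harry Potter is a series of seven fantasy novels "
                       "written by British author, J. K. Rowling.",
            "qas": [{
                "id": "toy-1",
                "question": "Who wrote Harry Potter?",
                "answers": [{"text": "J. K. Rowling", "answer_start": 76}],
            }],
        }],
    }]
}

for source, question in squad_to_hlsqg_pairs(toy_squad):
    print(source, "->", question)
```

With the official file, something like `pairs = list(squad_to_hlsqg_pairs(json.load(open("train-v1.1.json"))))` would apply the same conversion; the file path there is an assumption about where you stored the download.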
Available Models for Question Generation
The highlight-based input format is used with three pretrained transformer models:
- BART
- GPT2
- T5
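The three models consume the same highlighted input in different ways. BART and T5 are encoder-decoder models, so the highlighted context can go to the encoder and the question serves as the decoder target. GPT2 is decoder-only, so a common approach (assumed here, not confirmed by the source) is to concatenate context, a separator, and question into one sequence and compute the loss only on the question tokens. The sketch below uses a toy whitespace tokenizer; the `[SEP]` separator and the helper names are hypothetical, and `-100` is chosen because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch CrossEntropyLoss default)
SEP = "[SEP]"        # hypothetical separator between context and question for GPT2


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Whitespace tokenizer that grows `vocab` on the fly (illustration only)."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def seq2seq_example(source: str, question: str, vocab: Dict[str, int]) -> Dict[str, List[int]]:
    """BART/T5 style: the encoder reads the source, the decoder learns to emit the question."""
    return {"input_ids": toy_tokenize(source, vocab),
            "labels": toy_tokenize(question, vocab)}


def causal_lm_example(source: str, question: str, vocab: Dict[str, int]) -> Dict[str, List[int]]:
    """GPT2 style: a single sequence; prompt positions are masked out of the labels."""
    prompt_ids = toy_tokenize(f"{source} {SEP}", vocab)
    question_ids = toy_tokenize(question, vocab)
    return {"input_ids": prompt_ids + question_ids,
            "labels": [IGNORE_INDEX] * len(prompt_ids) + question_ids}


vocab: Dict[str, int] = {}
src = "written by [HL] J. K. Rowling [HL] ."
q = "Who wrote Harry Potter ?"
enc_dec = seq2seq_example(src, q, vocab)
dec_only = causal_lm_example(src, q, vocab)
print(enc_dec)
print(dec_only)
```

A real setup would use each model's own tokenizer and typically append an end-of-sequence token to the question; both are omitted here to keep the comparison readable.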
Experimental Results
Evaluated on SQuAD NQG with the NQG scorer, the three models reach different scores. BART and T5 perform similarly, while GPT2 trails on every metric. The reported results are summarized below:
| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L |
|---|---|---|---|---|---|---|
| BART-HLSQG | 54.67 | 39.26 | 30.34 | 24.15 | 25.43 | 52.64 |
| GPT2-HLSQG | 49.31 | 33.95 | 25.41 | 19.69 | 22.29 | 48.82 |
| T5-HLSQG | 54.29 | 39.22 | 30.43 | 24.26 | 25.56 | 53.11 |
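To get a feel for what the BLEU columns measure, here is a simplified corpus-level BLEU-n: clipped n-gram precisions for n = 1..max_n, combined by a geometric mean and multiplied by a brevity penalty. This is a sketch under stated assumptions (one reference per hypothesis, uniform weights, no smoothing, pre-tokenized input); the NQG scorer's own implementation may differ in tokenization and edge cases, so its numbers will not match this function exactly.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU-max_n with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: each n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


ref = "who wrote harry potter ?".split()
print(corpus_bleu([ref], [ref]))                       # identical output
print(corpus_bleu(["who wrote harry potter".split()], [ref]))  # too short, penalized
```

Under this sketch, the BLEU-1 to BLEU-4 columns would correspond to `max_n` = 1 to 4, scaled by 100.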
Troubleshooting
If you encounter any issues during implementation, consider the following solutions:
- Ensure that your dataset paths are correctly set and accessible.
- Check that you have installed all necessary libraries and packages as specified in the documentation.
- Make sure your model’s parameters and configurations align with those mentioned in the provided examples.
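The first two checks can be automated before launching a long training run. The sketch below reports missing dataset files and uninstalled packages; the file paths and the package list in the `__main__` block are hypothetical placeholders, not the original project's requirements, so adjust them to your own setup.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List


def find_problems(data_paths: Iterable[str], packages: Iterable[str]) -> List[str]:
    """Return human-readable problems: missing dataset files or uninstalled packages."""
    problems = []
    for path in data_paths:
        if not Path(path).is_file():
            problems.append(f"missing dataset file: {path}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not installed: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical paths and packages; replace with the ones your setup uses.
    issues = find_problems(
        ["data/squad/train-v1.1.json", "data/squad/dev-v1.1.json"],
        ["torch", "transformers"],
    )
    print("\n".join(issues) if issues else "environment looks OK")
```

Only top-level package names should be passed, since `find_spec` raises an error for a dotted name whose parent package is missing.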

