How to Use MultiIndicQuestionGenerationSS for Question Generation in Indian Languages

May 25, 2022 | Educational

Welcome to the realm of advanced NLP techniques! Today, we’re diving into how to leverage the MultiIndicQuestionGenerationSS model for generating questions in multiple Indian languages. This pre-trained model is not only efficient but also designed specifically for Indian languages, making it an invaluable tool for those interested in linguistic AI.

What is MultiIndicQuestionGenerationSS?

MultiIndicQuestionGenerationSS is a multilingual, sequence-to-sequence model fine-tuned on the sizable IndicQuestionGeneration dataset. It supports question generation in 11 Indian languages, allowing you to build sophisticated question generation applications. Think of this model as a multilingual chef, equipped with a recipe (the dataset) to perfectly create questions in different languages.
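For reference, the 11 target languages are addressed with `<2xx>` language tags, following the convention of ai4bharat's IndicBART-family models. The mapping below is a sketch to verify against the model card:

```python
# Language tags used by ai4bharat IndicBART-family models such as
# MultiIndicQuestionGenerationSS; verify against the model card before use.
INDIC_LANG_TAGS = {
    "as": "<2as>",  # Assamese
    "bn": "<2bn>",  # Bengali
    "gu": "<2gu>",  # Gujarati
    "hi": "<2hi>",  # Hindi
    "kn": "<2kn>",  # Kannada
    "ml": "<2ml>",  # Malayalam
    "mr": "<2mr>",  # Marathi
    "or": "<2or>",  # Oriya
    "pa": "<2pa>",  # Punjabi
    "ta": "<2ta>",  # Tamil
    "te": "<2te>",  # Telugu
}

print(len(INDIC_LANG_TAGS))  # 11 supported languages
```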

How to Set Up the Model

Setting up the model is a straightforward process. Below, we’ll walk you through the steps to get started:

  • Import the necessary libraries: You need to import specific classes from the `transformers` library.
  • Load the tokenizer: The tokenizer helps in preparing input data for the model.
  • Load the model: You can choose between `AutoModelForSeq2SeqLM` and `MBartForConditionalGeneration`.
  • Tokenize inputs: Proper tokenization is vital for the model to understand the inputs.
  • Generate outputs: Finally, use the model to generate your questions.

Sample Code

The code snippet below provides a complete example of how to implement the model:

from transformers import MBartForConditionalGeneration, AutoModelForSeq2SeqLM
from transformers import AlbertTokenizer, AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/MultiIndicQuestionGenerationSS", do_lower_case=False, use_fast=False, keep_accents=True)

# Load model
model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/MultiIndicQuestionGenerationSS")

# Tokenization
input_text = "7 फरवरी, 2016 [SEP] खेल 7 फरवरी, 2016 को कैलिफोर्निया में खेला गया। </s> <2hi>"
out_text = "<2hi> सुपर बाउल किस दिन खेला गया? </s>"

inp = tokenizer(input_text, add_special_tokens=False, return_tensors="pt", padding=True).input_ids
out = tokenizer(out_text, add_special_tokens=False, return_tensors="pt", padding=True).input_ids

# Forward pass with teacher forcing (useful for computing the loss during evaluation or fine-tuning)
model_output = model(input_ids=inp, decoder_input_ids=out[:, 0:-1], labels=out[:, 1:])

# Generating a question: the decoder starts from the target language tag
pad_id = tokenizer._convert_token_to_id_with_added_voc("<pad>")
bos_id = tokenizer._convert_token_to_id_with_added_voc("<s>")
eos_id = tokenizer._convert_token_to_id_with_added_voc("</s>")

generated = model.generate(
    inp, use_cache=True, num_beams=4, max_length=32, min_length=1,
    early_stopping=True, pad_token_id=pad_id, bos_token_id=bos_id, eos_token_id=eos_id,
    decoder_start_token_id=tokenizer._convert_token_to_id_with_added_voc("<2hi>"),
)
decoded_output = tokenizer.decode(generated[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(decoded_output)  # The generated question in Hindi
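The input above interleaves the answer, a `[SEP]` marker, the context, a `</s>` boundary token, and a `<2hi>` language tag. A small helper (hypothetical, not part of the `transformers` API) makes that convention explicit and reusable across languages:

```python
def build_qg_input(answer: str, context: str, lang: str) -> str:
    """Assemble the model's expected input: answer [SEP] context </s> <2xx>.

    `lang` is a two-letter code such as "hi" or "bn". This helper is an
    illustrative convenience, not part of the transformers library.
    """
    return f"{answer} [SEP] {context} </s> <2{lang}>"

text = build_qg_input("7 फरवरी, 2016",
                      "खेल 7 फरवरी, 2016 को कैलिफोर्निया में खेला गया।",
                      "hi")
print(text)
```

Switching the final argument to another language code (and the matching `decoder_start_token_id`) is all it takes to target a different language.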

Understanding the Code: An Analogy

Imagine you’re setting up a cooking class (the model). You start by ensuring you have the right utensils (import libraries) and ingredients (tokenizer and model). The recipe (code) guides you through each step, from prepping your ingredients (tokenization) to cooking (generating questions). The result? A delicious dish (questions) tailored to the tastes of your guests (users) based on the rich variety of Indian languages!

Troubleshooting

If you encounter issues while deploying the MultiIndicQuestionGenerationSS model, consider these troubleshooting steps:

  • Ensure all libraries are updated to their latest versions.
  • Check your input formatting; specific tokens and sequence structure are crucial.
  • Verify that the model and tokenizer are loaded correctly.
  • Remember to use a machine with sufficient computational power, if necessary.
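For the second point, a small (hypothetical) validator can confirm that an input string follows the `answer [SEP] context </s> <2xx>` shape before it reaches the tokenizer:

```python
import re

# Hypothetical format check for: answer [SEP] context </s> <2xx>
INPUT_PATTERN = re.compile(r".+\[SEP\].+</s>\s*<2[a-z]{2}>\s*$")

def looks_like_qg_input(text: str) -> bool:
    """Return True if `text` matches the expected input layout."""
    return bool(INPUT_PATTERN.match(text))

print(looks_like_qg_input("7 फरवरी, 2016 [SEP] खेल 7 फरवरी, 2016 को कैलिफोर्निया में खेला गया। </s> <2hi>"))  # True
print(looks_like_qg_input("no separator here"))  # False
```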

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Performance Benchmarks

MultiIndicQuestionGenerationSS has shown impressive scores on the IndicQuestionGeneration test sets across various languages. For example:

  • Hindi: 34.42
  • Bengali: 30.38
  • Punjabi: 32.53
  • Scores for the other supported languages fall in a similar range.

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
