How to Generate Multiple-Choice Questions from Image Descriptions

Aug 26, 2023 | Educational

Have you ever wanted to assess the accuracy of an image description? With the help of the TIFA model, we can automate the generation of multiple-choice questions based on a given image description. This guide outlines step-by-step instructions on how to implement this, along with troubleshooting tips to ensure a smooth experience.

Understanding the Basics

TIFA (Text-to-Image Faithfulness evaluation with Question Answering) checks whether an image matches a text description by asking questions about it. The question generation model used in this guide is a LLaMA 2 model fine-tuned for that task. Given a description such as “a blue rabbit and a red plane,” it generates multiple-choice questions that verify the description’s accuracy. The model classifies each concept in the input into a type such as object, human, or animal, and then creates targeted questions for each type.
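
To make the categories concrete, here is a minimal sketch of how one generated question could be represented. The type names are copied from the prompt used later in this guide; the `Question` class, its field names, and the example values are hypothetical and are not part of the TIFA code.

```python
from dataclasses import dataclass

# Concept types listed in the question-generation prompt.
CONCEPT_TYPES = frozenset({
    "object", "human", "animal", "food", "activity", "attribute", "counting",
    "color", "material", "spatial", "location", "shape", "other",
})


@dataclass(frozen=True)
class Question:
    """Hypothetical record for one multiple-choice question."""
    concept: str        # e.g. "rabbit"
    concept_type: str   # must be one of CONCEPT_TYPES
    question: str
    choices: tuple
    answer: str

    def __post_init__(self):
        # Reject records that could not have come from a valid generation.
        if self.concept_type not in CONCEPT_TYPES:
            raise ValueError(f"unknown concept type: {self.concept_type!r}")
        if self.answer not in self.choices:
            raise ValueError("answer must be one of the choices")


# Illustrative values only, not real model output.
example = Question(
    concept="rabbit",
    concept_type="animal",
    question="Is there a rabbit?",
    choices=("yes", "no"),
    answer="yes",
)
print(example)
```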

QuickStart Guide

Follow these steps to set up and run the question generation model:

1. Clone the Repository

First, you need to clone the TIFA repository from GitHub to get access to the code and necessary modules:

git clone https://github.com/Yushi-Hu/tifa

2. Set Up the LLaMA 2 Model

The next step involves preparing the LLaMA 2 model for generating questions. You will need to use the following code:

import torch
import transformers

# prepare the LLaMA 2 model
model_name = 'tifa-benchmark/llama2_tifa_question_generation'
pipeline = transformers.pipeline(
    'text-generation',  # task name must be a string
    model=model_name,
    torch_dtype=torch.float16,
    device_map='auto',
)

3. Create the Prompt

Once the model is ready, you can format the prompt in a way that the LLaMA 2 model understands:

def create_qg_prompt(caption):
    INTRO_BLURB = "Given an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\n"
    INTRO_BLURB += "Classify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n"
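    # Llama 2 chat format: system prompt wrapped in <<SYS>> ... <</SYS>>, user turn closed by [/INST]
    formatted_prompt = f"[INST] <<SYS>>\n{INTRO_BLURB}<</SYS>>\n\nDescription: {caption} [/INST] Entities:"
    return formatted_prompt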

test_caption = "a blue rabbit and a red plane"
prompt = create_qg_prompt(test_caption)
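
When evaluating more than one caption, the same formatter can be applied to a list. The sketch below shows one way to do that while skipping empty and duplicate captions. `build_prompts` is a hypothetical helper introduced for illustration; it takes the formatter as an argument, so in practice you would pass `create_qg_prompt` from this step instead of the stand-in used here.

```python
def build_prompts(captions, prompt_fn):
    """Return {caption: prompt} for each distinct, non-empty caption.

    `prompt_fn` is the formatter to apply, e.g. create_qg_prompt from step 3.
    """
    prompts = {}
    for caption in captions:
        cleaned = " ".join(caption.split())  # collapse stray whitespace
        if cleaned and cleaned not in prompts:
            prompts[cleaned] = prompt_fn(cleaned)
    return prompts


# Stand-in formatter so the sketch runs on its own.
demo = build_prompts(
    ["a blue rabbit and a red plane", "  a blue rabbit   and a red plane ", ""],
    lambda c: f"Description: {c}",
)
for caption, prompt in demo.items():
    print(repr(caption), "->", repr(prompt))
```

Normalizing whitespace before formatting means two captions that differ only in spacing produce a single prompt, which avoids running the model twice on the same description.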

4. Generate the Questions

Finally, execute the prompt to generate the questions:

sequences = pipeline(prompt, do_sample=False, num_beams=5, num_return_sequences=1, max_length=512)
output = sequences[0]['generated_text'][len(prompt):]
output = output.split("\n\n")[0]
print(output)
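
The slicing step removes the echoed prompt from the generated text, and the split keeps everything up to the first blank line. To use the questions programmatically, the remaining text still has to be parsed. The sketch below assumes the output contains `Q:`, `Choices:`, and `A:` lines grouped under `About <concept> (<type>):` headers. That layout is an assumption, so inspect a real output before relying on it. `parse_questions` and the sample text are hypothetical.

```python
import re

# Assumed header layout: "About <concept> (<type>):"
HEADER = re.compile(r"^About (.+) \((.+)\):$")


def parse_questions(text):
    """Parse generated text into a list of question dicts (assumed layout)."""
    questions = []
    concept = concept_type = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            concept, concept_type = header.groups()
        elif line.startswith("Q:"):
            current = {
                "concept": concept,
                "type": concept_type,
                "question": line[2:].strip(),
            }
        elif line.startswith("Choices:") and current is not None:
            current["choices"] = [c.strip() for c in line[len("Choices:"):].split(",")]
        elif line.startswith("A:") and current is not None:
            current["answer"] = line[2:].strip()
            # Keep only complete questions whose answer is among the choices.
            if current["answer"] in current.get("choices", []):
                questions.append(current)
            current = None
    return questions


# Made-up sample in the assumed layout, not real model output.
sample = """About rabbit (animal):
Q: is this a rabbit?
Choices: yes, no
A: yes
About rabbit (color):
Q: what color is the rabbit?
Choices: blue, red, green, white
A: blue"""

for q in parse_questions(sample):
    print(q["type"], "|", q["question"], "->", q["answer"])
```

Dropping questions whose answer is missing from the choices is a deliberate choice here: a malformed question cannot be scored, so it is safer to discard it than to pass it on.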

Understanding the Code Through Analogy

Think of the model as a chef preparing a unique dish (question) based on a recipe (image description). Each ingredient (concepts from the description) holds a specific classification (type: object, animal, etc.). When the chef sees the recipe, they gather ingredients, chop them (formatting the prompt), and combine them in a pot (running the model) to serve a delectable dish (output). Each step in this culinary experience represents an important part of the code functionality in generating accurate questions from the description.

Troubleshooting Tips

If you encounter any issues during the process, consider the following troubleshooting steps:

  • Issue: Model not found or loading errors.
    Solution: Ensure that the model name is correctly spelled and that you have internet access to fetch the model.
  • Issue: Errors when generating sequences.
    Solution: Verify that the input prompt is correctly formatted and follows the expected syntax.
  • Issue: Unexpected output.
    Solution: Double-check the caption and ensure it’s a clear, concise description of the image.
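
A quick sanity check before calling the model can catch the second and third issues early. The sketch below is one possible set of checks and is not part of TIFA: `check_prompt` is a hypothetical helper, and the markers it looks for assume the prompt contains `Description: <caption>` and ends with `Entities:`, as in step 3.

```python
def check_prompt(prompt, caption):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not caption.strip():
        problems.append("caption is empty")
    if "\n" in caption:
        problems.append("caption spans several lines; use a single sentence")
    if f"Description: {caption}" not in prompt:
        problems.append("prompt does not contain the caption after 'Description:'")
    if not prompt.endswith("Entities:"):
        problems.append("prompt should end with 'Entities:'")
    return problems


# Simplified prompt strings used only to exercise the checks.
good = "Description: a blue rabbit and a red plane\nEntities:"
print(check_prompt(good, "a blue rabbit and a red plane"))
print(check_prompt("Description: ", ""))
```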


Conclusion

With the TIFA model at your fingertips, generating multiple-choice questions from image descriptions is straightforward and efficient. The steps provided here will guide you through the process, enabling you to explore text-to-image faithfulness evaluation with ease.

