Fine-tuning TAPAS for Table Question Answering: A How-To Guide

Aug 10, 2021 | Educational

In artificial intelligence, the ability to extract information from structured tables is increasingly important. TAPAS, a BERT-like transformer model designed specifically for answering questions about tabular data, is a key player in this domain. In this article, we explore how TAPAS is fine-tuned on the WikiTableQuestions (WTQ) dataset and how to use a WTQ-fine-tuned checkpoint for table question answering.

Understanding TAPAS: The Foundation

TAPAS is pretrained in a self-supervised fashion on a large corpus of English tables and associated text from Wikipedia. This means it learns from raw tables and their surrounding text without requiring human annotation. Pretraining involves two main objectives:

Masked Language Modeling (MLM): During pretraining, 15% of the words in the input (a flattened table plus its associated context) are randomly masked, and the model learns to predict them. Unlike recurrent models that read words one after another, TAPAS processes the whole sequence at once, so it learns a bidirectional representation of the table and its text.
Intermediate pre-training: To encourage numerical reasoning, the model is further trained on a balanced dataset of syntactically created examples. Here, the model classifies whether a statement is supported or refuted by the table's contents; the statements include both synthetic and counterfactual examples.
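To make the MLM objective more concrete, here is a simplified sketch that flattens a small table (header row, then data rows) after its context text and masks roughly 15% of the resulting tokens. It is only an illustration under loud assumptions: real TAPAS operates on WordPiece tokens, adds extra embeddings for rows, columns and numeric ranks, and has its own masking rules. `flatten_table` and `mask_tokens` are hypothetical helpers, not part of any library.

```python
import random


def flatten_table(context, table):
    """Hypothetical helper: linearize context text and a column-oriented table.

    Produces: context tokens, a [SEP] marker, the header row, then every data row.
    """
    tokens = context.lower().split()
    tokens.append("[SEP]")
    for name in table:  # header row
        tokens.extend(name.lower().split())
    for row in zip(*table.values()):  # data rows, one after another
        for cell in row:
            tokens.extend(str(cell).lower().split())
    return tokens


def mask_tokens(tokens, rate=0.15, seed=0):
    """Hypothetical helper: hide about `rate` of the non-[SEP] tokens behind [MASK].

    Returns the masked sequence and a {position: original_token} dict,
    i.e. the targets an MLM objective would train the model to predict.
    """
    rng = random.Random(seed)
    candidates = [i for i, tok in enumerate(tokens) if tok != "[SEP]"]
    n_mask = min(len(candidates), max(1, round(len(candidates) * rate)))
    masked = list(tokens)
    targets = {}
    for i in sorted(rng.sample(candidates, n_mask)):
        targets[i] = masked[i]
        masked[i] = "[MASK]"
    return masked, targets


table = {"Repository": ["Transformers", "Datasets"], "Stars": ["36542", "4512"]}
tokens = flatten_table("Which repository has the most stars?", table)
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

During pretraining, the model would see `masked` and be scored on how well it recovers the words stored in `targets`.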

Fine-tuning TAPAS on the WTQ Dataset

After pretraining, TAPAS is fine-tuned in a chain: first on SQA, then on WikiSQL, and finally on WTQ. The resulting checkpoint answers questions about a table by selecting the relevant cells and, where a question calls for it, an aggregation operation over those cells.
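A WTQ-fine-tuned TAPAS model does not generate free-form text: it selects table cells and predicts one of four aggregation operators (NONE, SUM, AVERAGE or COUNT) to apply to them. The plain-Python sketch below shows how such a prediction turns into a final answer. `apply_aggregation` is a hypothetical helper. In the transformers `table-question-answering` pipeline, the result reports the predicted `aggregator` alongside the selected `cells`, so a step like this is something you would write yourself if you need the computed value.

```python
# Aggregation operators used by the WTQ-fine-tuned TAPAS head (index -> name).
AGGREGATION_OPS = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}


def apply_aggregation(op_index, selected_cells):
    """Hypothetical helper: turn selected cell strings into a final answer."""
    op = AGGREGATION_OPS[op_index]
    if op == "NONE":
        # No aggregation: the selected cells are the answer as-is.
        return ", ".join(selected_cells)
    if op == "COUNT":
        return len(selected_cells)
    # SUM and AVERAGE need numbers; drop thousands separators before parsing.
    values = [float(cell.replace(",", "")) for cell in selected_cells]
    if op == "SUM":
        return sum(values)
    return sum(values) / len(values) if values else None  # AVERAGE


contributors = ["651", "77", "34"]
print(apply_aggregation(0, ["36542"]))     # → 36542
print(apply_aggregation(1, contributors))  # → 762.0
print(apply_aggregation(2, contributors))  # → 254.0
print(apply_aggregation(3, contributors))  # → 3
```

Keeping the operator table explicit makes it easy to see that only SUM and AVERAGE require the selected cells to be numeric.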

How to Use TAPAS for Table Question Answering

Now that we understand the foundation, let's walk through how to load a TAPAS checkpoint that has already been fine-tuned on WTQ and use it to make predictions.

```python
from transformers import AutoModelForTableQuestionAnswering, AutoTokenizer, pipeline

# Load the WTQ-fine-tuned model and its tokenizer from the Hugging Face Hub
tapas_model = AutoModelForTableQuestionAnswering.from_pretrained("navteca/tapas-large-finetuned-wtq")
tapas_tokenizer = AutoTokenizer.from_pretrained("navteca/tapas-large-finetuned-wtq")

# Wrap them in a table-question-answering pipeline
nlp = pipeline("table-question-answering", model=tapas_model, tokenizer=tapas_tokenizer)

# The table is a single dict mapping each column name to a list of cells.
# All columns must have the same length, and TAPAS expects text cells.
result = nlp(
    table={
        "Repository": ["Transformers", "Datasets", "Tokenizers"],
        "Stars": ["36542", "4512", "3934"],
        "Contributors": ["651", "77", "34"],
        "Programming language": ["Python", "Python", "Rust, Python and NodeJS"],
    },
    query="How many stars does the transformers repository have?",
)
print(result)
```
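The `table` argument is the easiest part to get wrong. The pipeline accepts a single dictionary (or a pandas DataFrame) that maps each column name to a list of cells; every column must have the same length, and the TAPAS tokenizer expects cell values to be text. If your data arrives as one record per row, a small conversion step can build that structure and catch inconsistent rows early. `to_tapas_table` below is a hypothetical helper, not part of transformers.

```python
def to_tapas_table(rows):
    """Hypothetical helper: convert row records into a column-oriented
    {column_name: [cell, ...]} dict of strings for the table-QA pipeline.
    """
    if not rows:
        raise ValueError("table has no rows")
    columns = list(rows[0])
    table = {name: [] for name in columns}
    for i, row in enumerate(rows):
        if set(row) != set(columns):
            raise ValueError(f"row {i} has columns {sorted(row)}, expected {sorted(columns)}")
        for name in columns:
            table[name].append(str(row[name]))  # TAPAS expects text cells
    return table


rows = [
    {"Repository": "Transformers", "Stars": 36542, "Contributors": 651},
    {"Repository": "Datasets", "Stars": 4512, "Contributors": 77},
    {"Repository": "Tokenizers", "Stars": 3934, "Contributors": 34},
]
table = to_tapas_table(rows)
print(table)
```

Because every row contributes exactly one cell per column, the resulting columns always have equal lengths, and the dictionary can be passed directly as `table=` to the pipeline call.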

Explaining the Code: A Garden Analogy

Imagine you are a gardener managing different plants (data in a table) that need careful attention (a question). The TAPAS model acts like a master gardener who knows how to care for each plant effectively. Here’s how:

First, you gather your tools (load the model and tokenizer) to prepare for the task ahead.
Then, you create a well-organized garden layout (the table format) where each type of plant has its designated bed (each column name mapped to its list of cell values).
Finally, you provide guidance (input a query) to determine how to best nurture (answer) a specific plant (data) in your garden.

This structured approach helps TAPAS to pinpoint the answer efficiently, just like knowing where to look for each plant’s needs in a well-planned garden.

Troubleshooting Guide

While using TAPAS for table question answering, you might encounter some challenges along the way. Here are a few troubleshooting tips:

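Model Loading Issues: The first call to from_pretrained downloads the model from the Hugging Face Hub (later calls reuse a local cache), so make sure your internet connection is stable and the model name is spelled exactly, including the organization prefix. If issues persist, try clearing your cache or reinstalling the transformers library.
Incorrect Answers: Check the format of your table. It should be a single dictionary mapping each column name to a list of cell values, with every column the same length and every cell a string. A mismatch in structure can cause errors or incorrect predictions.
Runtime Errors: Make sure all dependencies are installed, including transformers, a deep learning backend such as PyTorch, and pandas, which the pipeline uses to handle tables. Also verify that you are running the Python environment where these packages are installed.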
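For runtime errors, a quick first check is whether the required packages are importable from the interpreter you are actually running. The sketch below uses only the standard library's `importlib.util.find_spec`. The package list is an assumption based on the example in this guide (transformers with a PyTorch backend, plus pandas), and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Packages this guide's example relies on (assumption: PyTorch backend).
# Some older transformers releases also needed "torch_scatter" for TAPAS;
# add it here if your version asks for it.
REQUIRED = ("transformers", "torch", "pandas")


def missing_packages(names=REQUIRED):
    """Return the packages from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

`find_spec` only locates a package without importing it, so the check stays fast even for heavy libraries such as PyTorch.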


Conclusion

With TAPAS, you have a powerful tool for extracting knowledge from tabular data. Once you are comfortable with the basics, you can point the same few lines of code at your own tables and questions, including questions that require counting, summing, or averaging cells.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
