TFT-ID: Your Go-To Tool for Identifying Tables, Figures, and Texts in Academic Papers

Jul 30, 2024 | Educational

If you’ve ever tried to sift through an academic paper to locate specific tables, figures, or sections of text, you know how tedious it can be. TFT-ID (Table/Figure/Text Identifier) is a model by Yifei Hu that detects these elements in academic papers. In this guide, we’ll explore how to get started with TFT-ID, how it works, and how to troubleshoot common issues.

What is TFT-ID?

TFT-ID is an object detection model fine-tuned to locate tables, figures, and text sections in academic papers. It is built on the microsoft/Florence-2 checkpoints and was trained on a dataset of over 36,000 annotated bounding boxes. Given an image of a paper page, it returns a bounding box and a label for each element it detects.

How Does TFT-ID Work? An Analogy

Imagine you’re at a library, and you’ve been tasked with finding every table and figure within a stack of books. While it might take you hours or even days to sift through the pages manually, TFT-ID acts like a highly efficient librarian. It doesn’t just hand you the books; it opens them up, flips through the pages, and marks every table, figure, and block of text it finds, saving you precious time and making your research process a breeze.

Getting Started with TFT-ID

To use the TFT-ID model, you’ll need to follow the steps below:

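Install the necessary libraries if you haven’t yet. The code below imports requests, Pillow (PIL), and transformers, and it asks for PyTorch tensors, so torch is needed as well.
Use the following Python code to load the model and run it on a sample page image: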

import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# trust_remote_code=True lets transformers run the model code hosted with the checkpoint.
model = AutoModelForCausalLM.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)

# "<OD>" is the Florence-2 task prompt for object detection.
prompt = "<OD>"
url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
image = Image.open(requests.get(url, stream=True).raw)

# Tokenize the prompt and preprocess the page image.
inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3
)

# Keep special tokens: the post-processor needs the location tokens to rebuild the boxes.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
print(parsed_answer)
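
Once `parsed_answer` is available, a common next step is to cut each detected region out of the page. The following is a minimal sketch, assuming the result follows the Florence-2 object detection layout: a dict keyed by the task prompt whose value holds a `bboxes` list of `[x1, y1, x2, y2]` pixel coordinates and a parallel `labels` list. The helper name `crop_regions`, the blank stand-in page, and the example boxes and labels are all hypothetical; with the real model you would pass `image` and `parsed_answer["<OD>"]` instead.

```python
from PIL import Image


def crop_regions(image, detections):
    """Return (label, cropped_image) pairs, one per detected bounding box."""
    crops = []
    for box, label in zip(detections["bboxes"], detections["labels"]):
        # Boxes are [x1, y1, x2, y2] in pixels; round them and clamp to the page.
        x1, y1, x2, y2 = (int(round(v)) for v in box)
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(image.width, x2), min(image.height, y2)
        # Skip boxes that end up empty after clamping.
        if x2 > x1 and y2 > y1:
            crops.append((label, image.crop((x1, y1, x2, y2))))
    return crops


# Stand-in data shaped like the assumed detection dict; real values come from the model.
page = Image.new("RGB", (800, 1000), "white")
example = {
    "bboxes": [[50.4, 100.2, 750.9, 400.0], [60.0, 500.0, 700.0, 900.0]],
    "labels": ["table", "figure"],
}

for label, crop in crop_regions(page, example):
    print(label, crop.size)
```

Each crop is a regular PIL image, so it can be saved with `crop.save(...)` or passed on to another tool such as a table parser.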

For visualization examples and more detailed steps, you can refer to this tutorial notebook.
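
If you just want a quick look at the detections without opening the notebook, the boxes can be drawn with Pillow alone. This is a rough sketch, not the notebook's code: it assumes the detections are a dict with a `bboxes` list of `[x1, y1, x2, y2]` pixel coordinates and a parallel `labels` list, as in the Florence-2 object detection output. The function name `draw_detections`, the color mapping, and the label names `table`, `figure`, and `text` are illustrative assumptions.

```python
from PIL import Image, ImageDraw

# Assumed label names; unknown labels fall back to black.
COLORS = {"table": "red", "figure": "blue", "text": "green"}


def draw_detections(image, detections):
    """Return a copy of the image with each bounding box outlined and labeled."""
    annotated = image.copy()  # leave the original page untouched
    draw = ImageDraw.Draw(annotated)
    for box, label in zip(detections["bboxes"], detections["labels"]):
        color = COLORS.get(label, "black")
        x1, y1, x2, y2 = (int(round(v)) for v in box)
        draw.rectangle((x1, y1, x2, y2), outline=color, width=3)
        # Put the label just inside the top-left corner of the box.
        draw.text((x1 + 6, y1 + 6), label, fill=color)
    return annotated


# Stand-in page and detections; with the real model, pass `image` and the parsed result.
page = Image.new("RGB", (800, 1000), "white")
example = {"bboxes": [[50, 100, 750, 400]], "labels": ["table"]}
annotated = draw_detections(page, example)
annotated.save("annotated_page.png")
```

Opening `annotated_page.png` next to the source page makes it easy to spot missed or duplicated regions.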

Troubleshooting Common Issues

If you encounter any challenges while using the TFT-ID model, consider the following troubleshooting tips:

Model Not Loading: Check that transformers, Pillow, requests, and PyTorch are installed in the environment that runs the code, and that trust_remote_code=True is passed when loading both the model and the processor.
Incorrect Bounding Boxes: The model may draw several boxes around the components of a single figure, so some outputs can look wrong while still being usable.
Performance Issues: Make sure your device has enough memory and processing power for the model you’re using.
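
For the multiple-boxes case, one possible workaround is to merge boxes with the same label whenever they overlap. The sketch below is a simple heuristic and is not part of TFT-ID itself. It assumes detections in the Florence-2 style (a `bboxes` list of `[x1, y1, x2, y2]` values and a parallel `labels` list); the function names and the example boxes are hypothetical. Note the limits: sub-figures that sit side by side without overlapping stay separate, and two distinct figures that do overlap will be merged.

```python
def boxes_overlap(a, b):
    """True if two [x1, y1, x2, y2] boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_overlapping(detections, label="figure"):
    """Union overlapping boxes that carry `label`; boxes with other labels pass through."""
    kept_boxes, kept_labels, merged = [], [], []
    for box, lab in zip(detections["bboxes"], detections["labels"]):
        if lab != label:
            kept_boxes.append(list(box))
            kept_labels.append(lab)
            continue
        box = list(box)
        # Absorb every already-merged box that overlaps the current one.
        # The box can grow after each union, so rescan until nothing overlaps.
        changed = True
        while changed:
            changed = False
            for other in merged:
                if boxes_overlap(box, other):
                    box = [min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3])]
                    merged.remove(other)
                    changed = True
                    break
        merged.append(box)
    # Untouched labels come first, merged boxes last, so the order may change.
    return {"bboxes": kept_boxes + merged,
            "labels": kept_labels + [label] * len(merged)}


# Hypothetical detections: two overlapping figure parts, a table, and a separate figure.
example = {
    "bboxes": [[100, 100, 300, 300], [250, 120, 500, 320],
               [50, 600, 700, 900], [100, 1000, 200, 1100]],
    "labels": ["figure", "figure", "table", "figure"],
}
print(merge_overlapping(example))
```

In this example the first two figure boxes collapse into `[100, 100, 500, 320]`, while the table and the separate figure are left as they were.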


Conclusion

With TFT-ID, the once daunting task of identifying tables, figures, and text sections in academic papers has transformed into a straightforward process. This tool is a perfect companion for researchers and students alike, allowing them to focus on their core work without getting bogged down by the minutiae of document navigation.
