Welcome to your step-by-step guide on leveraging TFT-ID (Table/Figure/Text IDentifier), a powerful model fine-tuned specifically to extract tables, figures, and text sections from academic papers. Developed by Yifei Hu, this model is based on the microsoft/Florence-2-large architecture and promises impressive accuracy. Let’s dive in!
Understanding the TFT-ID Model
Imagine you have a library of academic papers, and your friend loves to extract relevant data from them. To help your friend, you’ve built a specialized robot: the TFT-ID. This robot can scan each paper page, find all tables, figures, and relevant text sections, and neatly box them up for further analysis. Like a librarian guiding you through a complex library, TFT-ID makes extracting data straightforward.
Key Features of TFT-ID
- High Accuracy: The model achieves a success rate of 96.78% for identifying tables, figures, and text sections.
- Manual Annotations: It uses 36,000+ manually annotated bounding boxes for improved results.
- Optimized for OCR: Outputs are well-suited for Optical Character Recognition workflows, especially when paired with TB-OCR-preview-0.1 for clean markdown and LaTeX formats.
Getting Started with TFT-ID
To start utilizing the TFT-ID model, follow these steps in your Python environment:
```python
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Load the model and processor from Hugging Face
model = AutoModelForCausalLM.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)

# Florence-2 models are driven by task tokens; "<OD>" triggers object detection
prompt = "<OD>"

# Fetch a sample page image
url = "https://huggingface.co/yifeihu/TFT-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3
)

generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# Convert the raw text output into bounding boxes and labels
parsed_answer = processor.post_process_generation(
    generated_text, task="<OD>", image_size=(image.width, image.height)
)
print(parsed_answer)
```
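The parsed result follows Florence-2’s object-detection output format: a dictionary keyed by the task token, holding bounding boxes and labels. Building on that, here is a minimal sketch of cropping each detected region out of the page, for example to feed into an OCR step. The helper name, the exact label strings, and the dummy detection below are illustrative assumptions, not part of the model’s API:

```python
from PIL import Image

def crop_regions(image, parsed_answer, task="<OD>"):
    """Crop each detected bounding box out of the page image.
    Assumes Florence-2-style output: {task: {"bboxes": [[x1, y1, x2, y2], ...],
    "labels": [...]}} with pixel coordinates."""
    detections = parsed_answer[task]
    crops = []
    for box, label in zip(detections["bboxes"], detections["labels"]):
        x1, y1, x2, y2 = box
        crops.append((label, image.crop((x1, y1, x2, y2))))
    return crops

# Illustrative example with a blank page and one hypothetical detection:
page = Image.new("RGB", (800, 1000), "white")
fake_answer = {"<OD>": {"bboxes": [[50, 60, 400, 300]], "labels": ["table"]}}
for label, crop in crop_regions(page, fake_answer):
    print(label, crop.size)
```

Each crop can then be passed to a downstream OCR model such as the TB-OCR pairing mentioned above.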
Analyzing the Code
Let’s break down the code you’ve just seen by comparing it to preparing a fantastic dish:
- Ingredients: First, you gather all your ingredients, which in this case are the libraries and modules you need (like PIL and transformers).
- Preparation: Next, you retrieve the main dish (the TFT-ID model) and the necessary components (the processor) from the pantry (Hugging Face).
- Cooking: You then combine everything, feeding the image and text prompt into the model to generate predictions – the ‘cooking’ phase.
- Plating: Finally, you serve your dish by printing the parsed output on your plate, ready for everyone to enjoy!
Troubleshooting Tips
While using TFT-ID, you might encounter a few hiccups along the way. Here are some troubleshooting ideas to overcome potential issues:
- Model Errors: If the model throws errors about libraries, make sure you have all the necessary packages installed and that they are up to date.
- Image Not Loading: Verify the URL is correct and accessible; sometimes, a simple typographical error can cause loading issues.
- Output Not as Expected: If results are not aligning with anticipated outputs, ensure that the image quality is good. A clear, high-resolution image typically yields better results.
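The last two issues above can be caught early with a small validation helper. This is a sketch, not part of TFT-ID itself; the minimum-size threshold is an arbitrary illustrative value, not a documented model requirement:

```python
import io

import requests
from PIL import Image

MIN_SIDE = 300  # arbitrary sanity threshold, not a TFT-ID requirement

def load_page_image(data: bytes) -> Image.Image:
    """Decode image bytes, normalize to RGB, and flag low-resolution pages."""
    image = Image.open(io.BytesIO(data)).convert("RGB")
    if min(image.size) < MIN_SIDE:
        print(f"warning: image is only {image.size}; results may degrade")
    return image

def fetch_page_image(url: str) -> Image.Image:
    """Download a page image, raising a clear error on a bad or typo'd URL."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # surfaces 404s instead of a cryptic decode error later
    return load_page_image(response.content)
```

You could then call `fetch_page_image(url)` in place of the bare `Image.open(requests.get(url, stream=True).raw)` line from the earlier snippet.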
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
With the TFT-ID model in your toolkit, you’ll find navigating the rich forest of academic papers much easier. This helps not only in research but also supports more efficient data analysis.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.