How to Use the Pix2Text-MFR Model for Mathematical Formula Recognition

May 6, 2024 | Educational


The Pix2Text-MFR model is a powerful tool that transforms images of mathematical formulas into LaTeX text representations. Developed using the TrOCR architecture, it enables users to extract and utilize mathematical formulas effectively. Let’s dive into the details of using this model, troubleshooting tips, and more.

Understanding the Pix2Text-MFR Model

Think of a recipe: a pile of raw ingredients becomes a neatly written set of instructions. Now imagine a magic box that takes raw ingredients (an image of a mathematical formula) and hands back a perfectly formatted recipe (the LaTeX code for that formula). This is what the Pix2Text-MFR model does for images of mathematical formulas.

Getting Started with Pix2Text-MFR

To embark on your journey with Pix2Text-MFR, follow these steps:

Method 1: Using the Model Directly

This method does not require installing Pix2Text, but it only works with pure formula images, that is, images that contain a formula and nothing else. Here’s how:

Install the dependencies from your shell first:

pip install "transformers>=4.37.0" pillow "optimum[onnxruntime]"

Then run the following Python code:

from PIL import Image
from transformers import TrOCRProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

# Load the image processor/tokenizer and the ONNX model from the Hugging Face Hub
processor = TrOCRProcessor.from_pretrained('breezedeus/pix2text-mfr')
model = ORTModelForVision2Seq.from_pretrained('breezedeus/pix2text-mfr', use_cache=False)

# Replace these paths with your own formula images
image_fps = ['examples/example.jpg', 'examples/42.png', 'examples/0000186.png']
images = [Image.open(fp).convert('RGB') for fp in image_fps]

# Convert the images to model inputs, generate token ids, and decode them to LaTeX strings
pixel_values = processor(images=images, return_tensors='pt').pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(f'generated_ids: {generated_ids}, generated_text: {generated_text}')
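Once you have the recognized LaTeX strings, a quick way to check them visually is to drop them into a small LaTeX document and compile it. The sketch below is only an illustration: the helper name to_latex_document is hypothetical, and it assumes the amsmath package is enough to render your formulas, which may not hold for every formula the model produces.

```python
def to_latex_document(formulas):
    """Wrap recognized LaTeX strings in a minimal document for previewing.

    Hypothetical helper, not part of Pix2Text or transformers.
    Assumes each string is a formula body without math delimiters.
    """
    # Put every formula in its own display-math block
    body = "\n\n".join(f"\\[\n{formula.strip()}\n\\]" for formula in formulas)
    return (
        "\\documentclass{article}\n"
        "\\usepackage{amsmath}\n"
        "\\begin{document}\n"
        f"{body}\n"
        "\\end{document}\n"
    )


# Stand-in for the generated_text list produced by the model
sample_output = ["a^2 + b^2 = c^2", "\\frac{1}{2} x"]
print(to_latex_document(sample_output))
```

You could write the returned string to a .tex file and compile it with your usual LaTeX toolchain to compare the rendered result against the source image.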

Method 2: Using Pix2Text

This method requires installing Pix2Text and can recognize both pure formula images and mixed images that combine text and formulas. Install the package from your shell first:

pip install "pix2text>=1.1"

Then run the following Python code:

from pix2text import Pix2Text

# Replace these paths with your own images
image_fps = ['examples/example.jpg', 'examples/42.png', 'examples/0000186.png']
p2t = Pix2Text.from_config()

# Recognize pure formula images
outs = p2t.recognize_formula(image_fps)

# Recognize a mixed image (text plus formulas) and save the layout analysis to mixed-out.jpg
outs2 = p2t.recognize('examples/mixed.jpg', file_type='text_formula', return_text=True, save_analysis_res='mixed-out.jpg')
print(outs2)

Method 3: Using the Notebook

For a hands-on experience, you can try the Pix2Text notebook available here.

Performance Considerations

The Pix2Text V1.0 MFR model is reported to improve notably on previous versions across formulas of varying complexity. Recognition quality is measured by the Character Error Rate (CER), where a lower value means a more accurate result.
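To make the metric concrete, CER is commonly defined as the character-level edit distance between the recognized string and the ground-truth string, divided by the length of the ground truth. The sketch below follows that common definition. The source does not say exactly how the Pix2Text authors computed their figures (for example, whether the LaTeX is normalized first), so treat it as an illustration and not as their evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a 7-character formula
print(f"{cer('x^2+y^2', 'x^2+y^3'):.4f}")  # 0.1429
```

A CER of 0 means the recognized LaTeX matches the reference character for character. Values can exceed 1 when the output is much longer than the reference.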

Examples of Success

Printed Math Formula Images
Handwritten Math Formula Images

The test images cover a wide range of formula complexity, which suggests that the model is robust in real-world use.

Troubleshooting Tips

If you encounter challenges while using the Pix2Text-MFR model, consider the following tips:

Ensure that your images are clear and well-lit for best recognition results.
Make sure you have installed the correct library versions, especially for dependencies like transformers.
Check the file paths of your images to avoid ‘file not found’ errors.
If you experience slow performance, consider optimizing your images to reduce size and complexity.
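Two of the checks above, library versions and file paths, are easy to automate before you load the model. The following is a minimal sketch that uses only the Python standard library. The helper names missing_files and installed_version are hypothetical, and the package names and image paths are simply the ones used earlier in this article.

```python
from importlib import metadata
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Example paths from this article; replace them with your own
    image_fps = ["examples/example.jpg", "examples/42.png", "examples/0000186.png"]
    for path in missing_files(image_fps):
        print(f"File not found: {path}")

    # Packages mentioned in the install commands above
    for package in ("transformers", "optimum", "pillow", "pix2text"):
        version = installed_version(package)
        print(f"{package}: {version if version else 'not installed'}")
```

For the last tip, one option is to downscale very large images before recognition, for example with Pillow's Image.thumbnail, keeping in mind that shrinking an image too far can make small symbols unreadable.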

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
