Nougat-LaTeX-based Model: A Comprehensive Guide

Feb 28, 2024 | Educational


If you are exploring image-to-LaTeX conversion, you are likely to come across the Nougat-LaTeX-based model. Fine-tuned from facebook/nougat-base, it generates LaTeX code from images, particularly images of equations. In this article, we walk through setup and usage so that you can run it on your own images.

Getting Started with Nougat-LaTeX-based

Before you dive in, set up your environment. The inference script in this guide imports torch and PIL (Pillow) alongside Transformers, so all three need to be installed. You can get started by installing the Transformers library:

pip install transformers==4.34.0
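
To confirm the environment before loading any model weights, a small check like the one below can help. It is only a sketch: it treats the pinned version 4.34.0 as a minimum (an assumption, since the install command pins it exactly), and the helper names `parse_version`, `meets_minimum`, and `report` are made up for this illustration. The package list (transformers, torch, Pillow) is inferred from the imports used later in this guide.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.34.0' into (4, 34, 0). Suffixes such as 'rc1' or '+cu118' are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report(requirements):
    """Map each package name to 'missing' or its installed version plus a short note."""
    status = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "missing"
            continue
        ok = minimum is None or meets_minimum(installed, minimum)
        note = "ok" if ok else f"older than {minimum}"
        status[name] = f"{installed} ({note})"
    return status


if __name__ == "__main__":
    # Assumed requirement list; only the transformers version comes from this guide.
    wanted = {"transformers": "4.34.0", "torch": None, "Pillow": None}
    for name, line in report(wanted).items():
        print(f"{name}: {line}")
```

Any package reported as missing can be installed with pip before moving on to the model setup.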

Setting Up the Model

Now, let’s get your hands dirty with some code! To use the Nougat-LaTeX-based model, follow these steps:

Download the repository:

bash
git clone git@github.com:NormXU/nougat-latex-ocr.git
cd nougat-latex-ocr

Prepare the inference script:

Load the necessary libraries and initialize the model as illustrated below:

python
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel
from transformers.models.nougat import NougatTokenizerFast
# nougat_latex ships with the cloned repository, so run this script from the repo root
from nougat_latex import NougatLaTexProcessor

model_name = "Norm/nougat-latex-base"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Init model
model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)

# Init processor
tokenizer = NougatTokenizerFast.from_pretrained(model_name)
latex_processor = NougatLaTexProcessor.from_pretrained(model_name)

Running Inference

Now that you’ve set up the model, it’s time for action! Here’s how to run an inference on your image:

python
# Load the equation image and make sure it has three color channels
image = Image.open("path/to/latex/image.png")

if image.mode != "RGB":
    image = image.convert("RGB")

pixel_values = latex_processor(image, return_tensors="pt").pixel_values
decoder_input_ids = tokenizer(tokenizer.bos_token, add_special_tokens=False, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=True,
        num_beams=5,
        bad_words_ids=[[tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

# Decode the best beam, then strip the special tokens left in the text
sequence = tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(tokenizer.eos_token, "").replace(tokenizer.pad_token, "").replace(tokenizer.bos_token, "")
print(sequence)
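
The clean-up at the end of the script can be factored into a small helper, which makes it easier to reuse across many images. The sketch below is an illustration rather than part of the repository: `clean_latex` is a hypothetical name, the whitespace collapsing is an optional extra step, and the token strings in the example call are assumed for demonstration. In practice you would pass `tokenizer.bos_token`, `tokenizer.eos_token`, and `tokenizer.pad_token`.

```python
import re


def clean_latex(sequence, special_tokens):
    """Remove special tokens from a decoded sequence and tidy the whitespace."""
    for token in special_tokens:
        if token:  # skip tokens the tokenizer does not define (None or empty)
            sequence = sequence.replace(token, "")
    # Collapse runs of spaces and newlines into single spaces.
    return re.sub(r"\s+", " ", sequence).strip()


if __name__ == "__main__":
    # Assumed token strings, used here only to show the call shape.
    raw = "<s> \\frac{a}{b} </s><pad><pad>"
    print(clean_latex(raw, ["<s>", "</s>", "<pad>"]))
```

With the script above, the equivalent call would be `clean_latex(sequence, [tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token])`.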

Understanding the Code through Analogy

Think of the Nougat-LaTeX-based pipeline as a high-tech kitchen, where:

The image is the raw ingredient you want to transform.
The VisionEncoderDecoderModel is the chef, who follows a learned recipe to turn the ingredients (image) into a finished dish (LaTeX code).
The processor acts as the sous-chef, preparing the ingredients by converting the image into pixel tensors before the chef starts cooking.
The tokenizer plates the dish, turning the model's output tokens into the LaTeX string served to the guests (you, who requested the conversion).

This analogy highlights how each part contributes to the final output: the processor prepares, the model generates, and the tokenizer decodes.
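
The same flow can be sketched as one function that receives the processor, model, and tokenizer as arguments and calls them in order. This is a minimal illustration, not code from the repository: `image_to_latex` is a hypothetical name, and it decodes with `skip_special_tokens=True` instead of the chained `replace` calls shown earlier, which is a swapped-in simplification.

```python
def image_to_latex(image, processor, tokenizer, model, device="cpu", num_beams=5):
    """Preprocess an image, generate token ids, and decode them to a LaTeX string."""
    # Preparation: convert the PIL image into the pixel tensor the encoder expects.
    pixel_values = processor(image.convert("RGB"), return_tensors="pt").pixel_values

    # The decoder starts from the beginning-of-sequence token.
    start_ids = tokenizer(
        tokenizer.bos_token, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    # Generation: beam search over LaTeX tokens. In transformers, generate()
    # already runs without gradient tracking, so no torch.no_grad() block is used.
    token_ids = model.generate(
        pixel_values.to(device),
        decoder_input_ids=start_ids.to(device),
        max_length=model.decoder.config.max_length,
        num_beams=num_beams,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    # Decoding: turn token ids back into text and drop the special tokens.
    return tokenizer.batch_decode(token_ids, skip_special_tokens=True)[0].strip()
```

With the objects created earlier, a call would look like `image_to_latex(image, latex_processor, tokenizer, model, device)`. Passing the components in as arguments keeps the function easy to test with stand-in objects.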

Troubleshooting Tips

While working with the Nougat-LaTeX-based model, you may encounter some issues. Here are a few troubleshooting tips to keep in mind:

If you notice that the inference API widget is cutting responses short, it’s advisable to run the model locally following the steps outlined above.
Check the project's GitHub issues for updates or fixes related to the inference API.
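
When comparing widget output with a local run, it can be handy to flag results that look cut off. The heuristic below is only a rough sketch under simple assumptions: it counts unescaped braces and compares `\begin{...}` with `\end{...}` per environment. The name `looks_truncated` is made up for this example, and a balanced result can of course still be incomplete.

```python
import re
from collections import Counter


def looks_truncated(latex):
    """Return True if the LaTeX string has unclosed braces or environments."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            # Skip the backslash and the next character, so \{ and \} are not counted.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        i += 1

    # Count opened and closed environments by name.
    begins = Counter(re.findall(r"\\begin\{([^}]*)\}", latex))
    ends = Counter(re.findall(r"\\end\{([^}]*)\}", latex))
    unclosed_env = any(begins[name] > ends[name] for name in begins)

    return depth > 0 or unclosed_env


if __name__ == "__main__":
    print(looks_truncated(r"\frac{a}{b}"))
    print(looks_truncated(r"\begin{array}{cc} a & b"))
```

A result flagged by this check is a good candidate for re-running locally with the full `max_length` setting from the inference script.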

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

The Nougat-LaTeX-based model converts images of equations into LaTeX code. Setup takes only a few steps: install Transformers, clone the repository, load the model, and call generate. From there you can run image-to-LaTeX conversion locally on your own images.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
