Welcome to our user-friendly guide on Lutece-Vision-Base, a specialized Vision-Language Model (VLM) designed to analyze financial documents and answer related questions. Whether you’re a finance professional, a data scientist, or simply curious, this guide is for you!
Model Overview
Lutece-Vision-Base, named after the ancient name of Paris, is fine-tuned from Microsoft's Florence-2 base model and built specifically for interpreting financial documents. Let’s break down the components:
- Base Model: microsoft/Florence-2-base-ft
- Training Dataset: sujet-ai/Sujet-Finance-QA-Vision-100k
- Training Data: 100,629 QA pairs across 9,212 images
- Language: English
- License: MIT
Training the Model
The Lutece-Vision-Base model was fine-tuned with the following settings:
- Epochs: 7
- Learning Rate: 1e-6
- Optimizer: AdamW
- Hardware: One NVIDIA A100 GPU
- Training Duration: Approximately 38 hours
Think of fine-tuning as teaching a child: each QA pair is a lesson, and with every lesson the model gets better at reading a document and answering correctly. Seven epochs at a low learning rate amount to patient, repeated practice.
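For a rough sense of scale, the figures quoted above can be combined with a little arithmetic. This is a back-of-the-envelope sketch rather than a published measurement: it assumes every QA pair is seen exactly once per epoch and that the full 38 hours went into training steps.

```python
# Figures quoted in this guide.
qa_pairs = 100_629
images = 9_212
epochs = 7
training_hours = 38  # approximate

# Average number of questions attached to each document image.
pairs_per_image = qa_pairs / images

# Assumption: each QA pair is seen exactly once per epoch.
samples_seen = qa_pairs * epochs

# Assumption: the whole 38 hours was spent on training steps.
samples_per_second = samples_seen / (training_hours * 3600)

print(f"QA pairs per image: {pairs_per_image:.1f}")
print(f"Samples seen in training: {samples_seen:,}")
print(f"Approximate throughput: {samples_per_second:.1f} samples/s")
```

Under those assumptions, each image carries roughly eleven questions, and the single A100 processed on the order of five samples per second.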
Performance Evaluation
To gauge how well Lutece-Vision-Base performs, we employed two evaluation strategies:
- GPT-4o Evaluation: This method assesses the generated answers through a comparison with a baseline model, validating accuracy and relevance.
- Cosine Similarity Measurement: This approach quantifies how closely the model-generated answers align with the ground truth.
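To make the cosine similarity idea concrete, here is a minimal sketch. The guide does not say which text representation was used for the evaluation, so the word-count vectors below are an assumption chosen to keep the example dependency-free; a real evaluation would more likely compare sentence embeddings. The helper names and the two sample answers are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Turn a string into word counts (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer shares nothing with the reference
    return dot / (norm_a * norm_b)


# Hypothetical model answer and ground truth, for illustration only.
predicted = "Prepaid expenses decreased by 1.2 million"
reference = "Prepaid expenses decreased by 1.2 million dollars"

score = cosine_similarity(bag_of_words(predicted), bag_of_words(reference))
print(f"cosine similarity: {score:.3f}")
```

A score near 1 means the two answers point in almost the same direction in vector space, while a score of 0 means they share nothing.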
How to Use Lutece-Vision-Base
Let’s walk through using the model with a short Python script.
python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM, AutoConfig
import torch

# Load the base Florence-2 config and set the vision backbone type
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AutoConfig.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)
config.vision_config.model_type = "davit"

# Load the fine-tuned model and its processor
model = AutoModelForCausalLM.from_pretrained("sujet-ai/Lutece-Vision-Base", config=config, trust_remote_code=True).to(device).eval()
processor = AutoProcessor.from_pretrained("sujet-ai/Lutece-Vision-Base", config=config, trust_remote_code=True)

# Task key used to parse the output and to index the parsed result.
# Florence-2 task tokens are normally written in angle brackets ("<FinanceQA>");
# check the model card if parsing does not behave as expected.
task = "FinanceQA"

# Load input image and define the question
image = Image.open("test.png").convert("RGB")
prompt = "How much decrease in prepaid expenses was reported?"

# Process input and generate answer (deterministic beam search)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3,
)

# Decode and parse the answer; the result is a dict keyed by the task name
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed_answer = processor.post_process_generation(generated_text, task=task, image_size=(image.width, image.height))
print(parsed_answer[task])
The code above covers the whole pipeline for analyzing a financial document. Think of it as cooking from a recipe: gather the ingredients (the model, the image, and the prompt), follow the steps (the code), and serve the result (the answer).
Troubleshooting
If you encounter issues while using the model, consider the following ideas:
- Ensure that the image path is correct; a common error is the image not being found.
- Confirm that the libraries the script imports (torch, transformers, and Pillow) are installed and up to date.
- If your answers seem off, try adjusting the input prompt or providing a clearer image.
- Check your hardware: the script falls back to CPU when no CUDA GPU is available, but a GPU is recommended for reasonable inference speed.
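Several of these checks can be automated before loading the model. The following is a minimal sketch, not part of the model's own tooling: the `preflight` function and its messages are hypothetical, and the module list simply mirrors the imports in the usage script.

```python
import importlib.util
from pathlib import Path

# Import names for the libraries used in the usage script (Pillow imports as PIL).
REQUIRED_MODULES = ("torch", "transformers", "PIL")


def preflight(image_path: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []

    # The most common failure: the image file is not where the script expects it.
    if not Path(image_path).is_file():
        problems.append(f"image not found: {image_path}")

    # Look each library up without importing it.
    missing = [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    for module in missing:
        problems.append(f"missing library: {module}")

    # Only query CUDA when torch is actually installed.
    if "torch" not in missing:
        import torch
        if not torch.cuda.is_available():
            problems.append("no CUDA GPU detected; inference will fall back to CPU")

    return problems


if __name__ == "__main__":
    for problem in preflight("test.png"):
        print("WARNING:", problem)
```

Running this before the main script turns a long stack trace into a short list of things to fix.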
Limitations and Disclaimer
Although Lutece-Vision-Base was trained on a wide range of financial documents, it may not handle every layout or scenario. Always verify critical figures against the source document, and use your own judgment before making financially significant decisions based on its output.
Disclaimer: Sujet AI offers Lutece-Vision-Base without warranties. We advise users to exercise their judgment when using the model’s outputs.