Your Guide to Using TinyLLaVA for Image-Text Tasks

May 22, 2024 | Educational

Welcome to the world of TinyLLaVA, a family of small-scale Large Multimodal Models (LMMs) that are making waves in the realm of artificial intelligence. The TinyLLaVA family spans roughly 1.4B to 3.1B parameters, and its best models rival or even outperform existing 7B-scale models such as LLaVA-1.5. Today, we’ll explore how to use the TinyLLaVA-Gemma-SigLIP-2.4B model effectively. Grab your metaphorical toolbox, and let’s get started!

Getting Started with TinyLLaVA

Before diving into the code, ensure you have access to the required base model, google/gemma-2b-it, which is gated on the Hugging Face Hub (you must accept its license before downloading). In your Python environment, you will use the Hugging Face Transformers library to load the model and generate outputs.
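A typical environment setup might look like the following; the version pin is an assumption based on Gemma support landing in Transformers v4.38, so adjust to your own stack as needed:

```shell
# Gemma architectures require transformers v4.38 or newer.
pip install "transformers>=4.38" torch pillow requests

# google/gemma-2b-it is gated: accept its license on the Hugging Face Hub,
# then authenticate so from_pretrained can download the weights.
huggingface-cli login
```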

Usage Instructions

Here is a simple sequence of steps to set up and run the TinyLLaVA-Gemma-SigLIP-2.4B model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_path = "tinyllava/TinyLLaVA-Gemma-SigLIP-2.4B"

# trust_remote_code is required: the chat() helper ships with the model repo.
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda()  # move the model to the GPU
config = model.config
tokenizer = AutoTokenizer.from_pretrained(
    hf_path,
    use_fast=False,
    model_max_length=config.tokenizer_model_max_length,
    padding_side=config.tokenizer_padding_side,
)

prompt = "What are these?"
image_url = "http://images.cocodataset.org/test-stuff2017/000000000001.jpg"

# chat() returns the generated text along with the generation time.
output_text, generation_time = model.chat(prompt=prompt, image=image_url, tokenizer=tokenizer)

print("model output:", output_text)
print("running time:", generation_time)
```

Breaking Down the Code: An Analogy

Think of loading and running the TinyLLaVA model like preparing a meal in a kitchen:

  • Gathering Ingredients: Just as you gather your ingredients before cooking, the from_pretrained method loads the model and tokenizer into your workspace.
  • Cooking with Heat: Heat brings your ingredients to life; here, model.cuda() moves the model onto the GPU, where inference runs far faster.
  • Following a Recipe: Like following a recipe step by step, you set up the prompt and image URL that tell the model what you want, just as a recipe specifies the dish you are creating.
  • Tasting: Finally, just as you taste the finished meal, you check the model’s output and generation time to confirm everything worked as expected.

Understanding the Results

The results you get can be judged against standard multimodal benchmarks (higher is better; MME is a composite score on a different scale than the percentage-based columns). Here’s how TinyLLaVA-Gemma-SigLIP-2.4B compares with some of its peers:


| model_name | VQAv2 | GQA | SQA | TextVQA | MM-VET | POPE | MME | MMMU |
|---|---|---|---|---|---|---|---|---|
| [LLaVA-1.5-7B](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | 78.5 | 62.0 | 66.8 | 58.2 | 30.5 | 85.9 | 1510.7 | - |
| [bczhou/TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B) | 79.9 | 62.0 | 69.1 | 59.1 | 32.0 | 86.4 | 1464.9 | - |
| [tinyllava/TinyLLaVA-Gemma-SigLIP-2.4B](https://huggingface.co/tinyllava/TinyLLaVA-Gemma-SigLIP-2.4B) | 78.4 | 61.6 | 64.4 | 53.6 | 26.9 | 86.4 | 1339.0 | 31.7 |
| [tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B](https://huggingface.co/tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B) | 80.1 | 62.1 | 73.0 | 60.3 | 37.5 | 87.2 | 1466.4 | 38.4 |
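As a rough back-of-the-envelope comparison, you can average the percentage-style columns from the table above (MME is excluded because it uses a different scale, and MMMU is missing for two rows). The scores are copied from the table; the averaging itself is just an illustration, not an official metric:

```python
# Percentage-style benchmark scores (VQAv2, GQA, SQA, TextVQA, MM-VET, POPE)
# copied from the comparison table above.
scores = {
    "LLaVA-1.5-7B":                [78.5, 62.0, 66.8, 58.2, 30.5, 85.9],
    "TinyLLaVA-3.1B":              [79.9, 62.0, 69.1, 59.1, 32.0, 86.4],
    "TinyLLaVA-Gemma-SigLIP-2.4B": [78.4, 61.6, 64.4, 53.6, 26.9, 86.4],
    "TinyLLaVA-Phi-2-SigLIP-3.1B": [80.1, 62.1, 73.0, 60.3, 37.5, 87.2],
}

# Unweighted mean per model, rounded for readability.
averages = {name: round(sum(vals) / len(vals), 2) for name, vals in scores.items()}

for name, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} {avg:.2f}")
```

Note that the 2.4B model trades a few average points for a markedly smaller footprint, while the 3.1B Phi-2 variant leads on this crude aggregate.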

Troubleshooting

If you encounter issues such as model loading failures or unexpected outputs, here are some troubleshooting tips:

  • Ensure your environment is set up properly with the right dependencies. Check the versions of Hugging Face Transformers and PyTorch.
  • Verify the image URL is correct and publicly accessible. A broken link will cause the chat call to fail before any generation happens.
  • Inspect your prompt to ensure it’s clear and contextually relevant to the images you are providing.
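For the second tip, a cheap pre-flight check can catch malformed URLs before they reach the model. The helper below, looks_like_image_url, is a hypothetical utility sketched with the standard library; it only validates the URL’s shape, not that the server actually responds:

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".webp")

def looks_like_image_url(url: str) -> bool:
    """Return True if url has an http(s) scheme, a host, and an image extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)

print(looks_like_image_url("http://images.cocodataset.org/test-stuff2017/000000000001.jpg"))  # True
print(looks_like_image_url("images.cocodataset.org/cat.jpg"))  # False: missing scheme
```

For a stronger check you could additionally issue an HTTP HEAD request to confirm the link resolves, at the cost of a network round-trip.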

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
