How to Use the Bunny-Llama-3-8B-V Model

Jun 27, 2024 | Educational


If you’ve ever wanted to explore the powerful yet lightweight Bunny-Llama-3-8B-V multimodal model, you’re in the right place. This guide will walk you through the steps to get started using this advanced tool with ease.

What is Bunny-Llama-3-8B-V?

Bunny-Llama-3-8B-V is a multimodal model from the Bunny family, which combines plug-and-play vision encoders with several language backbones. It takes an image together with a text prompt and generates text about that image. Notably, this version can accept images up to 1152×1152 pixels.

Quickstart Guide

Before jumping into the code, set up your environment by installing the necessary packages with the following command:

pip install torch transformers accelerate pillow
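
After installation, a quick check can confirm that the packages are importable before you download the model weights. The following is a minimal sketch using only the Python standard library; the helper name missing_modules is ours and is not part of any of the installed packages.

```python
from importlib.util import find_spec

# Import names for the packages installed above. Note that the
# "pillow" distribution is imported under the name "PIL".
REQUIRED_MODULES = ["torch", "transformers", "accelerate", "PIL"]


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported as missing, rerun the pip command in the same Python environment you use to run the script.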

If your GPU has enough memory, the snippet runs faster when you restrict the process to a single GPU by setting the environment variable CUDA_VISIBLE_DEVICES=0.
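
The variable is usually set in the shell when launching the script, but it can also be set from Python. The sketch below assumes it runs at the very top of your script, before torch is imported, since the setting only takes effect if it is in place before CUDA is initialized. The helper name pin_to_gpu is hypothetical.

```python
import os


def pin_to_gpu(index=0):
    """Expose only the GPU with the given index to this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this before `import torch` so the restriction is in place
# when CUDA is first initialized.
visible = pin_to_gpu(0)
print("CUDA_VISIBLE_DEVICES =", visible)
```

Inside the process, the selected GPU is then addressed as device 0 regardless of its physical index.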

Code Example

The code snippet below demonstrates how to use the Bunny-Llama-3-8B-V model with the transformers library.

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings

# disable some warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings("ignore")

# set device
device = "cuda"  # or "cpu"
torch.set_default_device(device)

# create model
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Bunny-Llama-3-8B-V",
    torch_dtype=torch.float16,  # float32 for cpu
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "BAAI/Bunny-Llama-3-8B-V",
    trust_remote_code=True
)

# text prompt
prompt = "Why is the image funny?"
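# the <image> placeholder marks where the image is inserted into the prompt
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"

# tokenize the text on either side of <image>, then join the two halves with the
# image token id (-200), dropping the leading special token of the second half
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split("<image>")]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1][1:], dtype=torch.long).unsqueeze(0).to(device)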

# image, sample images can be found in images folder
image = Image.open("example_2.png")
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

# generate output
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True,
    repetition_penalty=1.0  # increase this to avoid chattering
)[0]
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
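
The least obvious step in the snippet is how input_ids is assembled around the image. The toy example below illustrates the idea without loading the model. It assumes the prompt contains a literal <image> placeholder, that -200 is the id the model treats as the image position, and that the [1:] slice is there to drop a begin-of-sequence token the tokenizer adds to every chunk. The tokenizer here is a stand-in with made-up ids, not the real one.

```python
IMAGE_TOKEN_INDEX = -200  # placeholder id used in the snippet above
BOS_ID = 1                # hypothetical begin-of-sequence id for this toy example


def toy_tokenize(text):
    """Stand-in tokenizer: BOS followed by one fake id per whitespace-separated word."""
    return [BOS_ID] + [100 + len(word) for word in text.split()]


def splice_image_token(text, placeholder="<image>"):
    """Tokenize the text around the placeholder and join the halves with the image id.

    Raises ValueError unless the placeholder occurs exactly once.
    """
    before, after = text.split(placeholder)
    head = toy_tokenize(before)
    tail = toy_tokenize(after)[1:]  # drop the second BOS so it appears only once
    return head + [IMAGE_TOKEN_INDEX] + tail


ids = splice_image_token("USER: <image>\nWhy is the image funny? ASSISTANT:")
print(ids)  # [1, 105, -200, 103, 102, 103, 105, 106, 110]
```

The result contains exactly one begin-of-sequence id and exactly one image id, which is the shape of sequence the real snippet passes to model.generate together with the image tensor.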

Understanding the Code: An Analogy

Imagine you are preparing a delightful meal. The ingredients are your libraries and packages, like torch, transformers, and PIL. Before cooking, you need to gather and sort these ingredients. In our code, we accomplish this by importing the necessary modules.

Just as you choose where to cook, whether on a stove (GPU) or in a microwave (CPU), you select your device. You then bring out your meal kit (the model), which is pretrained and ready to go.

The actual cooking begins when you combine your ingredients. Here, the text prompt acts as the cooking instructions, guiding how everything comes together. Then you add your image, much like adding spices to enhance flavor. Finally, when everything has simmered together, you generate your output: the finished dish.

Troubleshooting Tips

Model Loading Issues: If you encounter errors while loading the model, check your internet connection or consider using a Hugging Face mirror site.
Memory Errors: If the GPU runs out of memory, try reducing the input size, or run the model on the CPU by setting device = "cpu" and torch_dtype=torch.float32. Expect CPU inference to be much slower.
Warnings or Errors in Output: Ensure that the image file path is correct and that the image format is supported.
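
To judge whether memory is likely to be the problem, a back-of-the-envelope estimate of the weight size helps. The sketch below assumes roughly 8 billion parameters (taken from the "8B" in the model name) and counts only the language model weights. The vision encoder, activations, and the generation cache add to this, so treat the numbers as a lower bound.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float16": 2, "float32": 4}


def weight_memory_gib(num_params, dtype):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("float16", "float32"):
    print(f"{dtype}: ~{weight_memory_gib(8e9, dtype):.1f} GiB")
# float16: ~14.9 GiB
# float32: ~29.8 GiB
```

This is why the float16 setting in the snippet matters on a GPU, and why switching to float32 on a CPU roughly doubles the RAM you need.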

As you dive into using Bunny-Llama-3-8B-V, remember that every step can be enriching to your AI journey. Don’t hesitate to refer to the GitHub repository for more information and tools.

