How to Use Bunny: A Lightweight Multimodal Model

Mar 28, 2024 | Educational

Welcome to the world of Bunny! This article will walk you through how to leverage this powerful multimodal model for your projects. Whether you’re a developer or an AI enthusiast, Bunny offers robust capabilities that can enhance your work.

What is Bunny?

Bunny is a family of lightweight yet remarkably capable multimodal models. It offers plug-and-play vision encoders such as EVA-CLIP and SigLIP, combined with language backbones including Phi-1.5, StableLM-2, Qwen-1.5, and Phi-2. The Bunny-v1.0-3B model used in this guide is built on SigLIP and Phi-2, and despite its small size it performs on par with 13B models.

Getting Started with Bunny

Before you dive into coding, ensure you have the necessary dependencies installed. Here’s how to set up Bunny, with a quick sanity check shown after the list:

  • Install required dependencies:
    • pip install torch transformers accelerate pillow
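Once the dependencies are installed, a quick sanity check (a minimal sketch, not Bunny-specific) confirms that everything imports and shows which versions resolved:

import torch
import transformers
import PIL
import accelerate

# Report the versions that actually resolved in this environment
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("pillow:", PIL.__version__)
print("accelerate:", accelerate.__version__)
print("CUDA available:", torch.cuda.is_available())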

Code Snippet to Use Bunny

Now, let’s look at a simple code snippet to get you started with Bunny. A helpful analogy is a cooking recipe: each ingredient corresponds to a specific function, and when combined correctly they produce an exquisite dish, just as the code brings together different elements to produce the model’s output.

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings

# Disable some warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings("ignore")

# Set device
torch.set_default_device('cpu')  # or 'cuda'

# Create model
model = AutoModelForCausalLM.from_pretrained(
    'BAAI/Bunny-v1_0-3B',
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    'BAAI/Bunny-v1_0-3B',
    trust_remote_code=True
)

# Text prompt (the <image> placeholder marks where the image features are spliced in)
prompt = "Why is the image funny?"
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
# -200 is the image token id that stands in for the image inside the sequence
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)

# Image (sample images can be found in the images folder)
image = Image.open('example_2.png')
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)

# Generate
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True
)[0]
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
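If you have an NVIDIA GPU, the same snippet runs there as well. Here is a hedged variant of the device setup (assuming a CUDA-enabled PyTorch build; process_images comes from Bunny’s remote code, as in the snippet above):

# GPU variant: make newly created tensors default to the GPU
torch.set_default_device('cuda')

# device_map='auto' already places the model weights; keep the image
# tensor on the same device and dtype as the model before generating
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=model.device)

Note that float16 weights are mainly intended for GPU inference; if you hit dtype errors on CPU, loading with torch_dtype=torch.float32 is a reasonable fallback.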

Step-by-Step Breakdown

Let’s break down the code step by step, as if assembling a recipe:

  • Importing Libraries: Much like laying out your ingredients, the first lines import all necessary libraries (like torch and transformers).
  • Turning Off Warnings: Just as you would turn off background music when cooking to focus, this code silences unnecessary warnings.
  • Setting the Device: You can choose between ‘cpu’ or ‘cuda’ (NVIDIA’s GPU) as the cooking surface (your computing power) based on your setup.
  • Creating the Model: Here you are essentially preparing your cooking pot. The model is instantiated with specific parameters that dictate how it will function.
  • Defining the Prompt: This represents the dish you’re aiming to prepare, a conversation between a curious user and an AI assistant; the <image> placeholder marks where the picture enters the prompt (sketched in code after this list).
  • Processing the Image: Think of it as chopping your veggies before cooking; you convert images into tensors that the model can understand.
  • Generating Output: The final step is like plating your dish, where the model generates the text that corresponds to the input prompt and image.
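One step worth a closer look is the <image> placeholder splice. Below is a minimal, purely illustrative sketch with a stand-in tokenizer (the real tokenizer returns actual vocabulary ids) showing how the prompt is split into two chunks and the -200 sentinel is inserted between them:

# Stand-in for tokenizer(chunk).input_ids: one dummy id per word (illustration only)
def fake_tokenize(chunk):
    return [len(word) for word in chunk.split()]

text = "USER: <image>\nWhy is the image funny? ASSISTANT:"
left, right = text.split('<image>')  # the two text chunks around the placeholder
ids = fake_tokenize(left) + [-200] + fake_tokenize(right)
# At generation time, the model substitutes the image features
# (the tensor passed via images=) at the position of the -200 sentinel
print(ids)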

Troubleshooting

If you encounter any issues while setting up or using Bunny, consider the following troubleshooting tips:

  • Ensure that all dependencies are correctly installed.
  • Check that the model paths are accurate.
  • Verify that your device is set correctly and has sufficient memory for the model (a quick diagnostic sketch follows this list).
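For the last check, a quick diagnostic sketch using standard PyTorch calls (nothing Bunny-specific) reports which device you have and how much memory it offers:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device found; Bunny will run on the CPU, which is slower.")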

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Note

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
