Welcome to the world of Bunny, a family of lightweight yet powerful multimodal models suited to a wide range of applications! In this guide, we’ll walk you through getting started with Bunny, show you how to use it in your projects, and cover how to troubleshoot issues you may encounter along the way.
What is Bunny?
Bunny’s architecture pairs plug-and-play vision encoders with compact language backbones, enabling it to process multimodal (image plus text) inputs efficiently. The family is particularly impressive for matching or outperforming much larger models while keeping a relatively small footprint. Depending on your needs, you can combine vision encoders such as EVA-CLIP and SigLIP with language backbones such as Phi-1.5 and Phi-2.
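If you want to see which components a given checkpoint wires together, you can inspect its configuration before downloading the full weights. A minimal sketch (the exact config field names vary per variant, so printing the whole config is the assumption-free way to look):

from transformers import AutoConfig

# Peek at a checkpoint's configuration; Bunny ships custom code, hence
# trust_remote_code=True. Field names differ across variants, so we just
# print everything rather than assume specific attributes.
config = AutoConfig.from_pretrained("BAAI/Bunny-v1_0-3B", trust_remote_code=True)
print(config)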
Quickstart Guide
Let’s get you set up with Bunny! To start, you need to ensure you have the necessary dependencies installed.
Step 1: Installation
pip install torch transformers accelerate pillow
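Before going further, it’s worth confirming the environment is healthy and checking whether a GPU is visible. A quick sanity check:

import torch
import transformers

# print library versions and GPU visibility
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())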
Step 2: Import Libraries
Now, let’s import the required libraries in your Python script:
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings

# silence noisy (but harmless) loading warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')
Step 3: Set Device
You can choose to run the model on CPU or GPU. Here’s how to set it:
torch.set_default_device('cpu') # or 'cuda'
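If you’d rather pick the device automatically, a minimal sketch:

# fall back to CPU when no GPU is visible
device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch.set_default_device(device)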
Step 4: Create Model
Next, instantiate the Bunny model and its tokenizer. Note that trust_remote_code=True is required because the checkpoint ships custom modeling code:
model = AutoModelForCausalLM.from_pretrained(
"BAAI/Bunny-v1_0-3B",
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
"BAAI/Bunny-v1_0-3B",
trust_remote_code=True)
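The float16 weights above are intended for GPU use; float16 kernels are poorly supported on CPU. If you’re running CPU-only, a full-precision load is the safer option, sketched below (assumption: enough free RAM for a 3B-parameter model in float32, roughly 12 GB):

# CPU-only alternative: load in full precision
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Bunny-v1_0-3B",
    torch_dtype=torch.float32,  # float16 is flaky on CPU
    device_map="cpu",
    trust_remote_code=True)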
Step 5: Prepare Input
All set! Now, let’s prepare your text prompt and image input. Note the <image> placeholder in the prompt template: Bunny’s custom code uses it to mark where the image features are spliced into the token sequence.

prompt = "Why is the image funny?"
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
image = Image.open("example_2.png")
Step 6: Generate Output
Now, you can generate an output based on your input!
# tokenize the prompt in two chunks around the <image> placeholder and
# splice in the image token id (-200) that Bunny's custom code expects
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)

# preprocess the image with the model's own image processor
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)

# generate a response
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True
)[0]

# decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
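To reuse this in a project, you can wrap the steps above into a small helper. This is a sketch built only from the calls already shown; the function name ask_bunny is our own:

def ask_bunny(image_path: str, prompt: str, max_new_tokens: int = 100) -> str:
    """Answer a question about a single image using the loaded model/tokenizer."""
    text = ("A chat between a curious user and an artificial intelligence assistant. "
            "The assistant gives helpful, detailed, and polite answers to the user's "
            f"questions. USER: <image>\n{prompt} ASSISTANT:")
    chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
    input_ids = torch.tensor(chunks[0] + [-200] + chunks[1], dtype=torch.long).unsqueeze(0)
    image_tensor = model.process_images([Image.open(image_path)], model.config).to(dtype=model.dtype)
    output_ids = model.generate(input_ids, images=image_tensor,
                                max_new_tokens=max_new_tokens, use_cache=True)[0]
    return tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip()

print(ask_bunny("example_2.png", "Why is the image funny?"))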
Understanding the Code with an Analogy
Think of Bunny as a chef preparing a multi-course meal. The chef has different tools in the kitchen (various plug-and-play vision encoders and language backbones) and can prepare small yet delicious dishes (lightweight model). In order to get started, the chef needs ingredients (dependencies), and after everything is prepared, they can combine their culinary skills (model capabilities) to create a unique dining experience (generating output based on multimodal input). Just as a successful meal requires planning and quality ingredients, utilizing Bunny efficiently requires proper setup and preparation.
Troubleshooting
If you run into any issues, here are a few troubleshooting ideas:
- Check that all dependencies are installed and free of version conflicts (pip check can surface mismatches).
- Ensure the model ID and tokenizer path ("BAAI/Bunny-v1_0-3B" above) are spelled correctly and that both from_pretrained calls pass trust_remote_code=True.
- If you hit an out-of-memory error, switch your device to CPU, lower max_new_tokens, or load the model quantized, as in the sketch after this list.
- If you see warnings during processing or get empty output, double-check the input format: the <image> placeholder must appear in the prompt, and the file must open as a valid PIL image.
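For GPU memory errors specifically, one option is 8-bit loading via bitsandbytes. A hedged sketch: it assumes pip install bitsandbytes and a CUDA GPU, and we haven’t verified 8-bit quantization against Bunny’s custom modeling code, so treat it as an experiment:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit loading sketch; requires bitsandbytes and a CUDA GPU.
# Assumption: Bunny's remote code tolerates quantized weights.
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Bunny-v1_0-3B",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True)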
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
And there you have it: a full walkthrough for getting started with Bunny. Happy coding, and enjoy harnessing the power of Bunny for your AI projects!
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
