Welcome to the world of Bunny, a family of lightweight yet powerful multimodal models suited to a wide range of applications! In this guide, we’ll walk you through getting started with Bunny, show you how to use it in your projects, and cover how to troubleshoot issues you may encounter along the way.
What is Bunny?
Bunny’s architecture pairs plug-and-play vision encoders with compact language backbones, enabling it to process multimodal (image plus text) inputs efficiently. The family is particularly impressive for matching or outperforming much larger models while keeping a relatively small footprint. Depending on your needs, you can combine vision encoders such as EVA-CLIP and SigLIP with language backbones such as Phi-1.5 and Phi-2.
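If you want to see which components a given checkpoint wires together, you can inspect its configuration before downloading the full weights. A minimal sketch (the exact config field names vary per variant, so printing the whole config is the assumption-free way to look):

from transformers import AutoConfig

# Peek at a checkpoint's configuration; Bunny ships custom code, hence
# trust_remote_code=True. Field names differ across variants, so we just
# print everything rather than assume specific attributes.
config = AutoConfig.from_pretrained("BAAI/Bunny-v1_0-3B", trust_remote_code=True)
print(config)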
Quickstart Guide
Let’s get you set up with Bunny! To start, you need to ensure you have the necessary dependencies installed.
Step 1: Installation
pip install torch transformers accelerate pillow
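Before going further, it’s worth confirming the environment is healthy and checking whether a GPU is visible. A quick sanity check:

import torch
import transformers

# print library versions and GPU visibility
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())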
Step 2: Import Libraries
Now, let’s import the required libraries in your Python script:
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings

# silence noisy (but harmless) loading warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')
Step 3: Set Device
You can choose to run the model on CPU or GPU. Here’s how to set it:
torch.set_default_device('cpu') # or 'cuda'
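If you’d rather pick the device automatically, a minimal sketch:

# fall back to CPU when no GPU is visible
device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch.set_default_device(device)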
Step 4: Create Model
Next, instantiate the Bunny model and its tokenizer. Note that trust_remote_code=True is required because the checkpoint ships custom modeling code:
model = AutoModelForCausalLM.from_pretrained(
"BAAI/Bunny-v1_0-3B",
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
"BAAI/Bunny-v1_0-3B",
trust_remote_code=True)
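The float16 weights above are intended for GPU use; float16 kernels are poorly supported on CPU. If you’re running CPU-only, a full-precision load is the safer option, sketched below (assumption: enough free RAM for a 3B-parameter model in float32, roughly 12 GB):

# CPU-only alternative: load in full precision
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Bunny-v1_0-3B",
    torch_dtype=torch.float32,  # float16 is flaky on CPU
    device_map="cpu",
    trust_remote_code=True)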
Step 5: Prepare Input
All set! Now, let’s prepare your text prompt and image input. Note the <image> placeholder in the prompt template: Bunny’s custom code uses it to mark where the image features are spliced into the token sequence.

prompt = "Why is the image funny?"
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
image = Image.open("example_2.png")
Step 6: Generate Output
Now, you can generate an output based on your input!
# tokenize the prompt in two chunks around the <image> placeholder and
# splice in the image token id (-200) that Bunny's custom code expects
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)

# preprocess the image with the model's own image processor
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)

# generate a response
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True
)[0]

# decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
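To reuse this in a project, you can wrap the steps above into a small helper. This is a sketch built only from the calls already shown; the function name ask_bunny is our own:

def ask_bunny(image_path: str, prompt: str, max_new_tokens: int = 100) -> str:
    """Answer a question about a single image using the loaded model/tokenizer."""
    text = ("A chat between a curious user and an artificial intelligence assistant. "
            "The assistant gives helpful, detailed, and polite answers to the user's "
            f"questions. USER: <image>\n{prompt} ASSISTANT:")
    chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
    input_ids = torch.tensor(chunks[0] + [-200] + chunks[1], dtype=torch.long).unsqueeze(0)
    image_tensor = model.process_images([Image.open(image_path)], model.config).to(dtype=model.dtype)
    output_ids = model.generate(input_ids, images=image_tensor,
                                max_new_tokens=max_new_tokens, use_cache=True)[0]
    return tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip()

print(ask_bunny("example_2.png", "Why is the image funny?"))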
Understanding the Code with an Analogy
Think of Bunny as a chef preparing a multi-course meal. The chef has different tools in the kitchen (various plug-and-play vision encoders and language backbones) and can prepare small yet delicious dishes (lightweight model). In order to get started, the chef needs ingredients (dependencies), and after everything is prepared, they can combine their culinary skills (model capabilities) to create a unique dining experience (generating output based on multimodal input). Just as a successful meal requires planning and quality ingredients, utilizing Bunny efficiently requires proper setup and preparation.
Troubleshooting
If you run into any issues, here are a few troubleshooting ideas:
- Check that all dependencies are installed and free of version conflicts (pip check can surface mismatches).
- Ensure the model ID and tokenizer path ("BAAI/Bunny-v1_0-3B" above) are spelled correctly and that both from_pretrained calls pass trust_remote_code=True.
- If you hit an out-of-memory error, switch your device to CPU, lower max_new_tokens, or load the model quantized, as in the sketch after this list.
- If you see warnings during processing or get empty output, double-check the input format: the <image> placeholder must appear in the prompt, and the file must open as a valid PIL image.
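For GPU memory errors specifically, one option is 8-bit loading via bitsandbytes. A hedged sketch: it assumes pip install bitsandbytes and a CUDA GPU, and we haven’t verified 8-bit quantization against Bunny’s custom modeling code, so treat it as an experiment:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit loading sketch; requires bitsandbytes and a CUDA GPU.
# Assumption: Bunny's remote code tolerates quantized weights.
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Bunny-v1_0-3B",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True)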
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
And there you have it: a full walkthrough for getting started with Bunny. Happy coding, and enjoy harnessing the power of Bunny for your AI projects!
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
