How to Utilize MobileCLIP: A Comprehensive Guide

MobileCLIP is a family of fast image-text models trained with multi-modal reinforced training, designed to run efficiently on mobile devices. This blog post will guide you through using MobileCLIP, from installation to inference.

What is MobileCLIP?

MobileCLIP stands for Mobile Contrastive Language-Image Pre-training. It’s a cutting-edge approach that delivers faster, more efficient models for matching images and text. The smallest variant, MobileCLIP-S0, offers performance competitive with OpenAI’s CLIP ViT-B/16 while being markedly smaller and faster.

[Figure: zero-shot accuracy vs. latency for the MobileCLIP variants]

Highlights of MobileCLIP

  • The MobileCLIP-S0 variant is 4.8x faster and 2.8x smaller than OpenAI’s CLIP ViT-B/16.
  • MobileCLIP-S2 exceeds the performance of SigLIP’s ViT-B/16 model while consuming fewer resources.
  • MobileCLIP-B (LT) achieves an impressive 77.2% zero-shot top-1 accuracy on ImageNet.

Checkpoints Available

Here are the different model checkpoints available:

Model             | # Seen Samples (B) | # Params (M, img + txt) | IN-1k Zero-Shot Top-1 Acc. (%) | Avg. Perf. (%)
MobileCLIP-S0     | 13                 | 11.4 + 42.4             | 67.8                           | 58.1
MobileCLIP-S1     | 13                 | 21.5 + 63.4             | 72.6                           | 61.3
MobileCLIP-S2     | 13                 | 35.7 + 63.4             | 74.4                           | 63.7
MobileCLIP-B      | 13                 | 86.3 + 63.4             | 76.8                           | 65.2
MobileCLIP-B (LT) | 36                 | 86.3 + 63.4             | 77.2                           | 65.8
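
Each checkpoint is loaded through the ml-mobileclip library by passing a model name string (see the steps below). Here is a minimal sketch of the assumed mapping, following the file-naming convention used in the ml-mobileclip repository and in the inference snippet below; verify the exact names against that repository’s README:

    # Assumed mapping from checkpoint file to the model name string passed to
    # mobileclip.create_model_and_transforms(); verify against the ml-mobileclip README.
    CHECKPOINT_TO_MODEL_NAME = {
        'mobileclip_s0.pt':  'mobileclip_s0',
        'mobileclip_s1.pt':  'mobileclip_s1',
        'mobileclip_s2.pt':  'mobileclip_s2',
        'mobileclip_b.pt':   'mobileclip_b',
        'mobileclip_blt.pt': 'mobileclip_blt',  # MobileCLIP-B (LT)
    }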

How to Use MobileCLIP

To get started with MobileCLIP, follow these simple steps:

  1. Download the desired checkpoint by clicking one of the links in the table above.
  2. On the model page, open the “Files and versions” tab to download the PyTorch checkpoint.
  3. If you prefer a programmatic approach, make sure huggingface_hub is installed and run the following command (a Python alternative is sketched after the inference code below):

     huggingface-cli download pcuenq/MobileCLIP-B-LT

  4. Install the ml-mobileclip library using the instructions provided in its repository.
  5. Now you’re ready to run inference! Use the following code snippet:
     import torch
     from PIL import Image
     import mobileclip

     # Load the model, preprocessing transforms, and tokenizer for MobileCLIP-B (LT)
     model, _, preprocess = mobileclip.create_model_and_transforms('mobileclip_blt', pretrained='path_to_mobileclip_blt.pt')
     tokenizer = mobileclip.get_tokenizer('mobileclip_blt')

     # Preprocess the input image and tokenize the candidate labels
     image = preprocess(Image.open('docs/fig_accuracy_latency.png').convert('RGB')).unsqueeze(0)
     text = tokenizer(['a diagram', 'a dog', 'a cat'])

     with torch.no_grad(), torch.cuda.amp.autocast():
         image_features = model.encode_image(image)
         text_features = model.encode_text(text)
         # Normalize embeddings to unit length so the dot product is a cosine similarity
         image_features /= image_features.norm(dim=-1, keepdim=True)
         text_features /= text_features.norm(dim=-1, keepdim=True)
         # Scale the similarities and convert them to per-label probabilities
         text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

     print('Label probs:', text_probs)
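
If you’d rather fetch the checkpoint from Python than from the command line (the programmatic approach in step 3), here is a minimal sketch using huggingface_hub’s snapshot_download. The checkpoint filename mobileclip_blt.pt is an assumption; check the repository’s “Files and versions” tab for the real name:

    import os
    from huggingface_hub import snapshot_download

    # Download the pcuenq/MobileCLIP-B-LT repository into the local Hugging Face cache
    repo_dir = snapshot_download(repo_id='pcuenq/MobileCLIP-B-LT')

    # Assumed checkpoint filename inside the repo; verify before using
    checkpoint_path = os.path.join(repo_dir, 'mobileclip_blt.pt')
    print('Checkpoint available at:', checkpoint_path)

You can then pass checkpoint_path as the pretrained argument in the inference snippet above.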

Understanding the Code: An Analogy

Think of the code snippet as a digital chef preparing a dish. Each portion represents a step in the cooking process:

  • Importing libraries is like gathering ingredients. You need the right tools to prepare your meal (in this case, the model and processing functions).
  • Loading the model and image is akin to preheating the oven and setting out your ingredients.
  • Processing the image can be compared to chopping vegetables, ensuring everything is ready to be mixed.
  • Encoding image and text represents cooking the dish; you combine your ingredients to create something delicious!
  • Lastly, softmax serves the final product: the dish is ready to be enjoyed, as the model outputs a probability for each label (see the small numeric sketch below).
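
To make the last bullet concrete, here is a tiny numeric sketch with made-up similarity scores showing how scaling by 100 and applying softmax turns raw similarities into label probabilities:

    import torch

    # Hypothetical cosine similarities between one image and three text labels
    sims = torch.tensor([[0.31, 0.18, 0.12]])

    # Scaling by 100 sharpens the distribution; softmax converts it to probabilities
    probs = (100.0 * sims).softmax(dim=-1)
    print(probs)  # the first label receives nearly all of the probability mass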

Troubleshooting Tips

If you’re facing any hurdles while using MobileCLIP, here are some troubleshooting ideas:

  • Ensure that all dependencies are correctly installed.
  • If you encounter errors while downloading the checkpoint, double-check the URLs provided in the checkpoint table.
  • If the model fails to run, verify that you’re using the correct paths and syntax in your code.
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

With MobileCLIP, the world of image-text models is at your fingertips. By following these steps, you can effectively utilize this advanced technology to enhance your AI projects. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
