Welcome to the world of image classification! In this guide, we will use the timm library to work with a MobileNet-v2 image classification model trained on the ImageNet-1k dataset. By the end of this article, you will know how to classify images, extract feature maps, and generate embeddings.
What is MobileNet-v2?
MobileNet-v2 is a convolutional neural network architecture that uses depth-wise separable convolutions to build lightweight models for mobile and edge devices. In essence, it trades a little accuracy for a large reduction in parameters and compute, which makes it well suited to on-device image classification.
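To make that idea concrete, here is a minimal PyTorch sketch (illustrative only, not the actual MobileNet-v2 implementation): a standard convolution is split into a per-channel depth-wise convolution followed by a 1x1 point-wise convolution, which drastically reduces the number of multiplications.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depth-wise: one 3x3 filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        # Point-wise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)  # (batch, channels, height, width)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])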
Setting Up Your Environment
Before diving into the code, ensure you have the necessary libraries installed. You’ll need:
- PIL for image handling
- timm for interaction with the MobileNet model
- torch for handling tensor operations
You can install these using pip:
pip install Pillow timm torch
Using the MobileNet-v2 Model for Image Classification
Now that your environment is ready, let’s jump into classifying images using MobileNet-v2.
Imagine you have a well-trained dog that can recognize various breeds at a glance. Similarly, MobileNet-v2 recognizes features in images to output classifications. Here’s how we can achieve that:
from urllib.request import urlopen
from PIL import Image
import timm
import torch
# Load your image
img = Image.open(urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
# Load the model
model = timm.create_model('mobilenetv2_120d.ra_in1k', pretrained=True)
model = model.eval()
# Get model specific transforms
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
# Classify the image
output = model(transforms(img).unsqueeze(0)) # unsqueeze to create a batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
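The indices returned by torch.topk are ImageNet-1k class IDs. To turn them into human-readable labels you need the ImageNet class list in index order. As a sketch, the URL below is an assumption (a commonly used public copy of the labels, not part of timm itself); any correctly ordered label file works:
# Map class indices to human-readable labels
# (URL is an assumed public copy of the ImageNet-1k label list)
IMAGENET_1K_URL = 'https://storage.googleapis.com/bit_models/ilsvrc2012_wordnet_lemmas.txt'
class_names = urlopen(IMAGENET_1K_URL).read().decode('utf-8').splitlines()

for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f'{class_names[idx.item()]}: {prob.item():.2f}%')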
Extracting Feature Maps
Sometimes you may want to peek under the hood and see what the model is focusing on. Feature maps are like a diagnostic scan: they show which spatial patterns each stage of the network responds to.
model = timm.create_model('mobilenetv2_120d.ra_in1k', pretrained=True, features_only=True)
model = model.eval()
# Run the image through the backbone to collect feature maps
output = model(transforms(img).unsqueeze(0))  # unsqueeze to create a batch of 1

for o in output:
    print(o.shape)  # shape of each feature map, one per network stage
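You can also ask timm which channel counts and strides those feature maps have, or request only specific levels when creating the model. A small sketch (the out_indices values are illustrative):
# Inspect the returned feature levels
print(model.feature_info.channels())   # channels of each feature map
print(model.feature_info.reduction())  # downsampling factor of each level

# Request only specific levels up front (indices are illustrative)
backbone = timm.create_model('mobilenetv2_120d.ra_in1k', pretrained=True,
                             features_only=True, out_indices=(2, 3, 4))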
Generating Image Embeddings
Let’s say you need a compact fingerprint of an image: a single vector that summarizes its visual content so you can compare, cluster, or search images. That’s what image embeddings are for!
model = timm.create_model('mobilenetv2_120d.ra_in1k', pretrained=True, num_classes=0)
model = model.eval()
# Get the embeddings for the image
output = model.forward_features(transforms(img).unsqueeze(0)) # unpooled (1, num_features, H, W) tensor
output = model.forward_head(output, pre_logits=True) # (1, num_features) shaped tensor
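A common use for these embeddings is similarity search. Here is a minimal sketch, assuming the model, transforms, and img defined above (the embed helper is ours, not part of timm):
import torch.nn.functional as F

def embed(image):
    # Returns a (1, num_features) embedding for a PIL image
    with torch.no_grad():
        feats = model.forward_features(transforms(image).unsqueeze(0))
        return model.forward_head(feats, pre_logits=True)

emb1 = embed(img)
emb2 = embed(img)  # compare the image with itself as a sanity check
print(F.cosine_similarity(emb1, emb2).item())  # ~1.0 for identical inputs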
Troubleshooting Common Issues
When working with image classification tasks, you might face a few hiccups. Here are some troubleshooting tips:
- Image Not Loading: Ensure the URL used in `urlopen` is correct and accessible.
- Model Fails to Load: Check that you have a recent version of the timm library; upgrading often resolves missing-model errors (see the sanity check after this list).
- Output Shapes Not Matching: Make sure your input images are processed with the right transformations as expected by the model.
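A quick sanity check for your installation (a minimal sketch; which model names are listed depends on your installed timm version):
import timm

print(timm.__version__)                  # installed timm version
print(timm.list_models('mobilenetv2*'))  # MobileNet-v2 variants known to create_model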
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
Congratulations! You’ve taken a deep dive into image classification using MobileNet-v2. With this knowledge, you can classify images, inspect what the model sees, and generate embeddings with just a few lines of code.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.