Ready to dive into image classification with MobileNet-v3? This guide walks you through using the MobileNet-v3 small model (tf_mobilenetv3_small_075.in1k) from the timm library to classify images, extract feature maps, and generate embeddings.
Understanding the Model
MobileNet-v3 is a lightweight convolutional neural network architecture designed for efficient image classification. This checkpoint was trained on the ImageNet-1k dataset, so it assigns each image to one of 1,000 categories. Think of the model as a librarian who can glance at an image and point to the right shelf (category) in a fraction of a second!
Model Details
- Model Type: Image classification feature backbone
- Parameters: 2.0 Million
- GMACs: 0.0 (rounded to one decimal place; the actual cost is small but non-zero)
- Activations: 1.3 Million
- Image Size: 224 x 224
- Papers: Searching for MobileNetV3
- Original Model: EfficientNet on GitHub
How to Classify Images
To classify images using the MobileNet-v3 model, follow these steps:
python
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below
# Load the image
img = Image.open(urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
# Create the model
model = timm.create_model('tf_mobilenetv3_small_075.in1k', pretrained=True)
model = model.eval()
# Get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
# Perform inference
output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into a batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
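To make the last line of the snippet less opaque, here is a dependency-free sketch of what softmax followed by topk computes. The helper names (softmax_percent, top_k) and the four-element logits list are made up for illustration; the real model returns 1,000 logits per image and the snippet above uses the torch implementations.

```python
import math

def softmax_percent(logits):
    """Convert raw scores to percentages that sum to 100."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]

def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order

# Hypothetical logits for a 4-class toy problem (not real model output)
logits = [2.0, 0.5, -1.0, 3.0]
probs = softmax_percent(logits)
top_values, top_indices = top_k(probs, k=2)
print(top_indices)  # indices of the two most likely classes
```

In the real snippet, top5_class_indices holds positions in the ImageNet-1k label list, so turning them into human-readable names requires a separate label file.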
Explaining the Code with an Analogy
Think of the above code as a recipe for baking a delicious cake. Here’s how it works:
- Ingredients (Libraries): Just like you need flour, sugar, and eggs to bake, the code imports essential libraries such as Image from PIL for image processing and timm for accessing models.
- Gathering Supplies (Loading the Image): The cake can’t be baked without gathering the right ingredients. Here, the code fetches an image from a URL, just like you would pick your ingredients from the pantry.
- Choosing a Recipe (Creating the Model): The recipe (model) is created by specifying the type of cake you want to bake (loading MobileNet-v3). It’s pre-prepared (pretrained) for you!
- Preparing the Cake Pan (Transforms): Before baking, you must prepare the cake pan. Here, the transforms handle the image adjustments such as normalization and resizing to make it suitable for the model.
- Baking (Inference): Finally, you pour the mixture into the pan and bake. The model runs the image through its layers, and we get the delicious result—the top 5 predictions!
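The "cake pan" step can be illustrated on a single pixel. The sketch below shows the arithmetic behind normalization, assuming pixel values have already been scaled to the 0 to 1 range. The mean and std values are placeholders chosen for illustration; the values this checkpoint actually expects are stored in data_config under the mean and std keys.

```python
def normalize_pixel(rgb, mean, std):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

# Placeholder statistics (assumption): read the real ones from data_config
mean = (0.5, 0.5, 0.5)
std = (0.5, 0.5, 0.5)

# An 8-bit RGB pixel scaled to [0, 1] before normalization
pixel = tuple(c / 255 for c in (255, 128, 0))
normalized = normalize_pixel(pixel, mean, std)
print(normalized)
```

With these placeholder statistics, the brightest channel value maps to 1.0 and the darkest to -1.0, which is why feeding raw 0 to 255 values into the model gives poor predictions.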
Feature Map Extraction
To extract feature maps, you can adjust the code slightly:
python
# Modifications to extract feature maps
# (reuses img and transforms from the classification example)
model = timm.create_model('tf_mobilenetv3_small_075.in1k', pretrained=True, features_only=True)
model = model.eval()
output = model(transforms(img).unsqueeze(0)) # Unsqueeze single image into a batch of 1
for o in output:
    print(o.shape) # Print shape of each feature map in output
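As a rough sanity check on the printed shapes, the spatial size of each feature map is the input size divided by that stage's stride. The sketch below assumes a 224 x 224 input and strides of 2, 4, 8, 16, and 32. That stride pattern is common for this family of backbones but is an assumption here; model.feature_info.reduction() should report the actual strides for the loaded model.

```python
import math

def expected_spatial_sizes(input_size, strides):
    """Height/width of each feature map for a square input."""
    return [math.ceil(input_size / s) for s in strides]

assumed_strides = (2, 4, 8, 16, 32)  # assumption, not read from the model
sizes = expected_spatial_sizes(224, assumed_strides)
print(sizes)  # [112, 56, 28, 14, 7]
```

If the shapes you see differ from this estimate, the input size produced by the transforms is the first thing to check.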
Generating Image Embeddings
To generate image embeddings, a slight variation is required:
python
# Modifications to get embeddings
# (reuses img and transforms from the classification example)
model = timm.create_model('tf_mobilenetv3_small_075.in1k', pretrained=True, num_classes=0) # Remove classifier
model = model.eval()
output = model(transforms(img).unsqueeze(0)) # Output is a (batch_size, num_features) shaped tensor
# Or equivalently (without needing to set num_classes=0):
output = model.forward_features(transforms(img).unsqueeze(0)) # Output is unpooled
output = model.forward_head(output, pre_logits=True) # Output is a (1, num_features) shaped tensor
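Embeddings are usually compared with each other rather than read directly. As a minimal sketch of one common comparison, cosine similarity between two embedding vectors can be computed as below. The short lists stand in for real num_features-long vectors, and cosine_similarity is a helper defined here for illustration, not a timm function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for image embeddings (illustrative values only)
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [-3.0, 0.0, 1.0]  # orthogonal to emb_a

same_direction = cosine_similarity(emb_a, emb_b)
orthogonal = cosine_similarity(emb_a, emb_c)
print(same_direction, orthogonal)
```

Values near 1 suggest similar images and values near 0 suggest unrelated ones. With real embeddings you would first convert the tensor to a flat list or use the equivalent torch operation.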
Troubleshooting Tips
If you encounter any issues while implementing this model, consider the following troubleshooting tips:
- Ensure that all necessary libraries are installed. Use pip install timm pillow torch to install them.
- If you receive any errors related to image loading, double-check the URL and be sure it points to a valid image.
- For discrepancies in output shapes, verify that the transforms applied to the image are correct.
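For the first tip, a quick way to see which packages are missing is to ask Python whether each one can be found, without actually importing it. This is a small sketch using only the standard library; missing_packages is a helper name chosen here.

```python
import importlib.util

def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Import names can differ from pip names: pillow is imported as PIL
required = ["timm", "PIL", "torch"]
missing = missing_packages(required)
print("Missing:", missing if missing else "none")
```

Anything listed as missing can then be installed with the pip command above, keeping in mind that PIL is installed under the name pillow.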
Conclusion
In conclusion, the MobileNet-v3 model from the TIMM library is a powerful and efficient tool for image classification. With this guide, you should now be equipped to tackle image classification tasks, extract features, and generate embeddings. Now, go ahead and classify images like a pro!

