How to Use the MobileNet-v3 Image Classification Model with TIMM

Apr 30, 2023 | Educational

Are you ready to dive into the world of image classification with MobileNet-v3? This guide walks you through using the MobileNet-v3 Small model (tf_mobilenetv3_small_075.in1k) from the TIMM library. Get ready to classify images with ease!

Understanding the Model

MobileNet-v3 is a lightweight convolutional neural network architecture designed for efficient image classification, and this checkpoint has been trained on the ImageNet-1k dataset. Imagine the model as a smart assistant in a library, capable of sifting through thousands of images and fetching the right book (or category) for you in a matter of seconds!

Model Details

This checkpoint is the MobileNet-v3 Small variant with a 0.75 width multiplier (the "075" in its name). The tf_ prefix indicates the weights were ported from the original TensorFlow implementation, and the in1k suffix indicates pretraining on ImageNet-1k, so the classifier predicts 1,000 classes.
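
If you want to confirm which MobileNet-v3 checkpoints your TIMM installation provides, or peek at this model's default input size and normalization, a short sketch like the one below can help. It assumes a reasonably recent timm version that exposes the pretrained_cfg attribute; older versions call it default_cfg.

python
import timm

# List MobileNet-v3 checkpoints that ship with pretrained weights in this timm install
print(timm.list_models('*mobilenetv3*', pretrained=True))

# Inspect the default configuration for this checkpoint (input size, normalization, classes)
model = timm.create_model('tf_mobilenetv3_small_075.in1k', pretrained=True)
print(model.pretrained_cfg)
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")
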
How to Classify Images

To classify images using the MobileNet-v3 model, follow these steps:

python
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

# Load the image
img = Image.open(urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))

# Create the model
model = timm.create_model('tf_mobilenetv3_small_075.in1k', pretrained=True)
model = model.eval()

# Get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# Perform inference
output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into a batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
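
If you want to see the numbers behind those predictions, the short sketch below prints each of the top-5 class indices with its probability. In a real script you would typically also wrap the forward pass in torch.no_grad() to skip gradient tracking.

python
# Inspect the top-5 results; the indices refer to ImageNet-1k classes
for prob, idx in zip(top5_probabilities[0].tolist(), top5_class_indices[0].tolist()):
    print(f"class index {idx}: {prob:.2f}%")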

Explaining the Code with an Analogy

Think of the above code as a recipe for baking a delicious cake. Here’s how it works:

  • Ingredients (Libraries): Just like you need flour, sugar, and eggs to bake, the code imports essential libraries such as Image from PIL for image processing and timm for accessing models.
  • Gathering Supplies (Loading the Image): The cake can’t be baked without gathering the right ingredients. Here, the code fetches an image from a URL, just like you would pick your ingredients from the pantry.
  • Choosing a Recipe (Creating the Model): The recipe (model) is created by specifying the type of cake you want to bake (loading MobileNet-v3). It’s pre-prepared (pretrained) for you!
  • Preparing the Cake Pan (Transforms): Before baking, you must prepare the cake pan. Here, the transforms handle the image adjustments such as normalization and resizing to make it suitable for the model.
  • Baking (Inference): Finally, you pour the mixture into the pan and bake. The model runs the image through its layers, and we get the delicious result—the top 5 predictions!

Feature Map Extraction

To extract feature maps, you can adjust the code slightly:

python
# Modifications to extract feature maps
model = timm.create_model('tf_mobilenetv3_small_075.in1k', pretrained=True, features_only=True)
model = model.eval()

output = model(transforms(img).unsqueeze(0))  # Unsqueeze single image into a batch of 1
for o in output:
    print(o.shape)  # Print shape of each feature map in output
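
If you need to know how many channels each feature map carries, or at what stride it was produced, features_only models in timm expose a feature_info helper. The method names below (channels(), reduction()) reflect the timm API as I understand it; double-check them against your installed version.

python
# Inspect metadata about the feature maps returned by the features_only model
print(model.feature_info.channels())   # channels of each returned feature map
print(model.feature_info.reduction())  # total stride (downsampling factor) of each map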

Generating Image Embeddings

To generate image embeddings, a slight variation is required:

python
# Modifications to get embeddings
model = timm.create_model('tf_mobilenetv3_small_075.in1k', pretrained=True, num_classes=0)  # Remove classifier
model = model.eval()

output = model(transforms(img).unsqueeze(0))  # Output is a (batch_size, num_features) shaped tensor
output = model.forward_features(transforms(img).unsqueeze(0))  # Output is unpooled
output = model.forward_head(output, pre_logits=True)  # Output is a (1, num_features) shaped tensor
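
A common use for these embeddings is comparing images. As a quick sketch, assuming a hypothetical second image loaded as img2 the same way img was loaded above, cosine similarity between two embedding vectors looks like this:

python
import torch.nn.functional as F

# `img2` is a hypothetical second image, loaded the same way as `img` above
emb1 = model(transforms(img).unsqueeze(0))   # (1, num_features)
emb2 = model(transforms(img2).unsqueeze(0))  # (1, num_features)

similarity = F.cosine_similarity(emb1, emb2)  # cosine similarity along the feature dimension
print(similarity.item())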

Troubleshooting Tips

If you encounter any issues while implementing this model, consider the following troubleshooting tips:

  • Ensure that all necessary libraries are installed. Use pip install timm pillow torch to install them.
  • If you receive any errors related to image loading, double-check the URL and be sure it points to a valid image.
  • For discrepancies in output shapes, verify that the transforms applied to the image are correct; a quick shape-check sketch follows this list.
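
Here is a minimal sketch for that shape check: it prints the input size the model expects (from its data config) and the shape of the tensor your transforms actually produce, so mismatches are easy to spot.

python
# Compare the input size the model expects with what the transforms actually produce
data_config = timm.data.resolve_model_data_config(model)
print(data_config['input_size'])           # expected (channels, height, width)
print(transforms(img).unsqueeze(0).shape)  # actual tensor shape fed to the model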

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

The MobileNet-v3 model from the TIMM library is a powerful and efficient tool for image classification. With this guide, you should now be equipped to classify images, extract feature maps, and generate embeddings. Now, go ahead and classify images like a pro!

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
