How to Use EfficientNet-v2 for Image Classification with PyTorch

Apr 29, 2023 | Educational

EfficientNet-v2 is a family of image classification models. This guide uses the tf_efficientnetv2_xl.in21k_ft_in1k checkpoint from the timm library, which, as its name indicates, was pretrained on ImageNet-21k and fine-tuned on ImageNet-1k. We will walk through three tasks with it: image classification, feature map extraction, and generating image embeddings, using a picture of beignets as the sample input.

Model Overview

The checkpoint used in this guide has the following characteristics:

Model Type: Image classification / feature backbone
Parameters: 208.1 million
GMACs: 52.8
Activations: 139.2 million
Image Size: 384 x 384 for training, 512 x 512 for testing

For further technical details, you can consult the original research paper, "EfficientNetV2: Smaller Models and Faster Training" (Tan and Le, 2021).
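
To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It estimates only the memory needed to hold the weights (activations and framework overhead are not counted) and compares the pixel counts of the two input sizes. The helper names are illustrative and not part of timm.

```python
# Parameter count taken from the model overview (208.1 million).
PARAMS = 208.1e6


def weight_memory_mb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6


def input_elements(side, channels=3):
    """Number of values in one square RGB input tensor."""
    return channels * side * side


print(f"float32 weights: {weight_memory_mb(PARAMS, 4):.1f} MB")
print(f"float16 weights: {weight_memory_mb(PARAMS, 2):.1f} MB")

ratio = input_elements(512) / input_elements(384)
print(f"A 512 x 512 input has {ratio:.2f}x the pixels of a 384 x 384 input")
```

In practice the real footprint during inference is higher, so treat these numbers as a lower bound when deciding whether the model fits on your hardware.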

Model Usage

Let’s get started with the model implementation using Python.

1. Image Classification

Here’s how you can classify an image using the EfficientNet-v2 model:

python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model('tf_efficientnetv2_xl.in21k_ft_in1k', pretrained=True)
model = model.eval()

# Get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
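
To make the last line of the snippet above more concrete, the following is a minimal pure-Python sketch of what softmax followed by top-k computes. The six toy logits are made up for illustration (the real model emits 1000 values, one per ImageNet-1k class), and softmax_percent and top_k are hypothetical helpers rather than timm or torch functions.

```python
import math


def softmax_percent(logits):
    """Convert raw logits to percentages that sum to 100 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Hypothetical logits for a toy 6-class problem.
toy_logits = [0.5, 3.2, -1.0, 2.1, 0.0, 4.0]
probs, indices = top_k(softmax_percent(toy_logits), k=5)

for p, i in zip(probs, indices):
    print(f"class {i}: {p:.2f}%")
```

With the real model, top5_class_indices holds positions in the ImageNet-1k class list, so you need a separate index-to-label mapping to turn them into readable names.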

2. Feature Map Extraction

If you’re interested in extracting feature maps from the model, here’s how:

python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model('tf_efficientnetv2_xl.in21k_ft_in1k', pretrained=True, features_only=True)
model = model.eval()

# Get model-specific transforms
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Unsqueeze single image into batch of 1
for o in output:
    print(o.shape)  # Prints the shape of each feature map
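
As a sanity check on the printed shapes, you can estimate the spatial size of each feature map from the input size. The sketch below assumes the five returned maps have reduction factors of 2, 4, 8, 16, and 32, which is typical for timm EfficientNet backbones but is an assumption here; model.feature_info.reduction() should report the actual values. The function name is illustrative.

```python
import math


def expected_spatial_sizes(input_side, reductions=(2, 4, 8, 16, 32)):
    """Estimate the height/width of each feature map for a square input.

    Rounds up, which matches 'same'-style padding; the reduction factors
    are assumed, not read from the model.
    """
    return [math.ceil(input_side / r) for r in reductions]


for side in (384, 512):
    print(side, expected_spatial_sizes(side))
```

If the shapes you see differ from this estimate, check which input size the transform actually produced and what reductions the model reports.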

3. Image Embeddings

To generate image embeddings, use the following code:

python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model('tf_efficientnetv2_xl.in21k_ft_in1k', pretrained=True, num_classes=0)  # Remove classifier
model = model.eval()

# Get model-specific transforms
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Output shape is (batch_size, num_features)

# Or equivalently, without needing to set num_classes=0:
output = model.forward_features(transforms(img).unsqueeze(0))  # Output is unpooled
output = model.forward_head(output, pre_logits=True)  # Output is a (1, num_features) shaped tensor
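
A common use of embeddings like these is comparing two images by cosine similarity. The sketch below is a minimal pure-Python version operating on plain lists; the short toy vectors stand in for real embeddings, and cosine_similarity is a hypothetical helper written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for real embeddings.
emb_a = [0.2, 0.8, 0.1, 0.4]
emb_b = [0.4, 1.6, 0.2, 0.8]   # same direction as emb_a, twice the length
emb_c = [0.9, -0.1, 0.7, 0.0]  # points elsewhere

print(f"a vs b: {cosine_similarity(emb_a, emb_b):.3f}")
print(f"a vs c: {cosine_similarity(emb_a, emb_c):.3f}")
```

To feed real embeddings into this helper, convert the tensor first, for example with output.squeeze(0).tolist(). For batches, torch.nn.functional.cosine_similarity does the same computation directly on tensors.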

Understanding the Model’s Inner Workings: An Analogy

Think of the EfficientNet-v2 model as a skilled chef baking a cake. The chef (model) has various tools (layers) and ingredients (data) that they use to create a delicious cake (output). The cake goes through different stages such as mixing (convolution), baking (activation), and decorating (pooling). Each stage helps enhance the final cake’s flavor (accuracy) and presentation (features). Like a chef who perfects their recipe over time, the EfficientNet-v2 model refines its method through training on large datasets.

Troubleshooting

If you encounter issues while running the model, consider the following steps:

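Ensure all necessary libraries are properly installed: timm, torch, and Pillow (the package that provides the PIL module).
Check your internet connection if you are having trouble loading the image or downloading the pretrained weights.
Confirm that you are using the correct model name in the code (tf_efficientnetv2_xl.in21k_ft_in1k). Calling timm.list_models('*efficientnetv2*', pretrained=True) lists the pretrained names your timm version knows about.
Make sure your image is accessible and in a supported format. If the file is grayscale or has an alpha channel, converting it with img.convert('RGB') before applying the transforms may help.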
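
The first check in the list above can be automated with the standard library. This is a small sketch, assuming the three top-level module names used in this guide; check_dependencies is an illustrative helper, not part of any of those libraries.

```python
import importlib.util


def check_dependencies(modules=("timm", "torch", "PIL")):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, found in check_dependencies().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Anything reported as missing can usually be installed with pip install timm torch pillow, though the right torch build depends on your platform and hardware.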


Conclusion

The EfficientNet-v2 model provides robust tools for image classification and feature extraction. By following this guide, you can leverage this powerful model for your own projects.

