The ConvNeXt model is a powerful tool for image classification, pretrained on the ImageNet-22k dataset and further fine-tuned on ImageNet-1k. In this article, we’ll walk you through the process of using this model for various tasks such as image classification, feature map extraction, and generating image embeddings. Ready to dive in? Let’s get started!
Model Details
- Model Type: Image classification feature backbone
- Parameters (M): 50.2
- GMACs: 25.6
- Activations (M): 63.4
- Image Size: 384 x 384
- Original Paper: A ConvNet for the 2020s
- GitHub Repository: ConvNeXt
Model Usage
1. Image Classification
To classify an image, you can use the following code snippet:
```python
from urllib.request import urlopen

from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
))

model = timm.create_model("convnext_small.fb_in22k_ft_in1k_384", pretrained=True)
model = model.eval()

# Build the exact preprocessing pipeline the model was trained with.
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# Convert logits to percentages and keep the five most likely classes.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
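The last line does two things at once: `softmax` turns raw logits into probabilities (scaled to percentages here), and `topk` picks the five highest along with their class indices. As a sanity check of that math, here is the same softmax-and-top-k computation sketched in plain Python on a small hypothetical logits list (no timm or torch required; the real model emits 1000 logits):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize the exponentials.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 5-class toy problem.
logits = [2.0, 1.0, 0.5, 3.0, -1.0]
probs = [p * 100 for p in softmax(logits)]  # percentages, like output.softmax(dim=1) * 100

# Top-k: indices sorted by probability, descending, truncated to k.
k = 3
topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
print(topk)                                  # class indices, most likely first
print([round(probs[i], 1) for i in topk])    # their probabilities in percent
```

The percentages always sum to 100, so the top-5 values from the real model can be read directly as confidence scores.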
2. Feature Map Extraction
If you’d like to extract the feature maps from the model, you can do so using the following code:
```python
from urllib.request import urlopen

from PIL import Image
import timm

img = Image.open(urlopen(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
))

# features_only=True returns the intermediate feature maps instead of logits.
model = timm.create_model(
    "convnext_small.fb_in22k_ft_in1k_384",
    pretrained=True,
    features_only=True,
)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    print(o.shape)
```
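With `features_only=True`, the model returns one feature map per stage, each downsampled by a power of two relative to the input. The spatial sizes follow directly from the 384 x 384 input and the stage strides; the sketch below derives them (strides 4/8/16/32 are the standard ConvNeXt reductions, and the channel widths are the ConvNeXt-Small defaults, so treat those numbers as assumptions to verify against the printed shapes):

```python
# Stage strides (standard for ConvNeXt) and assumed ConvNeXt-Small channel widths.
input_size = 384
strides = [4, 8, 16, 32]
channels = [96, 192, 384, 768]  # assumption: convnext_small stage widths

# Each stage halves the resolution of the previous one and widens the channels.
expected_shapes = [
    (1, c, input_size // s, input_size // s)
    for c, s in zip(channels, strides)
]
for shape in expected_shapes:
    print(shape)
```

Comparing these expected shapes against the `print(o.shape)` loop above is a quick way to confirm the backbone is wired up as intended.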
3. Image Embeddings
To generate image embeddings, you can use the following code:
```python
from urllib.request import urlopen

from PIL import Image
import timm

img = Image.open(urlopen(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
))

# num_classes=0 removes the classifier head so the model outputs embeddings.
model = timm.create_model(
    "convnext_small.fb_in22k_ft_in1k_384",
    pretrained=True,
    num_classes=0,
)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# Alternatively, run the two halves of the forward pass explicitly:
output = model.forward_features(transforms(img).unsqueeze(0))  # output is unpooled
output = model.forward_head(output, pre_logits=True)  # output is a (1, num_features) shaped tensor
```
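The split between the two calls is worth understanding: `forward_features` returns an unpooled `(batch, channels, H, W)` tensor, and `forward_head(..., pre_logits=True)` collapses it to `(batch, channels)` by pooling over the spatial grid before the (removed) classifier would run. The spatial reduction itself is just a per-channel average, sketched here in plain Python on a tiny hypothetical tensor:

```python
# Tiny stand-in for an unpooled feature tensor: (batch=1, channels=2, H=2, W=2).
features = [[[[1.0, 2.0],
              [3.0, 4.0]],
             [[10.0, 10.0],
              [10.0, 10.0]]]]

def global_avg_pool(x):
    # Average each channel's H x W grid down to one value: (B, C, H, W) -> (B, C).
    return [
        [sum(v for row in channel for v in row) / (len(channel) * len(channel[0]))
         for channel in batch]
        for batch in x
    ]

pooled = global_avg_pool(features)
print(pooled)  # one averaged value per channel
```

The resulting `(1, num_features)` vector is what you would store or compare for retrieval and similarity tasks.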
Model Comparison
Explore the dataset and runtime metrics of this model in timm model results.
Understanding the Code: An Analogy
Think of the ConvNeXt model like a bakery that specializes in creating delicious cakes. The bakery uses different ingredients (parameters) in its recipes (models) to create mouth-watering cakes (image classifications). When a customer (you) places an order (inputs an image), the bakery processes the order using its trained chefs (the neural network) who have previously mastered various recipes from a cookbook (the pretraining datasets). Just as the bakery has separate sections for mixing, baking, and decorating cakes (stages of image processing), the model's successive layers process the input image to extract increasingly meaningful information.
Troubleshooting
If you encounter any issues, consider the following tips:
- Ensure that you’re using the correct model name and that it matches the one from the repository.
- Check your internet connection if you’re having trouble opening images from URLs.
- Make sure the required libraries (like PIL and timm) are installed and up to date in your Python environment.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

