How to Use the ConvNeXt Femto Model for Image Classification

Feb 11, 2024 | Educational

The ConvNeXt Femto model is a compact image classifier trained on the ImageNet-1k dataset and loaded here through the timm library. This guide walks you through three ways to use it: classifying images, extracting feature maps, and obtaining image embeddings.

Understanding the Model

The ConvNeXt Femto model operates like a seasoned chef in a busy kitchen, layering its knowledge to turn simple ingredients (raw images) into exquisite dishes (classified labels). It has about 5.2 million parameters, was trained on 224 x 224 images, and is evaluated on 288 x 288 images. Here’s a simplified view of the model’s specifications:

  • Model Type: Image classification feature backbone
  • Params: 5.2M
  • GMACs: 0.8
  • Image Size: Train = 224 x 224, Test = 288 x 288

Model Usage

Image Classification

To classify an image, load the pretrained model, apply the model’s own preprocessing transforms, and take the five most probable classes:

python
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model("convnext_femto.d1_in1k", pretrained=True)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
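
To make the last line concrete, here is a minimal, dependency-free sketch of what a softmax followed by a top-k selection computes. It is an illustration only, not timm or torch code: the helper names `softmax_percent` and `top_k` are hypothetical, and the six logits are made up (the real model emits 1000 scores, one per ImageNet-1k class).

```python
import math

def softmax_percent(logits):
    """Convert raw scores into percentages that sum to 100."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]

def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order

# Made-up logits for a toy 6-class problem (assumption for illustration).
logits = [1.2, -0.3, 4.0, 0.5, 2.7, -1.1]
top_probs, top_indices = top_k(softmax_percent(logits), k=3)

for prob, idx in zip(top_probs, top_indices):
    print(f"class index {idx}: {prob:.1f}%")
```

In the real snippet, top5_probabilities and top5_class_indices play the roles of `top_probs` and `top_indices`, with an extra leading batch dimension of size 1.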

Feature Map Extraction

If you’re interested in understanding how the model processes images, you can extract feature maps:

python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model("convnext_femto.d1_in1k", pretrained=True, features_only=True)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
for o in output:
    print(o.shape)
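
The shapes printed by this loop can be anticipated with a little arithmetic. The sketch below is dependency-free and rests on two assumptions that this guide does not state: that the four stages downsample the input by factors of 4, 8, 16, and 32, and that they output 48, 96, 192, and 384 channels. The last stage agrees with the (1, 384, 7, 7) tensor mentioned in the embeddings section. `feature_map_shapes` is a hypothetical helper name.

```python
def feature_map_shapes(height, width, channels, strides, batch_size=1):
    """Estimate (batch, channels, height, width) for each feature map."""
    return [
        (batch_size, c, height // s, width // s)
        for c, s in zip(channels, strides)
    ]

# Assumed per-stage channel counts and downsampling factors (not from this guide).
CHANNELS = (48, 96, 192, 384)
STRIDES = (4, 8, 16, 32)

# Shapes for the 224 x 224 training resolution.
shapes_224 = feature_map_shapes(224, 224, CHANNELS, STRIDES)
for shape in shapes_224:
    print(shape)

# Shapes for the 288 x 288 test resolution.
shapes_288 = feature_map_shapes(288, 288, CHANNELS, STRIDES)
```

If the printed shapes differ from what the model returns, trust the model: with timm installed, model.feature_info.channels() and model.feature_info.reduction() report the actual values.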

Obtaining Image Embeddings

To use the model as an embedding extractor, remove the classifier head so that it returns one feature vector per image:

python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model("convnext_femto.d1_in1k", pretrained=True, num_classes=0)  # remove classifier nn.Linear
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0):
output = model.forward_features(transforms(img).unsqueeze(0))  # output is unpooled, a (1, 384, 7, 7) shaped tensor
output = model.forward_head(output, pre_logits=True)  # output is a (1, num_features) shaped tensor
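
A common use for such embeddings is comparing images by cosine similarity. The sketch below is one possible approach, not something this guide prescribes. It uses plain Python and made-up 4-dimensional vectors in place of real embeddings, which you could obtain with something like output[0].tolist(). `cosine_similarity` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for image embeddings (assumption for illustration).
emb_query = [0.9, 0.1, 0.0, 0.4]
emb_similar = [0.8, 0.2, 0.1, 0.5]
emb_different = [0.0, 0.9, 0.7, 0.0]

sim_close = cosine_similarity(emb_query, emb_similar)
sim_far = cosine_similarity(emb_query, emb_different)
print(f"query vs similar:   {sim_close:.3f}")
print(f"query vs different: {sim_far:.3f}")
```

Values near 1 indicate vectors pointing in nearly the same direction, so the "similar" pair should score higher than the "different" pair.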

Troubleshooting Common Issues

If you encounter issues while using the ConvNeXt model, here are a few troubleshooting tips:

  • Ensure you have installed all required packages: timm, Pillow (imported as PIL), and torch.
  • Check for any typos in the model name while loading. It should be convnext_femto.d1_in1k.
  • Make sure that your image URL is accessible and valid.
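  • If you’re getting dimension or shape mismatch errors, make sure the image passes through the model’s transforms, which resize and crop it to the expected input size (224×224 by default, matching training; the model is evaluated at 288×288), and that you add the batch dimension with unsqueeze(0).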
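
The first tip can be checked programmatically. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module list simply mirrors the imports used in this guide.

```python
from importlib.util import find_spec

# Import names used by the snippets in this guide; Pillow is imported as "PIL".
REQUIRED_MODULES = ("timm", "PIL", "torch")

def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Anything reported as missing can usually be installed with pip; note that the package providing PIL is named pillow.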

Conclusion

With the powerful capabilities of the ConvNeXt model, you can easily classify images, extract feature maps, and generate embeddings. It’s like having a tech-savvy assistant in your image processing kitchen, assisting you every step of the way.
