How to Use Next-ViT for Image Classification

Feb 14, 2024 | Educational

Welcome to the exciting world of image classification with the Next-ViT model! In this guide, we’ll walk you through using the Next-ViT image classification model: performing classification, extracting feature maps, and obtaining image embeddings. Whether you’re a beginner or an experienced developer, we’ve got you covered.

Understanding the Next-ViT Model

The Next-ViT model is like a skilled artist trained on a vast canvas: it was trained on a combination of an undisclosed dataset of roughly 6 million samples and the well-known ImageNet-1k dataset. Think of it as a painter who practiced many styles before embarking on a masterpiece. Some key details about the model (you can check the available variants yourself with the snippet after this list):

  • Model Type: Image classification feature backbone
  • Parameters: 31.8 million
  • GMACs: 5.8 (billions of multiply-accumulate operations per forward pass)
  • Activations: 17.6 million
  • Image Size: 224 x 224 pixels
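
To see which Next-ViT variants your installed timm release actually ships, you can query the model registry. A minimal sketch (exact model names vary between timm versions):

python
import timm

# List Next-ViT variants that have pretrained weights in this timm release
print(timm.list_models("nextvit*", pretrained=True))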

Getting Started with Next-ViT

1. Image Classification

Let’s start by classifying an image. Here’s how you can implement this:

python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True)
model = model.eval()

# Get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Unsqueeze single image into batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)  # Top 5 probabilities (as %) and class indices

This snippet opens an image from a URL, loads the pretrained Next-ViT model, applies the model’s own preprocessing transforms, and returns the probabilities (as percentages) and indices of the top 5 predicted classes.
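
The class indices alone are not very readable. To map them to human-readable labels, one common approach is to pair them with an ImageNet-1k class list. The file below is a widely used community mirror from the PyTorch hub repository, not part of timm itself, so substitute any label list you trust:

python
from urllib.request import urlopen

# Continues from the classification snippet above (reuses top5_probabilities
# and top5_class_indices). The label file is an assumption: any ImageNet-1k
# class list in index order will work.
classes = urlopen(
    "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
).read().decode("utf-8").splitlines()

for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f"{classes[idx.item()]}: {prob.item():.2f}%")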

2. Feature Map Extraction

Next, you might want to analyze the different layers of the model and extract feature maps:

python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True, features_only=True)
model = model.eval()

# Get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Unsqueeze single image into batch of 1
for o in output:
    print(o.shape)  # Print shape of each feature map in output

Each printed shape corresponds to the feature map produced at a different stage of the network, giving you insight into how features are extracted at increasing depth.
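
If you only need particular stages, timm also lets you pick them at creation time. A minimal sketch, continuing from the features_only model above (the out_indices argument is supported by most timm backbones, which we assume includes Next-ViT):

python
import timm

# feature_info describes each feature map the features_only model returns
print(model.feature_info.channels())   # Channel count of each feature map
print(model.feature_info.reduction())  # Downsampling factor relative to the input

# Request only selected stages (assumption: out_indices is supported here)
model_partial = timm.create_model(
    "nextvit_small.bd_ssld_6m_in1k",
    pretrained=True,
    features_only=True,
    out_indices=(1, 3),
)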

3. Image Embeddings

Lastly, if you want to obtain image embeddings, use the following code:

python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True, num_classes=0)  # Remove classifier
model = model.eval()

# Get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Output is (batch_size, num_features) shaped tensor

# Alternatively, extract unpooled features and pool them through the head:
output = model.forward_features(transforms(img).unsqueeze(0))  # Output is unpooled
output = model.forward_head(output, pre_logits=True)  # Output is (1, num_features) shaped tensor

Either way, the model provides a tensor of image embeddings with the classification layer removed.
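
A common use for these embeddings is measuring how similar two images are. A minimal sketch, where img2 is a hypothetical second PIL image loaded the same way as img:

python
import torch.nn.functional as F

# Hypothetical: `img2` is a second image opened just like `img` above
emb1 = model(transforms(img).unsqueeze(0))
emb2 = model(transforms(img2).unsqueeze(0))

# Cosine similarity along the feature dimension; values near 1.0 suggest similar content
print(F.cosine_similarity(emb1, emb2).item())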

Troubleshooting Tips

While using the Next-ViT model, you may encounter a few common issues. Here are some troubleshooting ideas (a quick environment check follows the list):

  • Model not loading: Ensure the required packages (timm, torch, Pillow) are installed and your internet connection is stable, since pretrained weights are downloaded on first use.
  • Incorrect image format: Make sure the image is accessible and in a valid format (JPEG, PNG).
  • Unexpected output shapes: Double-check the transforms applied and make sure they match the model’s expected 224 x 224 input.
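
When in doubt, a quick environment check often narrows the problem down. A minimal sketch (assuming only that timm and torch are importable):

python
import timm
import torch

# Next-ViT requires a reasonably recent timm release
print(f"timm {timm.__version__}, torch {torch.__version__}")

try:
    model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True)
except Exception as err:
    # An unknown model name or a failed weight download typically surfaces here
    print(f"Could not load the model: {err}")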

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Congratulations! You’ve now learned how to use the Next-ViT model for image classification, feature extraction, and image embeddings. By utilizing this model, you can harness the power of advanced AI technologies in your projects.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
