How to Use the Next-ViT Model for Image Classification

Feb 12, 2024 | Educational


The Next-ViT image classification model is built for efficient deployment in realistic industrial scenarios. This guide walks through the model details, shows how to use it through the timm library for classification, feature extraction, and embeddings, and covers common issues you might run into. Let’s get started!

Model Details

Before diving into practical usage, here’s what you need to know about the Next-ViT model. The details below describe the nextvit_small.bd_ssld_6m_in1k checkpoint used in the examples:

Model Type: Image classification / feature backbone
Model Stats:
- Parameters (M): 31.8
- GMACs: 5.8
- Activations (M): 17.6
- Image Size: 224 x 224
Pretrain Dataset: Unknown-6M
Dataset: ImageNet-1k
Paper:
- Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Original Implementation:
- GitHub Repository
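As a rough guide to deployment cost, the stats above can be turned into approximate weight memory and compute. The sketch below is a back-of-the-envelope calculation, assuming dense weights stored at the given precision and the common approximate convention that one multiply-accumulate (MAC) counts as two floating-point operations. It ignores activation memory and runtime overhead, and the fp16 figure says nothing about whether a given runtime supports that precision for this model.

```python
# Back-of-the-envelope deployment estimates from the model card stats.
# Assumptions: dense weights, no quantization, and 1 MAC ~= 2 FLOPs.
PARAMS_M = 31.8   # parameters, in millions (from the model card)
GMACS = 5.8       # multiply-accumulates per 224 x 224 image, in billions

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mb(params_m: float, dtype: str) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes).

    Millions of parameters times bytes per parameter gives megabytes directly.
    """
    return params_m * BYTES_PER_PARAM[dtype]


def gflops_from_gmacs(gmacs: float) -> float:
    """Approximate GFLOPs, counting each multiply-accumulate as 2 operations."""
    return 2 * gmacs


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_mb(PARAMS_M, dtype):.1f} MB of weights")
print(f"~{gflops_from_gmacs(GMACS):.1f} GFLOPs per image")
```

Running it prints roughly 127.2 MB of fp32 weights, 63.6 MB in fp16, and about 11.6 GFLOPs per image.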

Model Usage

Let’s look at how to use the Next-ViT model for image classification, feature map extraction, and image embeddings.

Image Classification

To classify an image, you can use the following Python code:

```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

# Open an image from a URL; convert to RGB in case the file has an alpha channel
img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png")).convert("RGB")

# Load the pretrained Next-ViT model and switch it to inference mode
model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True)
model = model.eval()

# Get model-specific transforms (resize, crop, normalization)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# Classify the image; no_grad() skips gradient tracking during inference
with torch.no_grad():
    output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# Convert logits to percentages and keep the 5 most likely ImageNet-1k classes
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
print(top5_class_indices, top5_probabilities)
```
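The final line of the classification example turns raw logits into percentages with softmax and keeps the five highest scores with torch.topk. The dependency-free sketch below reproduces that post-processing on a made-up list of logits so the arithmetic is easy to follow; the six-class size and the logit values are purely illustrative, whereas the real model produces 1000 logits per image, one per ImageNet-1k class.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)                      # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_percent(logits, k=5):
    """Return (percentages, indices) of the k most likely classes, highest first."""
    probs = [p * 100 for p in softmax(logits)]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] for i in order], order


# Hypothetical logits for a toy 6-class problem
logits = [2.0, 0.5, 3.1, -1.0, 1.2, 0.0]
percentages, indices = topk_percent(logits, k=5)
for idx, pct in zip(indices, percentages):
    print(f"class {idx}: {pct:.2f}%")
```

Like torch.topk with its default sorted=True, the result is ordered from most to least likely. With real output, the indices are positions in the ImageNet-1k label list.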

Feature Map Extraction

With features_only=True, the model returns a list of intermediate feature maps from successive stages, each at a lower spatial resolution than the one before:

```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

# Open an image from a URL; convert to RGB in case the file has an alpha channel
img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png")).convert("RGB")

# Load the Next-ViT model as a feature extractor (no classifier head)
model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True, features_only=True)
model = model.eval()

# Get model-specific transforms (resize, crop, normalization)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# Extract features; output is a list of tensors, one per returned stage
with torch.no_grad():
    output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# Print the shape of each feature map in output
for o in output:
    print(o.shape)  # e.g.: torch.Size([1, 96, 56, 56])
```
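The spatial size of each feature map is roughly the input size divided by that stage's total stride (the "reduction"); the example comment shows 56 x 56 for a 224 x 224 input at stride 4. The sketch below computes the expected sizes in plain Python. The reductions (4, 8, 16, 32) are an assumption for a four-stage backbone like Next-ViT; with timm you could read the actual values from model.feature_info.reduction() (and the channel counts from model.feature_info.channels()) on the features_only model. For inputs not divisible by the stride, exact rounding depends on each layer's padding, so integer division is only an approximation there.

```python
def feature_map_sizes(image_size, reductions):
    """Spatial (height, width) of each feature map for a square input.

    Each stage downsamples the input by its total stride ("reduction"),
    so a 224 x 224 image at reduction 4 gives a 56 x 56 map.
    """
    return [(image_size // r, image_size // r) for r in reductions]


# Assumed per-stage reductions for a four-stage backbone like Next-ViT
REDUCTIONS = (4, 8, 16, 32)

for r, (h, w) in zip(REDUCTIONS, feature_map_sizes(224, REDUCTIONS)):
    print(f"stride {r:>2}: {h} x {w}")
```

For a 224 x 224 input this prints 56 x 56, 28 x 28, 14 x 14, and 7 x 7, which should match the last two dimensions of the shapes printed by the extraction loop.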

Image Embeddings

To get a pooled image embedding instead of class scores, create the model with num_classes=0, which removes the classifier head:

```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

# Open an image from a URL; convert to RGB in case the file has an alpha channel
img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png")).convert("RGB")

# Load the model with num_classes=0 to remove the classifier head
model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True, num_classes=0)
model = model.eval()

# Get model-specific transforms (resize, crop, normalization)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# Get the pooled embedding; no_grad() skips gradient tracking during inference
with torch.no_grad():
    output = model(transforms(img).unsqueeze(0))  # output is a (batch_size, num_features) shaped tensor
```
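A common use for these embeddings is comparing images: two images whose embeddings point in similar directions tend to look alike to the model. The sketch below computes cosine similarity in plain Python on made-up 4-dimensional vectors; with real output you would more likely call torch.nn.functional.cosine_similarity on two rows of the (batch_size, num_features) tensor. Values near 1 mean the embeddings point the same way, and values near 0 mean they are unrelated.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings; real ones would be rows of `output`
query = [0.2, 0.9, 0.1, 0.4]
same_direction = [0.4, 1.8, 0.2, 0.8]   # a scaled copy of query
orthogonal = [0.9, -0.2, 0.0, 0.0]      # dot product with query is zero

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because cosine similarity ignores vector length, a scaled copy scores 1.0. That is why embeddings are often L2-normalized before being stored in a search index.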

Troubleshooting Tips

Here are some ideas for common issues you may encounter:

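Image Not Loading: Ensure that the URL is accessible and correct, and double-check it for typos. If the image loads but the transforms fail, the file may be grayscale or have an alpha channel; call .convert("RGB") after Image.open().
Model Not Found: Make sure timm is installed and up to date (pip install --upgrade timm), since older releases may not include Next-ViT. You can list the available variants with timm.list_models("nextvit*").
Output Shape Issues: Pass each image through the model-specific transforms, which resize, crop, and normalize it to the expected 224 x 224 input, and add a batch dimension with unsqueeze(0) so the model receives a tensor of shape (batch, 3, 224, 224).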
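When an error mentions shapes, printing tuple(x.shape) for the tensor you pass to the model usually reveals the cause. The helper below is hypothetical (it is not part of timm or torch): it checks a shape tuple against the (batch, 3, 224, 224) layout produced by the examples above and suggests a fix for each mismatch. Note that 224 x 224 is the size the pretrained configuration uses; some timm models also accept other sizes, so treat a size warning as a hint rather than a hard error.

```python
def check_input_shape(shape, channels=3, size=224):
    """Return a list of problems with an input tensor shape (empty if it looks right).

    `shape` is a tuple such as tuple(x.shape) for the tensor passed to the model.
    """
    problems = []
    if len(shape) == 3:
        problems.append("missing batch dimension: call .unsqueeze(0) on a single image")
        shape = (1, *shape)  # keep checking the remaining dimensions
    if len(shape) != 4:
        return problems + [f"expected 4 dimensions (batch, C, H, W), got {len(shape)}"]
    _, c, h, w = shape
    if c != channels:
        problems.append(f"expected {channels} channels, got {c}: convert the image to RGB")
    if (h, w) != (size, size):
        problems.append(f"expected {size} x {size} pixels, got {h} x {w}: apply the model transforms")
    return problems


print(check_input_shape((3, 224, 224)))     # single image without a batch dimension
print(check_input_shape((1, 4, 224, 224)))  # RGBA image that was not converted
print(check_input_shape((1, 3, 224, 224)))  # correct layout, no problems
```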


Conclusion

Next-ViT gives you one pretrained backbone for several image tasks: the small variant used here classifies ImageNet-1k images with 31.8M parameters and 5.8 GMACs per 224 x 224 input. Think of it as a chef who follows a refined recipe to extract the most flavor from the ingredients (images, in this case). Each step, whether classification, feature extraction, or embedding generation, draws on the same carefully prepared base.

