Welcome to the exciting world of image classification with the Next-ViT model! In this guide, we'll walk you through using the Next-ViT image classification model: performing image classification, extracting feature maps, and obtaining image embeddings. Whether you're a beginner or an experienced developer, we've got you covered.
Understanding the Next-ViT Model
The Next-ViT model is like a painter who has practiced many art styles before embarking on their own masterpiece: it was trained on a combination of an unspecified dataset of 6 million samples (the "6m" in the model name) and the well-known ImageNet-1k dataset. Some key details about the model include (a quick way to verify these yourself is sketched right after the list):
- Model Type: Image classification feature backbone
- Parameters: 31.8 million
- GMACs: 5.8
- Activations: 17.6 million
- Image Size: 224 x 224 pixels
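If you'd like to sanity-check these numbers, timm makes it straightforward once the model is created. The following is a minimal sketch; it assumes timm is installed and the pretrained weights can be downloaded, and that the default_cfg attribute (called pretrained_cfg in newer timm releases) carries the expected input size:

```python
import timm

model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True)

# Count all parameters; expect roughly 31.8M
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e6:.1f}M")

# The model's default configuration records the expected input size
print(model.default_cfg["input_size"])  # Expect (3, 224, 224)
```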
Getting Started with Next-ViT
1. Image Classification
Let’s start by classifying an image. Here’s how you can implement this:
```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # Needed for torch.topk below

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))

model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True)
model = model.eval()

# Get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
This snippet opens an image from a URL, loads the pretrained Next-ViT model, applies the model's own preprocessing transforms, and extracts the top 5 predicted classes (with probabilities expressed as percentages) from the model output.
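To make the result human-readable, you can print each probability next to its class index. This short follow-up only uses the tensors returned above; the indices are standard ImageNet-1k class indices:

```python
# Each tensor has shape (1, 5); index into the first (and only) batch element
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f"class {idx.item()}: {prob.item():.2f}%")
```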
2. Feature Map Extraction
Next, you might want to analyze the different layers of the model and extract feature maps:
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))

model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True, features_only=True)
model = model.eval()

# Get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Unsqueeze single image into batch of 1

for o in output:
    print(o.shape)  # Print shape of each feature map in output
```
This will give you an insight into what the model sees at different layers, helping you understand how features are extracted.
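If you want those shapes without running an inference at all, feature-extraction models in timm expose a feature_info descriptor. Here is a minimal sketch, reusing the features_only model created above:

```python
# Channel count of each returned feature map, shallowest to deepest
print(model.feature_info.channels())

# Downsampling stride of each map relative to the 224x224 input
print(model.feature_info.reduction())
```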
3. Image Embeddings
Lastly, if you want to obtain image embeddings, use the following code:
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))

model = timm.create_model("nextvit_small.bd_ssld_6m_in1k", pretrained=True, num_classes=0)  # Remove classifier
model = model.eval()

# Get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # Output is (batch_size, num_features) shaped tensor

# Or, equivalently, in two explicit steps:
output = model.forward_features(transforms(img).unsqueeze(0))  # Output is unpooled
output = model.forward_head(output, pre_logits=True)  # Output is (1, num_features) shaped tensor
```
With num_classes=0 the classifier head is removed, so the model's forward pass returns a pooled (batch_size, num_features) embedding tensor; the forward_features/forward_head pair produces the same embedding in two explicit steps, giving you access to the unpooled features in between.
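One common use for these embeddings is image similarity. The sketch below is illustrative: img_a and img_b are hypothetical PIL images you have already loaded, and model is the num_classes=0 model created above:

```python
import torch
import torch.nn.functional as F

with torch.no_grad():  # No gradients needed for inference
    emb_a = model(transforms(img_a).unsqueeze(0))  # (1, num_features)
    emb_b = model(transforms(img_b).unsqueeze(0))  # (1, num_features)

# Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal
print(F.cosine_similarity(emb_a, emb_b).item())
```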
Troubleshooting Tips
While using the Next-ViT model, you may encounter a few common issues. Here are some troubleshooting ideas:
- Issue: Model not loading.
- Solution: Ensure you have the required packages installed and your internet connection is stable; a quick environment check is sketched just after this list.
- Issue: Incorrect image format.
- Solution: Make sure the image is accessible and in a valid format (JPEG, PNG).
- Issue: Unexpected output shapes.
- Solution: Double-check the transformations applied and ensure compatibility with model input.
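For the first issue, the check below can rule out the usual causes. It is a minimal sketch that assumes nothing beyond an installed copy of timm:

```python
import timm

print(timm.__version__)  # Next-ViT support requires a reasonably recent timm release

# Confirm the variant is available (with pretrained weights) in your installation
print(timm.list_models("nextvit*", pretrained=True))
```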
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
Congratulations! You’ve now learned how to use the Next-ViT model for image classification, feature extraction, and image embeddings. By utilizing this model, you can harness the power of advanced AI technologies in your projects.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.