Getting Started with ViT Image Feature Model Using DINOv2

Feb 13, 2024 | Educational

If you’re excited about using the latest advancements in image classification, you’re in for a treat! Today, we will explore the vit_base_patch14_reg4_dinov2 model, which leverages Vision Transformers (ViT) for image feature extraction. This powerful model comes pretrained on the LVD-142M dataset with the self-supervised DINOv2 method. Let’s dive in!

Model Details

This section highlights the key attributes of the ViT model; a short snippet to verify these numbers yourself follows the list:

  • Model Type: Image classification feature backbone
  • Parameters: 86.6 Million
  • GMACs: 117.5
  • Activations: 115.0 Million
  • Image Size: 518 x 518
  • Pretrain Dataset: LVD-142M
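
If you want to double-check these figures on your own machine, a minimal sketch with timm (building the architecture only, without downloading weights) looks like this:

python
import timm

# Build the architecture only; pretrained=False skips the weight download
model = timm.create_model('vit_base_patch14_reg4_dinov2.lvd142m', pretrained=False)

# Total parameter count, roughly 86.6M for this backbone
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")

# Expected input resolution and normalization come from the model's data config
data_config = timm.data.resolve_model_data_config(model)
print(data_config["input_size"])  # (3, 518, 518)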

For further reading, check the following papers:

  • DINOv2: Learning Robust Visual Features without Supervision (arXiv:2304.07193)
  • Vision Transformers Need Registers (arXiv:2309.16588)
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arXiv:2010.11929)

Model Usage

The beauty of this model lies in its versatility for two primary tasks: Image Classification and Image Embeddings. Let’s walk through how to use it for both.

1. Image Classification

Below is the Python code to apply the model for image classification:

python
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model('vit_base_patch14_reg4_dinov2.lvd142m', pretrained=True)
model = model.eval()

# Get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0))  # Unsqueeze single image into batch of 1
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
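
Keep in mind that this checkpoint is distributed as a self-supervised feature backbone, so meaningful class probabilities assume a classifier head fine-tuned on your own labels. Under that assumption, and with a hypothetical labels list mapping indices to names, you could print the top-5 predictions from the snippet above like this:

python
# Hypothetical label list for a fine-tuned classifier head; replace with your own classes
labels = ["class_0", "class_1", "class_2"]

for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    idx = idx.item()
    name = labels[idx] if idx < len(labels) else f"class_{idx}"  # fall back to the raw index
    print(f"{name}: {prob.item():.2f}%")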

2. Image Embeddings

To extract image embeddings, use the following code:

python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"))
model = timm.create_model('vit_base_patch14_reg4_dinov2.lvd142m', pretrained=True, num_classes=0)  # Remove classifier nn.Linear
model = model.eval()

# Get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0))  # Output is (batch_size, num_features) shaped tensor
# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))  # Output is unpooled, a (1, 1374, 768) shaped tensor
output = model.forward_head(output, pre_logits=True)  # Output is a (1, num_features) shaped tensor
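
A common use of these pooled embeddings is measuring how similar two images are. The sketch below reuses the model and transforms from the snippet above and compares the beignets image with a second, hypothetical local file via cosine similarity:

python
import torch
import torch.nn.functional as F

def embed(image):
    """Return a (1, num_features) pooled embedding for a PIL image."""
    with torch.no_grad():
        return model(transforms(image).unsqueeze(0))

other_img = Image.open("another_image.jpg")  # hypothetical local file; swap in any image you like

similarity = F.cosine_similarity(embed(img), embed(other_img))  # similarity along the feature dim
print(f"Cosine similarity: {similarity.item():.4f}")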

Understanding the Code

Let’s visualize the code with an analogy: imagine you’re a chef in a restaurant kitchen. You have a variety of ingredients (images) and a special cuisine technique (ViT model). Each recipe may require different arrangements and classifications of ingredients to create delicious meals (output labels or embeddings).

  • First, you gather your ingredients from an online market (loading an image from a URL).
  • Next, you preheat your kitchen (initialize the model) and prepare specific utensils (data transformations) required for each dish.
  • Finally, you start cooking (run the model) and based on the flavors (output), you determine the top 5 dishes your customers will love (top 5 classifications).

Troubleshooting Tips

If you’re facing issues while using the model, here are some common problems and their solutions:

  • Model Loading Errors: Ensure that you have the latest version of the timm library installed. If the error persists, reinstall it with pip install --upgrade timm.
  • Image Loading Issues: Verify if the image URL is accessible. You can check this by opening the link in a web browser.
  • Shape Mismatch Errors: Make sure your input tensor matches the image size the model expects (518 x 518 for this checkpoint). The transforms built from the model's data config handle resizing for you; see the sanity check after this list.
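
If you hit a shape mismatch, a quick sanity check is to compare the transformed tensor against the input size recorded in the model's data config (reusing the img, transforms, and data_config variables from the snippets above):

python
x = transforms(img).unsqueeze(0)
print(x.shape)                    # should be torch.Size([1, 3, 518, 518])
print(data_config["input_size"])  # the (channels, height, width) the model expects: (3, 518, 518)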

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Model Comparison

Explore the dataset and runtime metrics of this model in the timm model results.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
