How to Use WD SwinV2 Tagger v3 with πŸ€— Transformers

Mar 14, 2024 | Educational

If you work with image classification, a strong pretrained tagger can significantly improve your results. The WD SwinV2 Tagger v3 by SmilingWolf, now compatible with the πŸ€— Transformers library, lets you tag images automatically with general, rating, and character labels. In this guide, we will walk you through setting up and using this powerful tool.

Step 1: Installation

To get started, you need to install the Transformers library. Open your terminal and run the following command:

pip install transformers
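
To confirm the installation worked, you can print the installed version (any recent release should do; this guide was written against the versions current in early 2024):

python -c "import transformers; print(transformers.__version__)"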

Step 2: Setting Up the Pipeline

Once you have the library installed, it’s time to set up your image classification pipeline. Below is a simple analogy to help you understand what a pipeline does:

Imagine you are at a restaurant. The pipeline is like the entire dining experience: you place an order (inputting your image), the kitchen prepares the meal (processing the image using the model), and finally, the waiter brings it to your table (displaying the classification results).

Here’s how you can set it up in your code:

from transformers import pipeline

pipe = pipeline(
    "image-classification",
    model="p1atdev/wd-swinv2-tagger-v3-hf",
    trust_remote_code=True,  # the repo ships custom processing code for this tagger
)

print(pipe("sample.webp", top_k=15))

This code snippet initializes the image classification pipeline using the WD SwinV2 Tagger model. You can then classify your images and obtain the top results.
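
The pipeline’s output is a list of label/score dictionaries, assuming the custom pipeline follows the standard image-classification output format. You can loop over it to print one tag per line:

results = pipe("sample.webp", top_k=15)
for result in results:
    print(f"{result['label']}: {result['score']:.3f}")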

Step 3: Using the AutoModel

If you need more control over the model’s output, you can load the model and processor directly through the Auto classes. This approach allows for more detailed interaction with the model, similar to a chef showing you how each ingredient in your dish contributes to the overall flavor.

Below is how you can implement this:

from PIL import Image
import numpy as np
import torch
from transformers import (AutoImageProcessor, AutoModelForImageClassification)

MODEL_NAME = "p1atdev/wd-swinv2-tagger-v3-hf"
model = AutoModelForImageClassification.from_pretrained(MODEL_NAME)
processor = AutoImageProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)

image = Image.open("sample.webp")
inputs = processor.preprocess(image, return_tensors="pt")

with torch.no_grad():  
    outputs = model(**inputs.to(model.device, model.dtype))

logits = torch.sigmoid(outputs.logits[0])  # sigmoid, not softmax: this is a multi-label tagger (take the first image's logits)
results = {model.config.id2label[i]: logit.item() for i, logit in enumerate(logits)}  # map label indices to tag names
results = {k: v for k, v in sorted(results.items(), key=lambda item: item[1], reverse=True) if v > 0.35}  # keep tags above a 35% threshold, highest score first

print(results)  # rating tags and character tags are also included

This snippet runs the model directly on the image and returns a dictionary of tags whose scores exceed the defined threshold.
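
For repeated use, you could wrap the logic above in a small helper. This is a sketch reusing the model and processor loaded earlier; the function name and default threshold are our own:

def tag_image(image_path: str, threshold: float = 0.35) -> dict:
    """Return a {tag: score} dict of all tags scoring above the threshold."""
    image = Image.open(image_path)
    inputs = processor.preprocess(image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs.to(model.device, model.dtype))
    logits = torch.sigmoid(outputs.logits[0])
    scores = {model.config.id2label[i]: logit.item() for i, logit in enumerate(logits)}
    return {k: v for k, v in sorted(scores.items(), key=lambda x: x[1], reverse=True) if v > threshold}

print(tag_image("sample.webp", threshold=0.5))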

Step 4: Optimization with πŸ€— Optimum

If you’re looking to enhance performance, you can utilize the πŸ€— Optimum integration, which makes the model faster and lighter. It’s like upgrading your kitchen appliances for better efficiency in the cooking process. Here’s how to install and implement it:

First, install the ONNX Runtime backend from your terminal:

pip install optimum[onnxruntime]

Then import the pipeline function from Optimum instead of Transformers; the rest of the code is unchanged:

from optimum.pipelines import pipeline

pipe = pipeline(
    "image-classification",
    model="p1atdev/wd-swinv2-tagger-v3-hf",
    trust_remote_code=True,
)

print(pipe("sample.webp", top_k=15))

With this setup, you can expect improved speed, though scores may differ slightly from the PyTorch model after the ONNX conversion.
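
If you want the same fine-grained control as in Step 3, Optimum can also export the checkpoint to ONNX and serve it through ORTModelForImageClassification. This is a sketch under the assumption that the checkpoint exports cleanly; export=True performs the conversion on the fly:

from PIL import Image
import torch
from transformers import AutoImageProcessor
from optimum.onnxruntime import ORTModelForImageClassification

MODEL_NAME = "p1atdev/wd-swinv2-tagger-v3-hf"
ort_model = ORTModelForImageClassification.from_pretrained(MODEL_NAME, export=True)  # convert the PyTorch weights to ONNX
processor = AutoImageProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)

inputs = processor.preprocess(Image.open("sample.webp"), return_tensors="pt")
outputs = ort_model(**inputs)  # inference runs on ONNX Runtime
logits = torch.sigmoid(outputs.logits[0])
print({ort_model.config.id2label[i]: l.item() for i, l in enumerate(logits) if l > 0.35})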

Understanding the Labels

The tags returned from the model are categorized as follows:

  • Rating tags: indicate the overall content rating of the image (e.g., rating:general, rating:sensitive).
  • Character tags: identify characters appearing in the image (e.g., character:frieren, character:hatsune miku).
  • General tags: everything else the model recognizes, such as objects, clothing, and scene descriptors.
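
Using these prefixes, you can split the results dictionary from Step 3 into categories (a sketch that assumes general tags carry no prefix, as the examples above suggest):

rating_tags = {k: v for k, v in results.items() if k.startswith("rating:")}
character_tags = {k: v for k, v in results.items() if k.startswith("character:")}
general_tags = {k: v for k, v in results.items() if not k.startswith(("rating:", "character:"))}

print(rating_tags, character_tags, general_tags, sep="\n")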

Troubleshooting

If you encounter issues during installation or execution, here are some troubleshooting tips:

  • Ensure that you have the latest version of Python and the Transformers library installed.
  • Double-check that all model names and paths are correctly specified.
  • If you experience runtime errors related to memory, consider using a machine with more RAM or downscaling your input images (see the sketch after this list).
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
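
On the image-size point, here is a minimal sketch of capping an image’s dimensions before inference. The processor resizes images to the model’s expected input anyway, so this mainly saves the memory spent decoding very large files; the 1024-pixel cap is an arbitrary choice:

from PIL import Image

image = Image.open("sample.webp")
image.thumbnail((1024, 1024))  # downscale in place, preserving aspect ratio
# pass `image` to the processor as in Step 3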

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
