How to Use a Vision Transformer for Colorectal Image Classification

Apr 28, 2023 | Educational

In this blog, we’ll explore a Vision Transformer fine-tuned on the kvasir_v2 dataset for colonoscopy image classification. The model reaches an overall accuracy of 0.93 on its evaluation set, making it a useful starting point for analyzing endoscopic images. Let’s dive into how to get started!

Live Demo

Before we jump into the coding details, you can test the model yourself by uploading a sample colonoscopy image to the interactive inference widget on the model’s Hugging Face page.


Training and Metrics

The model produced the following metrics on its evaluation set:

                          precision    recall  f1-score   support

dyed-lifted-polyps             0.95      0.93      0.94        60
dyed-resection-margins         0.97      0.95      0.96        64
esophagitis                    0.93      0.79      0.85        67
normal-cecum                   1.00      0.98      0.99        54
normal-pylorus                 0.95      1.00      0.97        57
normal-z-line                  0.82      0.93      0.87        67
polyps                         0.92      0.92      0.92        52
ulcerative-colitis             0.93      0.95      0.94        59

accuracy                                           0.93       480
macro avg                      0.93      0.93      0.93       480
weighted avg                   0.93      0.93      0.93       480
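To see where these per-class numbers come from, note that each row follows directly from that class’s true-positive, false-positive, and false-negative counts. Here is a minimal sketch using hypothetical counts chosen to roughly reproduce the esophagitis row:

```python
# Precision, recall, and F1 computed from raw counts.
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 53 of 67 esophagitis images correctly identified,
# with 4 images of other classes mislabeled as esophagitis.
p, r, f = prf1(tp=53, fp=4, fn=14)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# -> precision=0.93 recall=0.79 f1=0.85
```

Note how the model trades precision for recall on esophagitis versus normal-z-line, two classes that are visually similar and easy to confuse with each other.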

How to Use the Model

Here’s a step-by-step guide on how to implement the model in your own projects!

  1. First, import the necessary libraries (the inference wrapper comes from the hugsvision package, which you can install with pip install hugsvision):

     from transformers import ViTFeatureExtractor, ViTForImageClassification
     from hugsvision.inference.VisionClassifierInference import VisionClassifierInference

  2. Define the model path and initialize the classifier:

     path = "mrm8488/vit-base-patch16-224_finetuned-kvasirv2-colonoscopy"
     classifier = VisionClassifierInference(
         feature_extractor=ViTFeatureExtractor.from_pretrained(path),
         model=ViTForImageClassification.from_pretrained(path),
     )

  3. Finally, provide the image path and classify the image:

     img = "Your image path"
     label = classifier.predict(img_path=img)
     print("Predicted class:", label)
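Once you have the predicted label, you may want to act on it. As one hypothetical post-processing step (the helper below is our own illustration, not part of hugsvision), note that the kvasir_v2 label set shown in the metrics table uses a "normal-" prefix for healthy findings, so a simple triage flag can be derived from the label string:

```python
# The eight kvasir_v2 classes, as listed in the metrics table above.
KVASIR_LABELS = {
    "dyed-lifted-polyps", "dyed-resection-margins", "esophagitis",
    "normal-cecum", "normal-pylorus", "normal-z-line",
    "polyps", "ulcerative-colitis",
}

def needs_review(label: str) -> bool:
    """Return True when the predicted class is not a 'normal-*' finding."""
    if label not in KVASIR_LABELS:
        raise ValueError(f"Unknown label: {label}")
    return not label.startswith("normal-")

print(needs_review("polyps"))        # -> True
print(needs_review("normal-cecum"))  # -> False
```

A flag like this could feed a review queue, but it is no substitute for clinical judgment.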

Code Analogy

Imagine your code as a chef preparing a gourmet meal. The imports act like the ingredients that you gather at the market. You have fresh vegetables (ViTFeatureExtractor) and a perfectly marinated meat (ViTForImageClassification). The preparation process includes washing, chopping, and sautéing those ingredients, just like you initialize your classifier with the necessary model. Finally, the image represents the meal, and predicting the class is akin to tasting your dish to confirm it’s delicious. Just as a chef refines recipes, you can tweak the model to improve accuracy further!

Troubleshooting

If you encounter any issues while using this model, here are some common troubleshooting steps:

  • Ensure that compatible versions of the transformers and hugsvision libraries are installed.
  • Check if the image path is correct and that the image format is supported.
  • Review the console for any error messages, which can provide clues on what went wrong.
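For the second bullet, a small pre-flight check can catch bad paths and unsupported formats before they surface as opaque model errors. The helper below is our own illustration; the extension list is a typical set handled by PIL/Pillow, which hugsvision relies on for image loading, and you may need to adjust it:

```python
from pathlib import Path

# Extensions commonly readable by PIL/Pillow; extend as needed.
SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}

def validate_image_path(img: str) -> Path:
    """Fail early with a clear message instead of deep inside predict()."""
    path = Path(img)
    if not path.is_file():
        raise FileNotFoundError(f"No such image file: {path}")
    if path.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported image format: {path.suffix}")
    return path
```

Calling validate_image_path(img) before classifier.predict(img_path=img) turns a cryptic downstream failure into an actionable error message.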

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
