Harnessing the Power of YOLOS for Object Detection

Apr 12, 2024 | Educational

Object detection has become an essential function in the world of computer vision. Today, we’ll explore the YOLOS model—an efficient and effective solution for this task. This guide will help you understand how to use the YOLOS (tiny-sized) model fine-tuned on the COCO 2017 object detection dataset, making it easier to integrate this powerful tool into your projects.

What is YOLOS?

YOLOS, or “You Only Look at One Sequence,” is a Vision Transformer (ViT) model designed for object detection tasks. It was introduced in the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Fang et al. The paper shows that despite its simplicity, a base-sized YOLOS model can achieve strong performance, reaching 42 average precision (AP) on the COCO 2017 validation set, competitive with more complex detectors such as Faster R-CNN.

How Does YOLOS Work?

You can think of YOLOS as a skilled artist trying to capture a scene filled with objects. The artist does not attempt to paint everything at once. Instead, they focus on specific areas, making sure each object is recognized by its composition and position. YOLOS works similarly: during training, it uses a bipartite matching loss to compare predicted classes and bounding boxes against the ground truth annotations. Let’s break it down:

  • The model uses 100 object queries.
  • It applies the Hungarian algorithm to optimally match those queries with ground truth annotations.
  • It then optimizes a combination of cross-entropy loss (for class labels) and a linear combination of L1 and generalized IoU loss (for bounding boxes) to refine its predictions.

This clever approach makes YOLOS not only effective but also efficient compared to other frameworks.
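To make the matching step concrete, here is a rough, self-contained illustration with a made-up cost matrix of 4 object queries and 2 ground-truth objects. Brute force stands in for the Hungarian algorithm, which finds the same optimal assignment in polynomial time:

```python
from itertools import permutations

# Toy cost matrix: cost[q][g] is a hypothetical matching cost between
# object query q and ground-truth object g (class + box terms combined).
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.7, 0.6],
    [0.5, 0.4],
]
num_queries, num_gt = len(cost), len(cost[0])

# Bipartite matching by brute force: try every way of assigning one
# distinct query to each ground-truth object and keep the cheapest.
best = min(
    permutations(range(num_queries), num_gt),
    key=lambda qs: sum(cost[q][g] for g, q in enumerate(qs)),
)
matching = {g: q for g, q in enumerate(best)}
print(matching)  # ground-truth index -> matched query index
```

Queries left unmatched (here, 2 of the 4) are trained to predict a special “no object” class, which is how YOLOS handles images with fewer objects than queries.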

How to Use YOLOS

Integrating YOLOS into your project is a breeze. Below is a step-by-step guide on how to use it effectively:

```python
from transformers import YolosImageProcessor, YolosForObjectDetection
from PIL import Image
import torch
import requests

# Load a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000397689.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the tiny YOLOS checkpoint and its matching image processor
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")
image_processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")

# Preprocess the image and run inference
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# The model predicts bounding boxes and corresponding COCO classes
logits = outputs.logits
bboxes = outputs.pred_boxes

# Convert predictions to pixel coordinates and keep the confident ones
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(f"Detected {model.config.id2label[label.item()]} with confidence {round(score.item(), 3)} at location {box}")
```
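One detail worth knowing: `outputs.pred_boxes` holds boxes normalized to [0, 1] in [center-x, center-y, width, height] format, and `post_process_object_detection` uses `target_sizes` to convert them into pixel-space corners. A minimal sketch of that conversion (the helper name is our own, plain Python):

```python
def to_pixel_corners(box, width, height):
    """Convert a normalized [cx, cy, w, h] box to pixel [x0, y0, x1, y1]."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * width,   # left edge
        (cy - h / 2) * height,  # top edge
        (cx + w / 2) * width,   # right edge
        (cy + h / 2) * height,  # bottom edge
    ]

# A hypothetical detection centred in a 640x480 image,
# half the image's width and height in size.
print(to_pixel_corners([0.5, 0.5, 0.5, 0.5], 640, 480))
# [160.0, 120.0, 480.0, 360.0]
```

This is also why `target_sizes` is built from `image.size[::-1]`: PIL reports (width, height), while the processor expects (height, width).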

Understanding the Code

Imagine you’re preparing for a grand art exhibition (the object detection task). You carefully open the door (the image), bring in your materials (load the YOLOS model and processor), and start sketching (predictions). You analyze where to place your strokes (bounding boxes) and what colors to use (object classes) to ensure your artwork is recognized and appreciated. Each step corresponds to parts of the code you see above.

Troubleshooting

While using YOLOS, there may be a few hiccups along the way. Here are some troubleshooting tips to help you navigate through common issues:

  • Model Not Found: Ensure that you have correctly specified the model name (e.g., “hustvl/yolos-tiny”).
  • Image Not Loading: Check the URL or file path of the image to ensure it’s correct and accessible.
  • Incompatible Versions: Ensure that your Python libraries (transformers, torch, Pillow, requests) are updated to recent, mutually compatible versions.
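A quick way to check the last point is to print the installed versions of each dependency. This small sketch uses only the standard library; the package list mirrors the imports from the example above:

```python
import importlib.metadata as md  # stdlib on Python 3.8+

# Collect installed versions of the libraries this guide depends on,
# so you can compare them against what the model card expects.
versions = {}
for pkg in ("transformers", "torch", "pillow", "requests"):
    try:
        versions[pkg] = md.version(pkg)
    except md.PackageNotFoundError:
        versions[pkg] = None  # not installed

for pkg, ver in versions.items():
    print(f"{pkg}: {ver or 'not installed'}")
```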

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

YOLOS offers a contemporary approach to object detection, combining efficiency and performance. With the guidance provided here, you can leverage this powerful tool in your own projects and experience the marvels of modern AI applications. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
