How to Use RobustSAM for Image Segmentation on Degraded Images


Welcome to the world of RobustSAM, a cutting-edge model designed to enhance image segmentation on low-quality images. In this article, we will walk you through how to use RobustSAM effectively, while also providing some troubleshooting tips to help you along the way.

Introduction to RobustSAM

Robust Segment Anything Model (RobustSAM) builds upon the transformative Segment Anything Model (SAM), enhancing its segmentation performance on degraded, low-quality images. It achieves this with only marginal additional parameters and compute, making it practical to train and run in typical research settings.

Model Breakdown

To better understand RobustSAM, think of it as a tailored suit for image segmentation. Just as a suit has different segments—like sleeves, lapels, and pockets—RobustSAM is composed of several modules:

  • VisionEncoder: The image’s fabric is woven together using a ViT-based encoder that processes image patches.
  • PromptEncoder: This acts like a tailor’s assistant, taking input locations and producing embeddings for points and bounding boxes.
  • MaskDecoder: Think of this as the fashion designer, deciding how to neatly cut and assemble the final mask based on the embeddings.

Finally, we have the Neck, which forms the output masks based on the contextualized masks produced by the MaskDecoder.
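If you want to verify this structure yourself, the following is a minimal sketch that lists the model’s top-level modules. It assumes RobustSAM exposes SAM-style submodule names (such as vision_encoder, prompt_encoder, and mask_decoder); the exact names may differ depending on your transformers version:

from transformers import AutoModelForMaskGeneration

# Load RobustSAM and print each top-level submodule
model = AutoModelForMaskGeneration.from_pretrained('jadechoghari/robustsam-vit-huge')
for name, module in model.named_children():
    print(name, '->', type(module).__name__)

This is a quick way to confirm which components are available before diving into prompt-based generation.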

Getting Started with RobustSAM

Here’s how to use RobustSAM for prompt-based and automatic mask generation:

Prompted-Mask-Generation

To start generating masks using specific points, follow this code snippet:

import torch
from PIL import Image
import requests
from transformers import AutoProcessor, AutoModelForMaskGeneration

# Load the RobustSAM model and processor, and move the model to the GPU if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
processor = AutoProcessor.from_pretrained('jadechoghari/robustsam-vit-huge')
model = AutoModelForMaskGeneration.from_pretrained('jadechoghari/robustsam-vit-huge').to(device)

# Load an image from a URL
img_url = 'https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Define input points (2D localization of an object in the image)
input_points = [[[450, 600]]]  # Example point

# Process the image and input points
inputs = processor(raw_image, input_points=input_points, return_tensors='pt').to(device)

# Generate masks using the model
with torch.no_grad():
    outputs = model(**inputs)

# Upscale the low-resolution predicted masks back to the original image size
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs['original_sizes'].cpu(),
    inputs['reshaped_input_sizes'].cpu()
)
scores = outputs.iou_scores

In this analogy, think of loading an image like picking a canvas and using input points as specific spots where you want to paint. After the processing stage, the model generates masks like an artist revealing the outlines based on your brush locations!
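Often you only want the single best mask for your point. The following minimal sketch picks the candidate with the highest predicted IoU; it assumes the shapes produced above (one image, one input point, several candidate masks) and that scores aligns with the last mask dimension:

import torch
from PIL import Image

# scores has shape (batch, num_points, num_candidates); pick the best candidate
best_idx = scores.squeeze().argmax().item()

# masks is a list with one tensor per image, shaped (num_points, num_candidates, H, W)
best_mask = masks[0][0, best_idx]

# Save the boolean mask as a black-and-white PNG for inspection
Image.fromarray((best_mask.numpy() * 255).astype('uint8')).save('best_mask.png')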

Automatic-Mask-Generation

If you wish to generate masks automatically, you can follow this snippet:

import requests
from PIL import Image
from transformers import pipeline

# Initialize the pipeline for mask generation
generator = pipeline('mask-generation', model='jadechoghari/robustsam-vit-huge', device=0, points_per_batch=256)
image_url = 'https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png'
outputs = generator(image_url)

# Load the same image so we can visualize the masks over it below
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert('RGB')

import matplotlib.pyplot as plt
import numpy as np

# Simple function to display the mask
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30/255, 144/255, 255/255, 0.6])  # Default color
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Display the original image
plt.imshow(np.array(raw_image))
ax = plt.gca()

# Loop through the masks and display each one
for mask in outputs['masks']:
    show_mask(mask, ax=ax, random_color=True)
plt.axis('off')
# Show the image with the masks
plt.show()

Here, the automatic mask generation acts like a pre-set filter on a social media app, generating decorative masks that overlay the original photo without any manual input!
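If you would rather keep these masks than just display them, here is a small sketch that writes each one to disk. It assumes outputs['masks'] is a list of boolean NumPy arrays, as returned by the pipeline above:

import numpy as np
from PIL import Image

# Save every automatically generated mask as a separate black-and-white PNG
for i, mask in enumerate(outputs['masks']):
    Image.fromarray((np.asarray(mask) * 255).astype('uint8')).save(f'mask_{i:03d}.png')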

Troubleshooting

If you encounter any issues while using RobustSAM, consider the following troubleshooting tips:

  • Ensure your model and processor are correctly imported and initialized.
  • Check your image URL to confirm it’s valid and accessible.
  • If you’re running out of CUDA memory, try reducing points_per_batch in the automatic generation step, as shown in the sketch after this list.
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
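As a concrete example of the memory tip above, here is a minimal sketch that lowers points_per_batch to shrink peak GPU memory at the cost of a slower run; the value that fits best depends on your hardware:

from transformers import pipeline

# Fewer point prompts per forward pass means lower peak CUDA memory usage
generator = pipeline('mask-generation', model='jadechoghari/robustsam-vit-huge', device=0, points_per_batch=64)
outputs = generator('https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png')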

Conclusion

RobustSAM unlocks new possibilities for image segmentation in challenging conditions. By following the steps provided, you can harness its power for your own projects.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
