How to Utilize the Segment Anything Model (SAM) for Image Segmentation

Jan 14, 2024 | Educational

The Segment Anything Model (SAM) is a powerful image segmentation model that generates object masks efficiently from simple prompts. This guide aims to make your experience with SAM as smooth as possible. Whether you’re a seasoned developer or just getting started, we have you covered!

TL;DR

The Segment Anything Model (SAM) produces high-quality object masks from input prompts such as points or boxes. Trained on a dataset of 11 million images and 1.1 billion masks, it shows strong zero-shot performance on a variety of segmentation tasks. For further reading, check out the original repository.

![Model architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/sam-architecture.png)
Detailed architecture of Segment Anything Model (SAM).

Model Details

The SAM model comprises four essential components (a short sketch for inspecting them in transformers follows this list):

  • VisionEncoder: a ViT-based image encoder that computes image embeddings by applying attention to image patches, using relative positional embeddings.
  • PromptEncoder: a module that generates embeddings for points and bounding boxes.
  • MaskDecoder: a two-way transformer that performs cross-attention between the image embeddings and the prompt embeddings.
  • Neck: predicts the output masks from the contextualized masks produced by the MaskDecoder.
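If you would like to see how these blocks are exposed in the transformers implementation, here is a minimal sketch, assuming the facebook/sam-vit-huge checkpoint:

from transformers import SamModel

# Load the pretrained checkpoint (downloads weights on first use).
model = SamModel.from_pretrained("facebook/sam-vit-huge")

# Print the top-level submodules; the vision encoder, prompt encoder and
# mask decoder described above should appear among them.
for name, module in model.named_children():
    print(name, "->", module.__class__.__name__)

The printed names may differ slightly between transformers versions, so treat this as an exploratory check rather than a guaranteed API.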

Usage

Moving on to using the SAM model, you can opt for either prompted mask generation or automatic mask generation depending on your needs.

Prompted Mask Generation

This method requires you to provide specific points or bounding boxes for segmentation. Here’s how you can do it:


import torch
from PIL import Image
import requests
from transformers import SamModel, SamProcessor

# Use the GPU when available; fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D location of a window in the image

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores

In this code, the image is downloaded and the processor pairs it with a 2D point marking the object of interest; the model then returns candidate masks along with their predicted IoU scores.
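If you prefer bounding boxes over points, the same processor accepts box prompts. The sketch below assumes the model, processor, raw_image, and device from the snippet above are still in scope; the box coordinates are illustrative only, not tuned for this image:

import torch

# One image in the batch, one box per image, given as [x1, y1, x2, y2] pixel coordinates.
# These coordinates are illustrative only.
input_boxes = [[[650, 900, 1000, 1250]]]

inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores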

Automatic Mask Generation

This functionality allows the model to generate segmentation masks without the need for specific prompts. Here’s a simple example:


from transformers import pipeline

# points_per_batch controls how many prompt points are processed at once;
# lower it if you run out of GPU memory.
generator = pipeline("mask-generation", device=0, points_per_batch=256)
image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
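Before plotting, it can help to peek at what the pipeline returns. Assuming the output is a dictionary with a "masks" list of boolean NumPy arrays and a matching "scores" entry, a quick inspection looks like this:

# How many masks were generated, and what does the first one look like?
print(len(outputs["masks"]))       # number of generated masks
print(outputs["masks"][0].shape)   # (height, width) boolean array
print(outputs["scores"][0])        # quality score for the first mask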

To visualize the masks, you can use the following code snippet:


import matplotlib.pyplot as plt
from PIL import Image
import requests
import numpy as np

def show_mask(mask, ax, random_color=False):
    # Pick a random translucent color, or a default translucent blue.
    color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0) if random_color else np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Load the original image and draw every generated mask on top of it.
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()

Using this code, you’ll render the segmentation masks directly on your image!
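If you want to keep the generated masks for later use, one option is to write them out as images. This is a small sketch that assumes each entry of outputs["masks"] is a boolean NumPy array; the file name is arbitrary:

from PIL import Image
import numpy as np

# Convert the first boolean mask to an 8-bit image (0 or 255) and save it.
first_mask = outputs["masks"][0]
Image.fromarray(first_mask.astype(np.uint8) * 255).save("first_mask.png")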

Troubleshooting

Should you encounter any issues while using the SAM model, consider the following tips:

  • Common Error: Model not found: Ensure that all the necessary libraries are installed and you’re using the correct model name.
  • CUDA issues: If you’re facing GPU-related errors, verify that your device supports CUDA and that the drivers are up to date (a quick check is sketched after this list).
  • Input Format: Double-check that your input formats (like image URLs or bounding boxes) align with the requirements mentioned in the documentation.
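For the CUDA point above, a quick sanity check with PyTorch looks like this:

import torch

# Confirm that a CUDA-capable GPU is visible before using device=0 or .to("cuda").
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))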

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Citation

If you utilize this model for your research, please cite it using the following BibTeX entry:


@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}}

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
