Welcome to this guide on utilizing the Segment Anything Model (SAM), which makes image segmentation a breeze! If you’re keen on diving into the world of AI-driven computer vision, you’re in for a treat. The SAM model is a cutting-edge tool that generates high-quality object masks based on input prompts. Let’s take a closer look at how it functions and how you can get started.
TL;DR
The Segment Anything Model (SAM) turns input prompts such as points or boxes into high-quality object masks and performs strongly across a wide range of segmentation tasks. Trained on a dataset of 11 million images and 1.1 billion masks, it delivers noteworthy zero-shot performance, often rivaling fully supervised models. For further exploration, visit the original repository.
Typical tasks include prompted mask generation, where you supply points or bounding boxes, and fully automatic mask generation over an entire image; both are covered in the Usage section below.
Model Details
The SAM model consists of four key modules (a quick way to inspect them in code follows the list):
- VisionEncoder: A Vision Transformer (ViT) based image encoder that computes image embeddings using attention over image patches with relative positional embeddings.
- PromptEncoder: Generates embeddings for input prompts such as points and bounding boxes.
- MaskDecoder: A transformer that performs cross-attention between the image embeddings and the prompt embeddings to produce the output masks.
- Neck: Predicts the output masks from the contextualized masks generated by the MaskDecoder.
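If you want to see how these modules are laid out in code, you can print the corresponding submodules of the transformers SamModel. This is only a minimal sketch; the attribute names below (vision_encoder, prompt_encoder, mask_decoder) reflect the current transformers implementation and are not part of the model description above.

from transformers import SamModel

model = SamModel.from_pretrained("facebook/sam-vit-base")
print(type(model.vision_encoder).__name__)   # ViT-based image encoder
print(type(model.prompt_encoder).__name__)   # encodes point and box prompts
print(type(model.mask_decoder).__name__)     # cross-attention transformer that outputs masks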
Usage
Let’s explore how to use SAM for both prompted and automatic mask generation. Think of it as writing a script for a play: you provide the model with prompts, and it stages the performance by creating masks where needed. Each prompt acts as a character or event that guides the storyline, which in this case is the segmentation process.
Prompted Mask Generation
Follow these steps to generate masks with specific input points:
import torch
import requests
from PIL import Image
from transformers import SamModel, SamProcessor

# Load the model and processor, and move the model to GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D localization of a window

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted low-resolution masks back to the original image size
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
In this snippet, input_points specifies the (x, y) pixel coordinates of the object of interest; the outer lists batch the images and the points supplied for each image.
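SAM predicts three candidate masks per prompt along with an estimated IoU score for each. As a minimal sketch (assuming the single-image, single-point example above and the tensor layout used by the transformers implementation), you could pick the highest-scoring mask like this:

# masks is a list with one entry per input image; each entry has shape
# (num_prompts, num_masks_per_prompt, height, width) at the original resolution
best_idx = scores[0, 0].argmax().item()    # index of the mask with the highest predicted IoU
best_mask = masks[0][0, best_idx].numpy()  # boolean mask for the first (and only) prompt
print("Best mask index:", best_idx, "score:", scores[0, 0, best_idx].item())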
Automatic Mask Generation
To generate masks automatically, without providing specific prompts (the pipeline prompts the model with a grid of points across the whole image), run the following:

from transformers import pipeline

# device=0 targets the first GPU; use device=-1 to run on CPU
generator = pipeline("mask-generation", model="facebook/sam-vit-base", device=0, points_per_batch=256)
image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
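Before plotting, it can help to check what the pipeline returned. Assuming the output structure of the transformers mask-generation pipeline, outputs is a dictionary with a "masks" list of binary masks and a "scores" entry holding their predicted IoU values:

print(len(outputs["masks"]), "masks generated")  # one binary mask per detected object
print(outputs["scores"][:5])                     # predicted IoU for the first few masks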
Displaying the output masks can be achieved with the following code:
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

def show_mask(mask, ax, random_color=False):
    # Overlay a single binary mask on the axes with a translucent color
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# raw_image was loaded in the prompted example above; reload it with PIL if you run this snippet on its own
plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()
Common Troubleshooting Ideas
If you encounter issues while using the SAM model, consider the following troubleshooting steps:
- Ensure that you have the appropriate libraries installed and updated.
- Check your device settings (CUDA availability) if you’re attempting to run on a GPU; a quick check is shown after this list.
- Review your input format: incorrect input data may lead to unexpected results.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
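For the device check mentioned above, a quick sanity check using standard torch and transformers calls looks like this:

import torch
import transformers

print("transformers version:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present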
Citation
If you wish to cite the SAM model in your work, use the following BibTeX entry:
@article{kirillov2023segany,
title={Segment Anything},
author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
journal={arXiv:2304.02643},
year={2023}
}
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

