The Segment Anything Model (SAM) is a powerful image segmentation model that generates high-quality object masks efficiently. This guide aims to make your experience with SAM as smooth as possible. Whether you’re a seasoned developer or just getting started, we have you covered!
TL;DR
The Segment Anything Model (SAM) is capable of producing high-quality object masks from varied input prompts. Trained on a vast dataset of 11 million images, it showcases notable zero-shot performance. For further reading, check out the original repository.

Figure: Detailed architecture of the Segment Anything Model (SAM).
Model Details
The SAM model comprises the following key components (a quick way to inspect them in code is sketched after this list):
- VisionEncoder: A ViT-based image encoder that computes image embeddings by applying attention over image patches, using relative positional embeddings.
- PromptEncoder: This module generates embeddings for point and bounding-box prompts.
- MaskDecoder: A two-way transformer that performs cross-attention between the image embeddings and the prompt embeddings, in both directions.
- Neck: It predicts the output masks based on the contextualized masks produced by the MaskDecoder.
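If you want to see how these pieces map onto the transformers implementation, one quick way is to load the model and print its top-level submodules. The attribute names printed here come from the Hugging Face SamModel class and may vary slightly across transformers versions, so treat this as an exploratory sketch rather than part of the official workflow:

from transformers import SamModel

model = SamModel.from_pretrained("facebook/sam-vit-huge")
# Top-level submodules roughly correspond to the components listed above
for name, module in model.named_children():
    print(name, "->", module.__class__.__name__)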
Usage
You can use the SAM model in two ways: prompted mask generation or automatic mask generation, depending on your needs.
Prompted Mask Generation
This method requires you to provide specific points or bounding boxes for segmentation. Here’s how you can do it:
import torch
from PIL import Image
import requests
from transformers import SamModel, SamProcessor

# Load the model and processor; move the model to the GPU so it matches the inputs below
model = SamModel.from_pretrained("facebook/sam-vit-huge").to("cuda")
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

input_points = [[[450, 600]]]  # 2D location of a window in the image
inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores
In this code, the image is downloaded and preprocessed, and a single 2D point is passed as a prompt to indicate the object of interest; the post-processed masks and their predicted IoU scores are then returned.
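Since prompted mask generation also accepts bounding boxes, here is a minimal variation of the same snippet that prompts SAM with a box instead of a point. It assumes the model, processor, and raw_image from the snippet above are already loaded, and the box coordinates are illustrative placeholders in (x_min, y_min, x_max, y_max) pixel format:

# Prompt with a bounding box instead of a point (coordinates are illustrative)
input_boxes = [[[75, 275, 1725, 850]]]  # one box per image: [x_min, y_min, x_max, y_max]
inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)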
Automatic Mask Generation
This functionality allows the model to generate segmentation masks without the need for specific prompts. Here’s a simple example:
from transformers import pipeline

# Build a mask-generation pipeline on the GPU (device=0)
generator = pipeline("mask-generation", model="facebook/sam-vit-huge", device=0, points_per_batch=256)

image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch=256)
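The pipeline returns a dictionary whose "masks" entry holds one binary mask per detected region. Before plotting, a quick sanity check might look like the following; the exact output layout reflects the current mask-generation pipeline, so it is worth verifying against your installed transformers version:

# "masks" is a list of binary masks, one per segmented region
print(len(outputs["masks"]), "masks generated")
print(outputs["masks"][0].shape)  # each mask matches the height and width of the input image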
To visualize the generated masks, you can use the following code snippet:
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import requests

def show_mask(mask, ax, random_color=False):
    # Pick a random translucent color, or default to a light blue
    color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0) if random_color else np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Load the same image that was passed to the pipeline
raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()
Using this code, you’ll render the segmentation masks directly on your image!
Troubleshooting
Should you encounter any issues while using the SAM model, consider the following tips:
- Common Error: Model not found: Ensure that all the necessary libraries are installed and you’re using the correct model name.
- CUDA issues: If you’re facing GPU-related errors, verify that your device supports CUDA and that the drivers are up to date; a simple device-fallback pattern is sketched after this list.
- Input Format: Double-check that your input formats (like image URLs or bounding boxes) align with the requirements mentioned in the documentation.
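As a safeguard against GPU-related errors, you can select the device dynamically and fall back to the CPU when CUDA is unavailable. A minimal sketch of that pattern, applied to the snippets above:

import torch
from transformers import SamModel, SamProcessor

# Fall back to the CPU when no CUDA-capable GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SamModel.from_pretrained("facebook/sam-vit-huge").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")
# ...later, move the processed inputs to the same device:
# inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)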
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Citation
If you utilize this model for your research, please cite it using the following BibTeX entry:
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

