How to Use SAM2: Segment Anything in Images and Videos

If you’re looking to harness the power of visual segmentation in images and videos, you’ve come to the right place. SAM 2 (Segment Anything Model 2), developed by Meta AI’s FAIR team, is a foundation model for promptable visual segmentation in both images and videos. This article walks you through installing and using the SAM 2 library.

Getting Started with SAM2

Before diving into code, ensure you have the necessary libraries installed. You’ll need Python and PyTorch set up on your machine. Once everything is in place, you can clone the SAM2 repository using the command:

git clone https://github.com/facebookresearch/segment-anything-2/
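After cloning, the package can typically be installed in editable mode with pip. This is a sketch of the usual workflow, assuming Python and PyTorch are already set up; check the repository’s README for the exact version requirements:

```shell
# Enter the cloned repository and install SAM 2 in editable mode.
cd segment-anything-2
pip install -e .
```
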

Now let’s explore how to utilize SAM2 for both image and video segmentation!

Image Prediction with SAM2

To predict segments in an image, you can follow these steps:

  • Import the necessary modules.
  • Load the pre-trained model.
  • Set your image and input prompts.

The following code provides a clear example:

import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-small")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
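The `<your_image>` and `<input_prompts>` placeholders are yours to fill in. Here is a minimal sketch of what they typically look like, assuming NumPy arrays: the image as an H×W×3 uint8 array, and prompts given as point coordinates with foreground/background labels:

```python
import numpy as np

# A placeholder RGB image (in practice, load a real one with PIL or OpenCV).
image = np.zeros((480, 640, 3), dtype=np.uint8)

# One foreground click at pixel (x=320, y=240).
# point_coords has shape (N, 2); point_labels has shape (N,),
# where 1 marks foreground and 0 marks background.
point_coords = np.array([[320, 240]], dtype=np.float32)
point_labels = np.array([1], dtype=np.int32)

# These would then be passed to the predictor, roughly as:
#   predictor.set_image(image)
#   masks, scores, logits = predictor.predict(
#       point_coords=point_coords, point_labels=point_labels
#   )
print(image.shape, point_coords.shape, point_labels.shape)
```

A box prompt works similarly, as an `(x1, y1, x2, y2)` array passed via the `box` argument.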

Video Prediction with SAM2

Segmenting in videos is just as straightforward. Here’s the breakdown:

  • Import relevant modules.
  • Load the pre-trained video predictor.
  • Initialize the video state, add new prompts, and propagate segments across frames.

Here’s how this looks in practice:

import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-small")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
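The masks you get back from SAM-style predictors are generally raw logits rather than binary masks. A common post-processing step, sketched here with NumPy standing in for the returned tensors, is to threshold the logits at zero:

```python
import numpy as np

def logits_to_binary_mask(mask_logits: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Threshold raw mask logits into a boolean mask.

    SAM-style predictors typically treat values above 0.0 as foreground.
    """
    return mask_logits > threshold

# Example with fake logits: positive values become foreground pixels.
fake_logits = np.array([[-2.0, 1.5], [0.3, -0.1]])
binary = logits_to_binary_mask(fake_logits)
print(binary)
```

The resulting boolean array can be overlaid on the frame or saved per `object_id` for each `frame_idx` yielded during propagation.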

Understanding the Code Structure: An Analogy

Think of the segmentation process like a fancy photography studio. Your image or video is the subject, and the SAM2 model is the highly skilled photographer. When you set your image or video, it’s like telling the photographer what to focus on. The input prompts are the instructions, guiding the photographer on which elements to highlight. During the prediction phase, the photographer snaps the picture (mask generation), expertly isolating the subjects based on your specifications. Just like how good lighting and focus lead to a stunning photo, the quality of your input enhances the segmentation results in SAM2!

Troubleshooting Tips

If you encounter issues while using SAM2, consider the following troubleshooting strategies:

  • Ensure your PyTorch version is compatible with the hardware you’re using, especially if you’re utilizing GPU acceleration.
  • Check that you have the correct format for <your_image> and <your_video>.
  • Consult the demo notebooks for examples and additional help.
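For the first point above, a quick sanity check (assuming PyTorch is installed) reports which device your code will actually use:

```python
import torch

# Pick the GPU if CUDA is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"PyTorch {torch.__version__} will run on: {device}")

# Note: the bfloat16 autocast context used in the examples above assumes
# a suitable GPU; on CPU you can simply drop the autocast context.
```
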

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

With SAM2, the ability to segment anything in images and videos is at your fingertips. Given its capability and flexibility, there’s no limit to how you can utilize it for your AI projects.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
