How to Use SAM 2 for Image and Video Segmentation

Oct 28, 2024 | Educational


This guide shows how to use SAM 2: Segment Anything in Images and Videos. Developed at FAIR, SAM 2 is a foundation model for promptable visual segmentation: you give it prompts, and it returns masks for the objects they indicate. The sections below cover loading the model and running it on both images and videos.

Getting Started with SAM 2

Before writing any code, make sure your environment is set up. SAM 2 is available through its official code repository on GitHub. For installation steps and detailed documentation, refer to that repository and to the SAM 2 paper.
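Before running the examples, it can help to confirm that the packages they import are visible to your Python interpreter. The sketch below is only a convenience check and is not part of SAM 2: `check_environment` is a hypothetical helper, and it assumes the top-level import names `torch` and `sam2` used by the code in this guide.

```python
import importlib.util


def check_environment(packages=("torch", "sam2")):
    """Report which of the given top-level packages can be found.

    Hypothetical helper for illustration: it only looks the packages up
    and does not import them or load any model weights.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = check_environment()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If either package is reported as missing, revisit the installation instructions in the repository before continuing.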

Using SAM 2 for Image Prediction

Let’s start with image predictions. You hand the predictor an image, then pass it prompts, and it returns masks for the regions those prompts indicate. Below is the code you will use:

import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

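# Download the checkpoint from the Hugging Face Hub and build the predictor
predictor = SAM2ImagePredictor.from_pretrained('facebook/sam2-hiera-large')
# Run without gradient tracking, using bfloat16 autocast on the GPU
with torch.inference_mode(), torch.autocast('cuda', dtype=torch.bfloat16):
    predictor.set_image(your_image)  # your_image is a placeholder for your loaded image
    masks, _, _ = predictor.predict(input_prompts)  # input_prompts is a placeholder for your prompts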
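The `input_prompts` placeholder stands for whatever prompts you want to pass. As a rough sketch, assuming the point-prompt convention of the SAM 2 repository's image predictor (pixel `(x, y)` coordinates plus a label of 1 for foreground and 0 for background, passed as `point_coords` and `point_labels`), clicks could be prepared like this. `build_point_prompts` is a hypothetical helper, and the image size and click positions are made-up values.

```python
def build_point_prompts(clicks, image_width, image_height):
    """Split (x, y, is_foreground) clicks into coordinate and label lists.

    Hypothetical helper: it checks that each click lies inside the image
    and encodes foreground as 1 and background as 0.
    """
    coords, labels = [], []
    for x, y, is_foreground in clicks:
        if not (0 <= x < image_width and 0 <= y < image_height):
            raise ValueError(
                f"click ({x}, {y}) is outside the {image_width}x{image_height} image"
            )
        coords.append([float(x), float(y)])
        labels.append(1 if is_foreground else 0)
    return coords, labels


# Made-up example: one click on the object, one on the background.
point_coords, point_labels = build_point_prompts(
    [(320, 240, True), (20, 30, False)], image_width=640, image_height=480
)
print(point_coords)
print(point_labels)

# Assumed usage with the predictor from the example above (not run here):
#   import numpy as np
#   masks, scores, _ = predictor.predict(
#       point_coords=np.array(point_coords),
#       point_labels=np.array(point_labels),
#   )
```

Validating clicks against the image size up front makes a bad coordinate fail with a readable message instead of producing an unexpected mask.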

Using SAM 2 for Video Prediction

Now, let’s move on to videos. The workflow is similar, with one addition: prompts you add on a frame produce a mask on that frame right away, and the model then propagates them through the rest of the video to track the object. Use the following code for video predictions:

import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

# Download the checkpoint from the Hugging Face Hub and build the video predictor
predictor = SAM2VideoPredictor.from_pretrained('facebook/sam2-hiera-large')
# Run without gradient tracking, using bfloat16 autocast on the GPU
with torch.inference_mode(), torch.autocast('cuda', dtype=torch.bfloat16):
    # your_video is a placeholder for your video input
    state = predictor.init_state(your_video)
    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, your_prompts)
    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...  # process each frame's masks here
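The body of the propagation loop is left open above. One common pattern is to collect the results into a dictionary keyed by frame and object. The sketch below shows that bookkeeping on stand-in data so it runs without a model. `collect_masklets` is a hypothetical helper, and the commented usage assumes the masks yielded by the predictor are logit tensors that can be thresholded at zero.

```python
def collect_masklets(propagation, to_mask=lambda mask: mask):
    """Group per-frame outputs as {frame_idx: {object_id: mask}}.

    `propagation` is any iterable of (frame_idx, object_ids, masks) tuples,
    where masks[i] belongs to object_ids[i]. `to_mask` post-processes each mask.
    """
    segments = {}
    for frame_idx, object_ids, masks in propagation:
        segments[frame_idx] = {
            object_id: to_mask(masks[i]) for i, object_id in enumerate(object_ids)
        }
    return segments


# Stand-in data so the sketch runs without a model; strings play the role of masks.
fake_propagation = [
    (0, [1, 2], ["frame0-obj1", "frame0-obj2"]),
    (1, [1, 2], ["frame1-obj1", "frame1-obj2"]),
]
segments = collect_masklets(fake_propagation)
print(segments[1][2])  # frame1-obj2

# Assumed usage with the real predictor (not run here), thresholding mask logits:
#   segments = collect_masklets(
#       predictor.propagate_in_video(state),
#       to_mask=lambda logits: (logits > 0.0).cpu().numpy(),
#   )
```

Keying by frame index first makes it easy to look up every tracked object on a given frame when you render or export results.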

Troubleshooting Common Issues

As you navigate through SAM 2, you may encounter challenges. Here are some common troubleshooting tips:

Model Not Found: Ensure that you have correctly specified the model name ‘facebook/sam2-hiera-large’ in your code.
Incorrect Input Format: Make sure ‘your_image’ and ‘your_video’ are loaded in a format the predictor accepts before you pass them in.
CUDA Issues: If CUDA errors are displayed, verify your CUDA installation and check compatibility with PyTorch.
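Some of these checks can be scripted. The sketch below is illustrative only: `pick_device` and `list_jpeg_frames` are hypothetical helpers, and the frame check assumes ‘your_video’ is a directory of JPEG frames named by frame index (such as `0.jpg`, `1.jpg`), which may not match your setup.

```python
import importlib.util
import os
import tempfile


def pick_device():
    """Return 'cuda' when PyTorch is installed and sees a GPU, else 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


def list_jpeg_frames(video_dir):
    """Return JPEG file names in video_dir sorted by numeric frame index.

    Assumes frames are named by index (for example 0.jpg, 1.jpg) and raises
    ValueError if a JPEG file name is not an integer.
    """
    indexed = []
    for name in os.listdir(video_dir):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in (".jpg", ".jpeg"):
            continue  # ignore non-JPEG files
        try:
            indexed.append((int(stem), name))
        except ValueError:
            raise ValueError(f"frame file {name!r} is not named by frame index") from None
    return [name for _, name in sorted(indexed)]


print("device:", pick_device())

# Demonstrate the frame check on a throwaway directory of empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("10.jpg", "2.jpg", "0.jpeg", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    frames = list_jpeg_frames(tmp)
print(frames)  # ['0.jpeg', '2.jpg', '10.jpg']
```

If `pick_device` reports `cpu` on a machine that has a GPU, that points to the CUDA and PyTorch compatibility issue described above. Sorting by the numeric index rather than by name keeps frame 10 after frame 2.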

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Closing Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
