How to Generate Salient Object-Aware Backgrounds Using Text-Guided Diffusion Models

May 8, 2024 | Educational


In computer vision, generating a background that blends seamlessly with an existing object is hard. Our recent work addresses this challenge by using diffusion models to generate backgrounds while preserving the integrity of salient objects. This post walks through an implementation step by step.

Understanding the Foundation

The project revolves around the problem of “object expansion”: inpainting models such as Stable Inpainting may inadvertently distort salient objects when generating the background around them, for example by growing them beyond their original boundaries. Think of it like painting a portrait in front of a busy scene: the subject should stay clear and untouched while the scenery around it changes.
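
One rough way to make "object expansion" measurable is to compare the salient-object mask before and after generation: pixels that are foreground in the output but were background in the input suggest that the object has grown. The sketch below is an illustrative assumption, not necessarily the metric used in the original work, and it runs on small hand-written binary masks instead of real model output. The function name `expansion_ratio` is hypothetical.

```python
def expansion_ratio(original_mask, generated_mask):
    """Fraction of image pixels that became foreground only after generation.

    Both masks are equally sized lists of rows holding 0 (background) or 1 (object).
    """
    total = 0
    expanded = 0
    for orig_row, gen_row in zip(original_mask, generated_mask):
        for orig_px, gen_px in zip(orig_row, gen_row):
            total += 1
            # Count pixels the object "grew into" during generation
            if gen_px == 1 and orig_px == 0:
                expanded += 1
    return expanded / total if total else 0.0


# Toy masks: the object gains three pixels in the generated result
original = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
generated = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(expansion_ratio(original, generated))  # 3 of 12 pixels -> 0.25
```

A ratio near zero would indicate that the object kept its original outline; larger values point to expansion.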

Getting Started

To kick things off, you’ll need to set up your environment. Start by importing the necessary libraries and loading the pretrained model. Here’s how you can do it:

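import torch  # used later for the seeded generator and autocast
from diffusers import DiffusionPipeline

# The same repository provides both the pretrained weights and the custom pipeline code
model_id = "yahoo-inc/photo-background-generation"
pipeline = DiffusionPipeline.from_pretrained(model_id, custom_pipeline=model_id)
pipeline = pipeline.to('cuda')  # requires a CUDA-capable GPU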

Loading and Processing Your Image

Next, load the image you want to process. The helper below shrinks an image to fit inside the target size and pads it to exactly that size; the example then downloads a sample image and prepares it at 512×512:

from PIL import Image, ImageOps
import requests
from io import BytesIO
from transparent_background import Remover

def resize_with_padding(img, expected_size):
    # Shrink in place to fit within expected_size, keeping the aspect ratio
    img.thumbnail((expected_size[0], expected_size[1]))
    delta_width = expected_size[0] - img.size[0]
    delta_height = expected_size[1] - img.size[1]
    pad_width = delta_width // 2
    pad_height = delta_height // 2
    # Border order is (left, top, right, bottom); any odd pixel goes to the right/bottom
    padding = (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)
    return ImageOps.expand(img, padding)

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teresópolis.jpg/2560px-Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teresópolis.jpg"
response = requests.get(image_url)
response.raise_for_status()  # fail early if the download did not succeed
img = Image.open(BytesIO(response.content))
img = resize_with_padding(img, (512, 512))
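
To make the padding arithmetic concrete, here is a small pure-Python sketch that mirrors the size calculation without touching Pillow. The helper name `padded_layout` and the input sizes are assumptions for illustration, and the rounding is a simplification: Pillow's `thumbnail` may pick a size that differs by a pixel for some aspect ratios.

```python
def padded_layout(size, expected_size):
    """Return (resized_size, padding) for fitting `size` inside `expected_size`.

    Hypothetical helper: approximates Image.thumbnail (shrink only, keep the
    aspect ratio) and then applies the same padding split as resize_with_padding.
    """
    width, height = size
    target_w, target_h = expected_size
    # thumbnail never enlarges an image, so the scale factor is capped at 1.0
    scale = min(target_w / width, target_h / height, 1.0)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    delta_w = target_w - new_w
    delta_h = target_h - new_h
    left, top = delta_w // 2, delta_h // 2
    # ImageOps.expand takes the border as (left, top, right, bottom)
    return (new_w, new_h), (left, top, delta_w - left, delta_h - top)


print(padded_layout((2560, 1600), (512, 512)))  # landscape image, shrunk then padded
print(padded_layout((300, 301), (512, 512)))    # small image, padded only
```

For the assumed 2560×1600 input, the image is scaled to 512×320 and receives 96 pixels of padding at the top and bottom. The 300×301 input is not resized and ends up with an uneven split of 105 pixels on top and 106 on the bottom.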

Generating the Background

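Now comes the fun part! First run foreground detection to obtain a mask of the salient object, then invert that mask and pass it to the pipeline along with a text prompt to generate a new background. Here’s how you can achieve this: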

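remover = Remover()  # default settings
fg_mask = remover.process(img, type="map")  # Get the foreground mask

seed = 13
mask = ImageOps.invert(fg_mask)  # invert so that the background becomes the masked region
# img is already 512x512 from the previous step, so it is not resized again
generator = torch.Generator(device='cuda').manual_seed(seed)  # fixed seed for repeatable results
prompt = "A dark swan in a bedroom"
cond_scale = 1.0  # ControlNet conditioning strength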

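# The inverted mask is passed twice: as the inpainting mask and as the ControlNet conditioning image
with torch.autocast('cuda'):
    controlnet_image = pipeline(
        prompt=prompt,
        image=img,
        mask_image=mask,
        control_image=mask,
        num_images_per_prompt=1,
        generator=generator,
        num_inference_steps=20,
        guess_mode=False,
        controlnet_conditioning_scale=cond_scale
    ).images[0]  # first (and only) generated image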
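
The inversion step is worth spelling out. The foreground map is bright where the salient object is, while inpainting pipelines in diffusers conventionally regenerate the white region of `mask_image` and keep the black region. Inverting the map therefore marks the background as the area to repaint. The sketch below shows the same operation on a tiny hand-written 8-bit mask; the values and the helper name `invert_mask` are illustrative assumptions, and the custom pipeline is assumed to follow the standard convention.

```python
def invert_mask(mask):
    """Invert an 8-bit grayscale mask given as rows of ints in [0, 255]."""
    return [[255 - value for value in row] for row in mask]


# Toy 3x4 foreground map: 255 = salient object, 0 = background
fg_mask = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
]

# After inversion, 255 marks the background, i.e. the region to regenerate
inpaint_mask = invert_mask(fg_mask)
for row in inpaint_mask:
    print(row)
```

Applying the inversion twice returns the original map, which is a quick sanity check if the generated result repaints the object instead of the background.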

Troubleshooting Tips

If background generation fails or the result looks wrong, first confirm that every package imported above (torch, diffusers, Pillow, requests, and transparent-background) is installed and that the versions are compatible with one another.
Check the input image size: the example runs at 512×512, so resize and pad other images to that size before passing them to the pipeline.
If problems persist, verify that the model ID and the pipeline parameters in your code match the ones shown above.
For further support, don’t hesitate to reach out for insights at fxis.ai.
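
As a starting point for the dependency check in the first tip, the following sketch reports which of the modules imported in this post cannot be found in the current environment. The helper name `missing_modules` is an assumption for illustration. Note that this only confirms a module can be located; it does not verify that versions are compatible.

```python
from importlib.util import find_spec

# Import names used by the snippets in this post (pip package names can differ,
# e.g. PIL is installed as "Pillow" and transparent_background as "transparent-background")
REQUIRED_MODULES = ["torch", "diffusers", "PIL", "requests", "transparent_background"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```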

Conclusion

With the steps outlined above, you can now generate salient object-aware backgrounds using text-guided diffusion models. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
