How to Use MGIE for Multimodal Image Editing

Feb 22, 2024 | Educational


Welcome to this guide to MLLM-Guided Image Editing (MGIE). MGIE combines UNet and LLaVA model checkpoints, using a multimodal large language model to guide instruction-based image editing. In this post, we walk you through the steps needed to run MGIE, including how to process large images and how to troubleshoot common issues.

What is MGIE?

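MGIE uses a multimodal large language model to guide the editing process: the language model interprets the user's editing instruction together with the input image, and its output steers a diffusion model (the InstructPix2Pix pipeline) that performs the actual edit. Inference is split into stages so that the models do not all need to occupy GPU memory at the same time.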

Getting Started: Setup Requirements

Before diving into image editing, ensure that you follow these essential steps for a successful inference run:

You need to merge the LLaVA weight deltas with the original LLaMA parameters. This step is required for correct results; refer to the official repository for detailed instructions.
For memory management, note that MGIE splits the inference pipeline into two stages:

First, compute all the embeddings in batches using the LLaVA model and the edit head.
Then, release those models from memory to free VRAM before loading the InstructPix2Pix pipeline for editing.
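The two-stage pattern above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not MGIE's actual code: `load_encoder`, `load_editor`, and `run_two_stage` are hypothetical names, where the first two stand in for loading the LLaVA model plus edit head and the InstructPix2Pix pipeline, respectively. The GPU cleanup step calls `torch.cuda.empty_cache()` only if PyTorch is installed. The point is that every embedding is computed before the editor is loaded, and the encoder is released in between.

```python
import gc


def free_gpu_memory():
    """Run garbage collection and, if PyTorch with CUDA is available, release cached GPU memory."""
    gc.collect()
    try:
        import torch  # optional dependency; the sketch also runs without it
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_two_stage(images, instructions, load_encoder, load_editor):
    """Hypothetical two-stage driver: encode everything first, then edit.

    load_encoder() is assumed to return a callable (image, instruction) -> embedding,
    standing in for the LLaVA model plus edit head.
    load_editor() is assumed to return a callable (image, embedding) -> edited image,
    standing in for the InstructPix2Pix pipeline.
    """
    # Stage 1: compute every embedding while only the encoder is loaded.
    encoder = load_encoder()
    embeddings = [encoder(img, ins) for img, ins in zip(images, instructions)]
    # Drop this function's reference to the encoder; if nothing else refers
    # to it, its memory can be reclaimed before the editor is loaded.
    del encoder
    free_gpu_memory()

    # Stage 2: load the editor only after the encoder has been released.
    editor = load_editor()
    edited = [editor(img, emb) for img, emb in zip(images, embeddings)]
    del editor
    free_gpu_memory()
    return edited
```

Because the encoder is released before the editor is loaded, peak memory is roughly the larger of the two stages plus the stored embeddings, rather than the sum of both models.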

Processing Ultra High-Resolution Images

The InstructPix2Pix pipeline does not downscale input images on its own, so ultra high-resolution images can cause out-of-memory (OOM) errors during processing. To avoid this, resize images beforehand while preserving their aspect ratio.

Resize Utility Function

Use the following utility function to resize an image to a given width or height while preserving its aspect ratio:

from diffusers.utils import load_image
from PIL import Image

def resize_image_aspect_ratio(img_url, base_width=None, base_height=None):
    # Load the image (URL, local path, or PIL image) and make sure it is RGB
    img = load_image(img_url).convert("RGB")
    # Get the current width and height of the image
    width, height = img.size
    # Calculate the new dimensions based on the aspect ratio
    if base_width is not None:
        # Scale the height by the same factor as the width
        w_percent = base_width / float(width)
        h_size = int(float(height) * w_percent)
        new_size = (base_width, h_size)
    elif base_height is not None:
        # Scale the width by the same factor as the height
        h_percent = base_height / float(height)
        w_size = int(float(width) * h_percent)
        new_size = (w_size, base_height)
    else:
        raise ValueError("Either base_width or base_height must be provided")
    # Resize with the Lanczos filter (Image.ANTIALIAS was removed in Pillow 10)
    resized_img = img.resize(new_size, Image.LANCZOS)
    return resized_img

Here’s a quick analogy to explain what the resizing function does:

Imagine you have a large painting (your image) that you want to hang in a specific frame (the target dimensions). To resize the painting correctly, you pick either the frame's width or its height (base_width or base_height) and scale the painting so that its proportions (the aspect ratio) stay the same. The function calculates the other dimension for you, so the painting fits the frame without being distorted.
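When images come in both portrait and landscape orientations, it can be convenient to cap the longer side instead of choosing base_width or base_height per image. The `target_size` helper below is a hypothetical addition, not part of MGIE: it computes a size that keeps the aspect ratio, never upscales, and rounds both sides down to a multiple of 8, on the assumption that the Stable Diffusion-based InstructPix2Pix pipeline works on latents downsampled by a factor of 8. The 1024-pixel default is an arbitrary example; pick a limit that fits your GPU.

```python
def target_size(width, height, max_side=1024, multiple=8):
    """Return (new_width, new_height) with the longer side capped at max_side.

    Keeps the aspect ratio, never upscales, and rounds each side down to a
    multiple of `multiple` (assumed to be 8 for Stable Diffusion-style latents).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longer = max(width, height)
    if longer > max_side:
        # Integer arithmetic avoids floating-point rounding surprises.
        new_w = width * max_side // longer
        new_h = height * max_side // longer
    else:
        # Small images keep their original size (no upscaling).
        new_w, new_h = width, height
    # Round down to the required multiple, but never below one multiple.
    new_w = max(multiple, new_w - new_w % multiple)
    new_h = max(multiple, new_h - new_h % multiple)
    return new_w, new_h
```

For example, a 4000×3000 photo maps to 1024×768. The result can be passed to PIL directly, as in `img.resize(target_size(*img.size), Image.LANCZOS)`. Rounding to a multiple of 8 can shift the aspect ratio by a few pixels, which is usually not noticeable.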

Troubleshooting Common Issues

If you encounter any issues when using MGIE, here are some troubleshooting steps that might help:

Verify that the LLaVA weight deltas were merged correctly with the original LLaMA parameters; skipping this step is a common cause of unexpected results.
Make sure your images are resized appropriately to avoid OOM errors. The resize utility function above can help.
Free unused GPU memory (VRAM) before the editing stage, for example by releasing the LLaVA model and edit head before loading the InstructPix2Pix pipeline.
If you’re still having trouble, double-check the official repository for updates or missing steps in your setup.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

MGIE brings multimodal large language models into instruction-based image editing. By following the setup requirements, resizing large images, and working through the troubleshooting steps, you will be well on your way to creating stunning edited images.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
