How to Use MGIE for Image Editing with Multimodal Large Language Models

Feb 20, 2024 | Educational

Welcome to the world of advanced image editing! In this article, we’ll explore how to use the MGIE (Guiding Instruction-based Image Editing via Multimodal Large Language Models) repository, including how to prepare ultra high-resolution images so they don't exhaust GPU memory.

What is MGIE?

MGIE is an instruction-based image editing approach, and its repository provides the UNet and LLaVA model checkpoints. It uses a multimodal large language model (MLLM) to guide edits described in natural-language instructions, which makes edits more intuitive to specify. The key takeaway? The inference pipeline is split into two broad stages to reduce memory usage.
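The memory saving from a two-stage split comes from never keeping both large models resident at the same time. The sketch below illustrates that general pattern only; it is not MGIE's actual code. `run_two_stage`, `load_stage1` and `load_stage2` are hypothetical names, and treating stage 1 as the MLLM and stage 2 as the image editor is an assumption made for illustration.

```python
import gc
from typing import Any, Callable


def run_two_stage(
    load_stage1: Callable[[], Callable[[Any], Any]],
    load_stage2: Callable[[], Callable[[Any], Any]],
    inputs: Any,
) -> Any:
    """Hypothetical helper: run two models one after another so only one is alive at a time.

    load_stage1 / load_stage2 are factories that build a model and return a callable.
    """
    # Stage 1: build the first model, compute its output, then drop the model.
    stage1 = load_stage1()
    intermediate = stage1(inputs)
    del stage1
    gc.collect()  # reclaim the first model before the second one is built
    # With PyTorch on a GPU you would typically also call torch.cuda.empty_cache() here.

    # Stage 2: build the second model; it only needs the intermediate result.
    stage2 = load_stage2()
    return stage2(intermediate)
```

Peak memory then tracks the larger of the two models rather than their sum, at the cost of loading each model separately.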

Getting Started with MGIE

Before diving into the code, there’s some crucial setup required:

  • Check out the detailed instructions provided in the official repository.
  • Merge the LLaVA weight deltas with the original LLaMA parameters; details are available in the repository.
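Conceptually, merging a weight delta means adding each released delta tensor to the matching original LLaMA tensor. The sketch below shows only that arithmetic, with plain Python lists standing in for tensors; `apply_delta` is a hypothetical helper, not the repository's merging script, so use the official instructions for real checkpoints. It assumes that parameters present only in the delta (such as a new projection layer) are kept as-is, and that a delta may be longer than its base tensor (for example, when the vocabulary was extended).

```python
from typing import Dict, List

Tensor = List[float]  # flattened stand-in for a real tensor (assumption for illustration)


def apply_delta(base: Dict[str, Tensor], delta: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Hypothetical helper: merged[name] = delta[name] + base[name], element-wise."""
    merged: Dict[str, Tensor] = {}
    for name, d in delta.items():
        if name not in base:
            # Parameters that exist only in the delta are taken unchanged.
            merged[name] = list(d)
            continue
        b = base[name]
        if len(d) < len(b):
            raise ValueError(f"delta for {name!r} is smaller than the base tensor")
        # Add base values onto the delta; extra delta entries (e.g. new token rows) stay as-is.
        merged[name] = [x + (b[i] if i < len(b) else 0.0) for i, x in enumerate(d)]
    return merged
```

For example, `apply_delta({"w": [1.0, 2.0]}, {"w": [0.5, -1.0, 3.0], "proj": [0.1]})` yields `{"w": [1.5, 1.0, 3.0], "proj": [0.1]}`.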

Processing Ultra High-Resolution Images

The InstructPix2Pix pipeline does not resize input images automatically, so processing ultra high-resolution images can lead to out-of-memory (OOM) errors. To avoid this, resize the images before editing while preserving their aspect ratio. Here’s how you can do it:

from diffusers.utils import load_image
from PIL import Image

def resize_image_aspect_ratio(img_url, base_width=None, base_height=None):
    # Load the image (local path or URL) and make sure it has three RGB channels
    img = load_image(img_url).convert('RGB')
    # Get the current width and height of the image
    width, height = img.size

    # Calculate the new dimensions based on the aspect ratio
    if base_width is not None:
        # Scale the height by the same factor as the width
        # (base_width takes precedence if both values are given)
        w_percent = base_width / float(width)
        h_size = int(float(height) * w_percent)
        new_size = (base_width, h_size)
    elif base_height is not None:
        # Scale the width by the same factor as the height
        h_percent = base_height / float(height)
        w_size = int(float(width) * h_percent)
        new_size = (w_size, base_height)
    else:
        raise ValueError("Either base_width or base_height must be provided")

    # Resize with Lanczos resampling (Image.ANTIALIAS was an alias for it and was removed in Pillow 10)
    resized_img = img.resize(new_size, Image.LANCZOS)
    return resized_img
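If you would rather cap the longest side than pick a width or height by hand, a small helper can compute the target size first. This is a hypothetical addition, not part of the original code: the default `max_side=512` and the rounding down to a multiple of 8 are assumptions (Stable Diffusion-style pipelines commonly work with dimensions divisible by 8 because their VAE downsamples by that factor).

```python
from typing import Tuple


def target_size(width: int, height: int, max_side: int = 512, multiple: int = 8) -> Tuple[int, int]:
    """Hypothetical helper: shrink (width, height) so the longer side is at most max_side,
    preserving the aspect ratio, then round each side down to a multiple of `multiple`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic keeps the ratio without floating-point surprises; never upscales.
        width, height = width * max_side // longest, height * max_side // longest
    # Round down to a multiple of `multiple`, but never below `multiple` itself.
    new_w = max(multiple, width // multiple * multiple)
    new_h = max(multiple, height // multiple * multiple)
    return new_w, new_h
```

With Pillow, the result plugs straight into `img.resize(target_size(*img.size), Image.LANCZOS)`. For a 4000×3000 photo this gives 512×384.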

Understanding the Code Through Analogy

Think of the resizing function as a tailor and the original image as a customer. A tailor needs one specific measurement to start from, just as the function needs either base_width or base_height. The tailor then adjusts everything else in proportion so the suit fits perfectly without losing its style; similarly, the function scales the other side by the same factor, so the image keeps its aspect ratio in its new dimensions.
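In formula form, when base_width is given as w′, the function computes the new height h′ from the original width w and height h as follows (the int() call truncates, which equals the floor for positive values). The worked numbers are an illustrative example, not taken from the original article:

```latex
h' = \left\lfloor h \cdot \frac{w'}{w} \right\rfloor,
\qquad \text{e.g. } w = 1920,\; h = 1080,\; w' = 1000:\quad
h' = \left\lfloor 1080 \cdot \frac{1000}{1920} \right\rfloor = \lfloor 562.5 \rfloor = 562
```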

Troubleshooting

If you encounter issues while running the setup or during inference, consider the following:

  • Ensure that all dependencies are correctly installed.
  • Double-check that you’ve merged the LLaVA weight deltas accurately.
  • If OOM errors persist, resize the images to smaller dimensions (for example, a smaller base_width or base_height) before processing.
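A quick way to confirm that dependencies are installed is to query their versions from Python. This is a generic sketch using the standard library's `importlib.metadata`; the package list (torch, diffusers, transformers, Pillow) is an assumption, so take the authoritative list from the official repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed dependency list; check the official repository for the real one.
    for pkg, ver in installed_versions(["torch", "diffusers", "transformers", "Pillow"]).items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED (or at an unexpected version) is a good first suspect when setup or inference fails.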


Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
