How to Use the DMD2 Model for Image Synthesis

Jun 11, 2024 | Educational

Welcome to the world of improved image synthesis! In this guide, we will walk you through how to generate images with Improved Distribution Matching Distillation (DMD2), a distillation technique that turns a Stable Diffusion XL model into a fast few-step generator. DMD2 produces high-quality images in as few as one to four denoising steps.

Prerequisites

  • Python 3.8 or later (required by recent versions of diffusers)
  • PyTorch with CUDA support
  • The diffusers, transformers, and huggingface_hub libraries (plus controlnet_aux for the Canny example in section 4)
  • Access to a GPU for fast computation
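
Before running the examples, it helps to confirm that PyTorch can actually see your GPU, since every snippet below moves the models to CUDA and runs in float16. Here is a minimal sanity check (nothing DMD2-specific, just plain PyTorch):

import torch

# Verify that a CUDA-capable GPU is visible; the examples below run on "cuda" in float16
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))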

Getting Started with DMD2

To use the DMD2 checkpoints, follow the steps below for whichever generation mode you need. Let’s start by importing the required libraries!

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

1. 4-Step UNet Generation

The 4-step UNet method replaces the base SDXL UNet with the distilled DMD2 UNet and generates an image in just four denoising steps. Here’s how you can implement it:

# Define model IDs
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"

# Load model
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Generate image
prompt = "a photo of a cat"
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
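
For reproducible results, you can pass a seeded generator to the pipeline and save the output. The generator argument is a standard diffusers parameter; the seed and filename below are just illustrative:

# Fix the random seed so repeated runs produce the same image
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    timesteps=[999, 749, 499, 249],
    generator=generator,
).images[0]
image.save("cat_4step.png")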

2. 4-Step LoRA Generation

The LoRA (Low-Rank Adaptation) checkpoint applies the DMD2 distillation as a lightweight adapter on top of the existing SDXL weights, so you don’t need to download a full replacement UNet. Think of it as tuning a musical instrument rather than building a new one. Follow the steps below:

# Define model IDs
ckpt_name = "dmd2_sdxl_4step_lora_fp16.safetensors"

# Load model
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=1.0)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Generate image
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
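
If you later want to switch the pipeline back to the plain SDXL weights, you can undo the fusion using the standard diffusers LoRA helpers:

# Remove the DMD2 LoRA and restore the original SDXL weights in the pipeline
pipe.unfuse_lora()
pipe.unload_lora_weights()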

3. 1-Step UNet Generation

For the fastest option, use 1-step UNet generation, which produces an image from a single denoising step at a fixed timestep. Think of it as taking a shortcut across the field instead of the longer path through the forest.

# Define model IDs
ckpt_name = "dmd2_sdxl_1step_unet_fp16.bin"

# Load model
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Generate image
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[399]).images[0]
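
Because a single step is so cheap, it is practical to generate several candidates at once and keep the best one. Here is a sketch using the standard num_images_per_prompt argument; the batch size of 4 and the filenames are just examples:

# Generate a small batch of candidates in one call and save each of them
images = pipe(
    prompt=prompt,
    num_inference_steps=1,
    guidance_scale=0,
    timesteps=[399],
    num_images_per_prompt=4,
).images
for i, img in enumerate(images):
    img.save(f"cat_1step_{i}.png")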

4. T2I Adapter Implementation

If you want finer control over composition, consider using a T2I adapter. The example below pairs the 4-step DMD2 UNet with a Canny edge adapter, so the generated image follows the edges of a reference picture. Note that this workflow needs a few extra imports and a control image, as shown below.

# Additional imports for the adapter workflow (not in the base import list above)
from diffusers import AutoencoderKL, StableDiffusionXLAdapterPipeline, T2IAdapter
from diffusers.utils import load_image
from controlnet_aux.canny import CannyDetector

# Define model IDs
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"

# Load adapter and fp16-friendly VAE
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

# Load model
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    base_model_id, unet=unet, vae=vae, adapter=adapter, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Prepare the Canny edge control image
canny_detector = CannyDetector()
image = load_image("reference.png")  # replace with your own reference image
image = canny_detector(image, detect_resolution=384, image_resolution=1024)

# Generate image
prompt = "Mystical fairy in real, magic, 4k picture, high quality"
gen_images = pipe(prompt=prompt, image=image, num_inference_steps=4, guidance_scale=0, adapter_conditioning_scale=0.8, adapter_conditioning_factor=0.5, timesteps=[999, 749, 499, 249]).images[0]
gen_images.save("out_canny.png")

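The adapter_conditioning_scale argument controls how strongly the edge map constrains the output. A small sweep makes the trade-off easy to inspect; the scale values below are illustrative, and everything else reuses the objects defined above:

# Compare several adapter strengths on the same prompt and control image
for scale in (0.5, 0.8, 1.0):
    out = pipe(
        prompt=prompt,
        image=image,
        num_inference_steps=4,
        guidance_scale=0,
        adapter_conditioning_scale=scale,
        adapter_conditioning_factor=0.5,
        timesteps=[999, 749, 499, 249],
    ).images[0]
    out.save(f"out_canny_scale_{scale}.png")
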
Troubleshooting

If you encounter any issues while running the code, consider the following tips:

  • Double-check that you have all the necessary libraries installed.
  • Ensure your GPU is correctly configured and recognized by PyTorch.
  • If the model fails to load, verify the model ID and check your internet connection for downloading resources.
  • For further questions, refer to the official DMD2 repository by Tianwei Yin.

Conclusion

By following these steps, you should be able to harness the power of DMD2 for captivating image synthesis. Whether you’re aiming for simplicity or complexity in your creations, this model has you covered!

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
