How to Use InstantID for Image Generation

Jan 25, 2024 | Educational

InstantID is a tool for identity-preserving (ID-preserving) image generation from a single reference image. This article walks through setting up and using the model, step by step. So, let’s dive in!

Introduction

The InstantID model is a tuning-free method for generating images that preserve a person’s identity, and it supports a variety of downstream tasks. Because no per-identity fine-tuning is needed, a single face image is enough as input.

Step-by-Step Usage

To get started with InstantID, download the model files from its repository, either manually or with a short Python script.

Downloading the Model

Use the following Python script to download the required components:

from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/config.json", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/diffusion_pytorch_model.safetensors", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ip-adapter.bin", local_dir="./checkpoints")
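If you rerun the setup often, it may be convenient to list the files once and only fetch what is not yet on disk. The following is a minimal sketch under that assumption: `missing_files`, `fetch_missing`, and the `download` parameter are hypothetical helper names introduced here for illustration, and the sketch assumes each file ends up at `local_dir/filename`. Only `hf_hub_download` and its `repo_id`, `filename`, and `local_dir` arguments come from the script above.

```python
from pathlib import Path

REPO_ID = "InstantX/InstantID"
FILES = [
    "ControlNetModel/config.json",
    "ControlNetModel/diffusion_pytorch_model.safetensors",
    "ip-adapter.bin",
]


def missing_files(local_dir="./checkpoints", files=FILES):
    """Return the entries of `files` that are not present under `local_dir`."""
    root = Path(local_dir)
    return [name for name in files if not (root / name).is_file()]


def fetch_missing(local_dir="./checkpoints", files=FILES, download=None):
    """Download only the missing files and return the names that were requested.

    `download` is a hypothetical hook (handy for dry runs); by default the
    real hf_hub_download from huggingface_hub is used.
    """
    if download is None:
        from huggingface_hub import hf_hub_download as download
    todo = missing_files(local_dir, files)
    for name in todo:
        download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
    return todo
```

Calling `fetch_missing()` with no arguments would then request whichever of the three files are absent from `./checkpoints`.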

Additionally, you must manually download the face encoder via this URL and place it in the `models/antelopev2` directory.
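Since this step is manual, a quick check that the files landed in the right place can save a confusing error later. The sketch below assumes the face encoder is distributed as `.onnx` files and that they sit directly inside `models/antelopev2` under the working directory; `antelopev2_files` is a hypothetical helper name, not part of InstantID or insightface.

```python
from pathlib import Path


def antelopev2_files(root="./"):
    """List the .onnx files found in <root>/models/antelopev2 (empty if none)."""
    model_dir = Path(root) / "models" / "antelopev2"
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.glob("*.onnx"))


found = antelopev2_files()
if found:
    print("Found face encoder files:", ", ".join(found))
else:
    print("No .onnx files under ./models/antelopev2 yet; download the face encoder first.")
```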

Setting Up the Environment

To prepare your environment, install the required Python packages and then import the following modules:

# !pip install opencv-python transformers accelerate insightface
import diffusers
from diffusers.utils import load_image
from diffusers.models import ControlNetModel
import cv2
import torch
import numpy as np
from PIL import Image
from insightface.app import FaceAnalysis
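# pipeline_stable_diffusion_xl_instantid.py ships with the InstantID GitHub repository;
# it must be importable, for example by placing it in the same folder as this script
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps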
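Before running these imports, it can help to see at a glance which modules Python cannot find. The snippet below is a small sketch using only the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list simply mirrors the top-level modules imported above.

```python
import importlib.util

# Top-level module names taken from the import block above.
REQUIRED_MODULES = [
    "diffusers",
    "cv2",          # provided by the opencv-python package
    "torch",
    "numpy",
    "PIL",          # provided by the Pillow package
    "insightface",
    "pipeline_stable_diffusion_xl_instantid",  # local file, not a pip package
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("Missing modules:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located; it does not verify versions or GPU support.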

Load and Customize Your Image

Before working with your face image, prepare the face analysis model and load the pipeline:

# prepare 'antelopev2' under ./models
app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

# prepare models under ./checkpoints
face_adapter = './checkpoints/ip-adapter.bin'
controlnet_path = './checkpoints/ControlNetModel'

# load IdentityNet
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.cuda()

# load adapter
pipe.load_ip_adapter_instantid(face_adapter)

Next, load your face image, extract the face embedding and keypoints, and generate the result:

# load an image
face_image = load_image("your-example.jpg")

# prepare face emb
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
face_info = sorted(face_info, key=lambda x: (x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1]  # only use the largest face (width * height)
face_emb = face_info['embedding']
face_kps = draw_kps(face_image, face_info['kps'])
pipe.set_ip_adapter_scale(0.8)

# generate image
prompt = "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality"
negative_prompt = "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured"

image = pipe(prompt, negative_prompt=negative_prompt, image_embeds=face_emb, image=face_kps, controlnet_conditioning_scale=0.8).images[0]
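The "only use the largest face" step is easy to get subtly wrong, so here is a standalone sketch of it on plain dictionaries. It assumes each detection exposes a `bbox` of the form `(x1, y1, x2, y2)`, as the snippet above does; `bbox_area` and `largest_face` are hypothetical helper names, and the detections are made-up numbers. The area must be computed as `(x2 - x1) * (y2 - y1)`, with parentheses around both differences; otherwise operator precedence yields a different ranking. The sketch also returns `None` for an image with no detected face instead of failing on an empty list.

```python
def bbox_area(bbox):
    """Area of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = bbox[:4]
    return (x2 - x1) * (y2 - y1)


def largest_face(faces):
    """Return the detection with the largest bounding box, or None if empty."""
    if not faces:
        return None
    return max(faces, key=lambda face: bbox_area(face["bbox"]))


# Made-up detections, for illustration only.
faces = [
    {"name": "small, low in frame", "bbox": [0, 900, 50, 950]},  # 50 x 50
    {"name": "large, top of frame", "bbox": [0, 0, 200, 200]},   # 200 x 200
]

best = largest_face(faces)
print(best["name"], bbox_area(best["bbox"]))
```

With these numbers, a key written as `(x2 - x1) * y2 - y1` would rank the small face first (46600 versus 40000), which is why the parentheses matter.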

Understanding the Code: An Analogy

Imagine you are a chef preparing an exquisite dish (the output image) using a complex recipe (the code). Here’s how each part aligns:

Ingredients Gathering: Just like gathering ingredients from your pantry (downloading the model), you must prepare all necessary components.
Marinating: Setting up the environment and preparing what you need (loading and customizing your image) is akin to marinating your ingredients to enhance flavors.
Cooking: The act of using the algorithm (generating the face embeddings and image) is like cooking your ingredients together to create a delightful meal.
Serving: Finally, presenting your dish (the final output image) to impress your guests mirrors how you showcase the generated image.

Troubleshooting

If you encounter any issues during the setup or usage of InstantID, here are some troubleshooting tips:

If you’re dissatisfied with the similarity in generated images, consider increasing the “IdentityNet Strength” and “Adapter Strength.”
If the saturation appears too high, first decrease the Adapter strength. If it persists, decrease the IdentityNet strength.
If text control isn’t functioning as expected, try decreasing the Adapter strength.
If the realistic style does not meet your expectations, see the InstantID GitHub repository for a more realistic base model.
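The first three tips can be sketched as a small adjustment helper. This is an illustration under stated assumptions, not part of InstantID: it assumes “IdentityNet Strength” corresponds to `controlnet_conditioning_scale` and “Adapter Strength” to the value passed to `pipe.set_ip_adapter_scale(...)` in the code above. The `Strengths` class, the `adjust` function, the symptom names, the step size of 0.1, and the 0.0 to 1.5 clamp range are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Strengths:
    identitynet: float = 0.8  # assumed: controlnet_conditioning_scale
    adapter: float = 0.8      # assumed: pipe.set_ip_adapter_scale(...)


def _clamp(value, low=0.0, high=1.5):
    """Keep a strength inside a hypothetical sane range."""
    return round(min(high, max(low, value)), 2)


def adjust(s, symptom, step=0.1):
    """Return new strengths following the troubleshooting tips above."""
    if symptom == "low_similarity":
        # Raise both strengths.
        return Strengths(_clamp(s.identitynet + step), _clamp(s.adapter + step))
    if symptom in ("high_saturation", "weak_text_control"):
        # Lower the adapter strength first.
        return replace(s, adapter=_clamp(s.adapter - step))
    if symptom == "high_saturation_persists":
        # Then lower the IdentityNet strength.
        return replace(s, identitynet=_clamp(s.identitynet - step))
    raise ValueError(f"unknown symptom: {symptom}")


s = adjust(Strengths(), "high_saturation")
print(s)
# The values would then be applied as, for example:
#   pipe.set_ip_adapter_scale(s.adapter)
#   pipe(..., controlnet_conditioning_scale=s.identitynet)
```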


Final Thoughts

Using InstantID for identity-preserving image generation could open new avenues in creative arts and AI. Whether you are a developer, artist, or AI enthusiast, this model can enhance your projects by generating unique images tailored to your needs.

