StoryMaker is an innovative personalization solution that preserves consistency not only of faces but also of clothing, hairstyles, and bodies in scenes with multiple characters. This capability opens the door to creating a story told through a series of images, portraying events seamlessly.
Visual Demos
In this visualization, the first three rows depict a day in the life of an office worker, while the last two rows narrate a story inspired by the movie “Before Sunrise”.
Two-Portrait Synthesis
Diverse Applications
How to Download and Set Up StoryMaker
To use StoryMaker, download the model from Hugging Face. If you have trouble accessing Hugging Face, you can use the hf-mirror endpoint to download the models as follows:
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download RED-AIGC/StoryMaker --local-dir checkpoints --local-dir-use-symlinks False
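Alternatively, if you prefer to script the download, the huggingface_hub package provides snapshot_download. The sketch below assumes the same repository and target directory as the CLI command above; note that HF_ENDPOINT must be set before huggingface_hub is imported for the mirror to take effect.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # optional: only needed if you rely on the mirror

from huggingface_hub import snapshot_download

# Download the StoryMaker weights into ./checkpoints
snapshot_download(repo_id="RED-AIGC/StoryMaker", local_dir="checkpoints")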
The face encoder needs to be downloaded manually (see the StoryMaker repository for the download link) and placed under models/. After preparing all the models, your folder structure should look like this:
- models/
- checkpoints/
  - mask.bin
- pipeline_sdxl_storymaker.py
- README.md
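Before moving on, a quick check that the key files are where the walkthrough below expects them can save a debugging round later. This is just a convenience sketch; the paths mirror the layout above, and the buffalo_l location reflects the insightface setup assumed in step 3.
import os

# Paths the walkthrough below relies on; adjust if you chose different directories
expected = [
    'checkpoints/mask.bin',          # face adapter weights
    'pipeline_sdxl_storymaker.py',   # custom pipeline definition
    'models/buffalo_l',              # insightface face encoder (used in step 3)
]
for path in expected:
    print(f"{path}: {'found' if os.path.exists(path) else 'MISSING'}")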
Getting Started with StoryMaker
Now, let’s walk through how to set up and use StoryMaker effectively. We will compare this process to preparing a fine meal, where each ingredient is essential for achieving a delightful result.
1. **Ingredients – Libraries**: You need to gather your ingredients (libraries) for cooking. Install the required libraries:
!pip install opencv-python transformers accelerate diffusers insightface onnxruntime-gpu
2. **Preparation – Initialize**: Like preparing your ingredients before you start cooking, we will need to import these libraries and set up the framework:
import cv2
import torch
import numpy as np
from PIL import Image
from diffusers import UniPCMultistepScheduler
from insightface.app import FaceAnalysis
from pipeline_sdxl_storymaker import StableDiffusionXLStoryMakerPipeline
3. **Cooking – Configurations**: Just like following a recipe, carefully configure the models:
# Prepare buffalo_l under ./models/
app = FaceAnalysis(name='buffalo_l', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
# Prepare models under ./checkpoints/
face_adapter = 'checkpoints/mask.bin'
image_encoder_path = 'laion/CLIP-ViT-H-14-laion2B-s32B-b79K'
base_model = 'huaquan/YamerMIX_v11'
pipe = StableDiffusionXLStoryMakerPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
pipe.cuda()
# Load adapter
pipe.load_storymaker_adapter(image_encoder_path, face_adapter, scale=0.8, lora_scale=0.8)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
4. **Customizing – Image Processing**: Here comes the fun part! Load an image and customize your dish by creating prompts:
# Load an image and mask
face_image = Image.open('examples/ldh.png').convert("RGB")
mask_image = Image.open('examples/ldh_mask.png').convert("RGB")
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]  # Only use the largest detected face
prompt = "A person is taking a selfie, the person is wearing a red hat, and a volcano is in the distance."
n_prompt = "bad quality, NSFW, low quality, ugly, disfigured, deformed"
generator = torch.Generator(device='cuda').manual_seed(666)
for i in range(4):
    output = pipe(
        image=face_image, mask_image=mask_image, face_info=face_info,
        prompt=prompt,
        negative_prompt=n_prompt,
        ip_adapter_scale=0.8, lora_scale=0.8,
        num_inference_steps=25,
        guidance_scale=7.5,
        height=1280, width=960,
        generator=generator,
    ).images[0]
    output.save(f'examples/results/ldh666_new_{i}.jpg')
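After the loop finishes, the four results live under examples/results/. If you want to compare them at a glance, a small PIL helper like the one below can stitch them into a single contact sheet; the grid layout and file names are just illustrative and match the save pattern above.
from PIL import Image

# Stitch the four generated images into a 2x2 contact sheet for quick comparison
paths = [f'examples/results/ldh666_new_{i}.jpg' for i in range(4)]
images = [Image.open(p) for p in paths]
w, h = images[0].size
sheet = Image.new('RGB', (2 * w, 2 * h))
for idx, img in enumerate(images):
    sheet.paste(img, ((idx % 2) * w, (idx // 2) * h))
sheet.save('examples/results/ldh666_contact_sheet.jpg')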
Troubleshooting
- If you experience issues downloading any models, ensure you have a stable internet connection and confirm that the URLs are correct.
- Make sure your folder structure aligns with the expected setup, as misplacement of model files can create errors.
- If your generated images don’t align with your prompts, consider adjusting your prompts or increasing the number of inference steps.
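Building on the last point, the two knobs most worth experimenting with are num_inference_steps and guidance_scale, both of which already appear in the pipeline call above. Below is a sketch of a slightly stronger setting; the exact values are just a starting point, not a recommendation from the StoryMaker authors.
# Re-run the same call with more denoising steps and stronger prompt adherence
output = pipe(
    image=face_image, mask_image=mask_image, face_info=face_info,
    prompt=prompt,
    negative_prompt=n_prompt,
    ip_adapter_scale=0.8, lora_scale=0.8,
    num_inference_steps=40,   # more steps than the 25 used above
    guidance_scale=9.0,       # pushes outputs to follow the prompt more closely
    height=1280, width=960,
    generator=torch.Generator(device='cuda').manual_seed(666),
).images[0]
output.save('examples/results/ldh666_tuned.jpg')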
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.