How to Use the Würstchen Framework for Text-Conditional Models

Feb 18, 2021 | Data Science

The Würstchen framework is known for training text-conditional models efficiently. It departs from the usual approach by compressing images in multiple stages, so the expensive part of training runs on much smaller inputs. This post walks through how to use Würstchen in practice. Let’s dive in!

What is Würstchen?

Würstchen is a framework for training text-conditional models that moves the computationally expensive work into a highly compressed latent space. Its multi-stage design (Stages A, B, and C) achieves a 42x spatial compression without compromising image reconstruction quality, which makes training Stage C both fast and cost-effective. For further technical details, refer to the paper.
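To get a feel for what a 42x compression means, here is a minimal arithmetic sketch. It assumes the factor applies to each side of the image and uses simple rounding; the function name and the rounding rule are illustrative, not taken from the Würstchen code.

```python
def latent_side(pixels: int, factor: float = 42.0) -> int:
    """Approximate side length of the compressed latent.

    Assumption: the compression factor applies per side, and we round
    to the nearest integer purely for illustration.
    """
    return round(pixels / factor)


side = latent_side(512)                # a 512-pixel side shrinks to about 12
pixel_positions = 512 * 512            # spatial positions in the input image
latent_positions = side * side         # spatial positions in the latent
print(f"latent grid: {side}x{side}")
print(f"roughly {pixel_positions // latent_positions}x fewer spatial positions")
```

For a 512×512 image this gives a 12×12 grid, which matches the latent size quoted for training further down.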

Using the Würstchen Framework

You can easily use the Würstchen model through several notebooks available in its repository. Here’s how:

  • The Stage B notebook is dedicated to image reconstruction.
  • The Stage C notebook focuses on text-conditional generation.
  • You can also try the text-to-image generation directly on Google Colab.

Integrating Würstchen in Diffusers

Würstchen is fully integrated into the diffusers library. Here’s a simple example of how to use it:

```python
# pip install -U transformers accelerate diffusers
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

# Load the combined Würstchen pipeline in half precision and move it to the GPU.
pipe = AutoPipelineForText2Image.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=torch.float16
).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipe(
    caption,
    width=1024,
    height=1536,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,  # timestep schedule for the Stage C prior
    prior_guidance_scale=4.0,  # guidance strength for the prior
    num_images_per_prompt=2,  # generate two images for the caption
).images  # list of PIL images
```

In this code, you import necessary libraries and set up a pipeline to generate images based on text prompts. Think of this like ordering a customized dish at a restaurant – you specify what you want, and the chef (the pipeline) prepares it just for you!
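Once the pipeline has run, you will usually want the results on disk. The helper below is a small sketch of one way to do that. The function name, the `wuerstchen` prefix, and the file naming scheme are assumptions made for illustration; the only thing it relies on is that each image has a PIL-style `.save(path)` method.

```python
from pathlib import Path


def save_images(images, out_dir, prefix="wuerstchen"):
    """Save each image to out_dir and return the written paths.

    `images` is any sequence of objects with a PIL-style `.save(path)`
    method. The prefix and numbering scheme are illustrative choices.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    paths = []
    for index, image in enumerate(images):
        target = out_path / f"{prefix}_{index:02d}.png"
        image.save(target)
        paths.append(target)
    return paths
```

With the `images` list from the example above, `save_images(images, "outputs")` would write `wuerstchen_00.png` and `wuerstchen_01.png` into an `outputs` folder.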

Training Your Own Würstchen Model

Training your own Würstchen model is efficient and cost-effective because training happens in a small 12×12 latent space. The project provides training scripts for both Stage B and Stage C.

Downloading Models

Here are the available models for download:

| Model | Download | Parameters | Conditioning | Training Steps | Resolution |
|---|---|---|---|---|---|
| Würstchen v1 | Hugging Face | 1B (Stage C) + 600M (Stage B) + 19M (Stage A) | CLIP-H-Text | 800,000 | 512×512 |
| Würstchen v2 | Hugging Face | 1B (Stage C) + 600M (Stage B) + 19M (Stage A) | CLIP-bigG-Text | 918,000 | 1024×1024 |
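The parameter counts listed for the three stages give a rough idea of how much memory the weights need. The sketch below is a back-of-the-envelope estimate only: it treats the listed figures as exact, assumes 2 bytes per parameter (float16), and ignores the text encoder, activations, and any framework overhead.

```python
# Parameter counts as listed for each stage (rounded figures).
STAGE_PARAMS = {
    "Stage C": 1_000_000_000,
    "Stage B": 600_000_000,
    "Stage A": 19_000_000,
}


def weight_memory_gb(param_counts, bytes_per_param=2):
    """Rough size of the weights alone, in decimal gigabytes.

    Assumption: 2 bytes per parameter, i.e. float16 storage.
    """
    total_params = sum(param_counts.values())
    return total_params * bytes_per_param / 1e9


total = sum(STAGE_PARAMS.values())
print(f"total parameters: {total / 1e9:.3f}B")
print(f"float16 weights:  {weight_memory_gb(STAGE_PARAMS):.2f} GB")
```

Actual GPU memory use during generation will be higher than this figure, so treat it as a lower bound rather than a requirement.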

Troubleshooting

If you run into issues while using Würstchen, here are some common troubleshooting ideas:

  • Ensure you have installed all required dependencies, including the diffusers library.
  • Check that your environment supports CUDA for GPU acceleration.
  • Make sure your input captions are formatted correctly and are clear.
  • If you’re experiencing performance issues, consider reducing the image dimensions.
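The first two checks in this list can be automated with a short script. This is a minimal sketch: the package list mirrors the `pip install` line from the example plus `torch`, and the helper names are made up for illustration.

```python
import importlib.util

# Packages used by the diffusers example; adjust to your own setup.
REQUIRED = ["torch", "diffusers", "transformers", "accelerate"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available():
    """True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script still runs without torch

    return torch.cuda.is_available()


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
print("CUDA available:", cuda_available())
```

If the script reports missing packages, install them first; if CUDA is unavailable, the `.to("cuda")` call in the example will fail.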


Acknowledgments

The Würstchen authors thank Stability AI for providing compute resources for their research.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
