How to Use TF-ICON for Diffusion-Based Cross-Domain Image Composition

Jan 5, 2024 | Data Science

Seamlessly integrating user-provided objects into a specific visual context has become easier with TF-ICON, a training-free image composition framework. It uses off-the-shelf text-driven diffusion models, so it needs no additional training, fine-tuning, or per-image optimization. If you’re eager to try this cross-domain composition technique, this guide walks you through the process.

Setting Up Your Environment

Before you start creating stunning visual compositions, there are a few setup steps you need to complete. Follow the instructions below for a smooth start.

1. Creating a Conda Environment

  • Clone the TF-ICON repository:
      git clone https://github.com/Shilin-Lu/TF-ICON.git
  • Change into the cloned repository:
      cd TF-ICON
  • Create the Conda environment from the supplied YAML file:
      conda env create -f tf_icon_env.yaml
  • Activate the environment:
      conda activate tf-icon
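To confirm the environment is active before running anything, you can inspect the CONDA_DEFAULT_ENV variable, which `conda activate` sets to the active environment's name. This is a minimal sketch; `check_conda_env` is a hypothetical helper, not part of TF-ICON.

```python
import os

EXPECTED_ENV = "tf-icon"  # environment name from the activation step above


def check_conda_env(environ=None, expected=EXPECTED_ENV):
    """Return (ok, message) saying whether the expected Conda environment is active.

    `environ` defaults to os.environ; passing a plain dict makes the check testable.
    """
    environ = os.environ if environ is None else environ
    active = environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
    if not active:
        return False, f"No Conda environment is active; run `conda activate {expected}`."
    if active != expected:
        return False, f"Active environment is '{active}', expected '{expected}'."
    return True, f"Conda environment '{active}' is active."
```

Calling `check_conda_env()` with no argument inspects the current process; passing a dict lets you test the logic in isolation.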

2. Downloading Stable-Diffusion Weights

You will need to download the weights for Stable-Diffusion:

  • Download the sd-v2-1_512-ema-pruned.ckpt file from Stability AI’s Stable Diffusion 2.1 page on Hugging Face.
  • Place the downloaded file in the ./ckpt folder of your TF-ICON project directory.
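Before the first run, you can confirm the weights are where you expect them. This sketch assumes the checkpoint sits in a ckpt folder at the repository root under the file name given above; `expected_checkpoint_path` and `find_checkpoint` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# File name from the download step; adjust if your download is named differently.
CKPT_NAME = "sd-v2-1_512-ema-pruned.ckpt"


def expected_checkpoint_path(project_dir=".", ckpt_name=CKPT_NAME) -> Path:
    """Where the weights should live: <project_dir>/ckpt/<ckpt_name> (assumed layout)."""
    return Path(project_dir) / "ckpt" / ckpt_name


def find_checkpoint(project_dir=".", ckpt_name=CKPT_NAME) -> Optional[Path]:
    """Return the checkpoint path if the file exists, else None."""
    path = expected_checkpoint_path(project_dir, ckpt_name)
    return path if path.is_file() else None
```

The path returned by `find_checkpoint` is what you would pass to --ckpt.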

Running TF-ICON

With everything set up, it’s time to prepare your images for composition. TF-ICON requires structured input data consisting of backgrounds, foregrounds, segmentation masks, and user masks. Here’s how to get started:

Data Preparation

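Each prompt folder holds one sample: a background (bgxx.png), a foreground (fgxx.png), a segmentation mask for the foreground (fgxx_mask.png), and a user mask (mask_bg_fg.png) marking where the object should be placed in the background. The input directory structure should look something like this: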

inputs
├── cross_domain
│   ├── prompt1
│   │   ├── bgxx.png
│   │   ├── fgxx.png
│   │   ├── fgxx_mask.png
│   │   └── mask_bg_fg.png
│   └── prompt2
└── same_domain
    ├── prompt1
    │   ├── bgxx.png
    │   ├── fgxx.png
    │   ├── fgxx_mask.png
    │   └── mask_bg_fg.png
    └── prompt2
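A quick way to catch a missing file before a long run is to check each prompt folder against the layout above. This minimal sketch assumes the file names follow the patterns in the tree, treating "xx" as any suffix (bg*.png, fg*.png, fg*_mask.png, mask_bg_fg.png); `missing_inputs` and `validate_root` are hypothetical helpers, not part of TF-ICON.

```python
import re
from pathlib import Path

# Patterns inferred from the example tree; "xx" in the names is a placeholder.
PATTERNS = {
    "background": re.compile(r"^bg.*\.png$"),
    "foreground": re.compile(r"^fg(?!.*_mask\.png$).*\.png$"),  # fg*.png but not a mask
    "foreground mask": re.compile(r"^fg.*_mask\.png$"),
    "user mask": re.compile(r"^mask_bg_fg\.png$"),
}


def missing_inputs(filenames):
    """Return the input roles that no file name in `filenames` satisfies."""
    return [role for role, pattern in PATTERNS.items()
            if not any(pattern.match(name) for name in filenames)]


def validate_root(root):
    """Map each prompt folder under `root` to its missing roles (empty dict = all good)."""
    report = {}
    for sample_dir in sorted(Path(root).iterdir()):
        if sample_dir.is_dir():
            names = [p.name for p in sample_dir.iterdir() if p.is_file()]
            missing = missing_inputs(names)
            if missing:
                report[sample_dir.name] = missing
    return report
```

For example, `validate_root("./inputs/cross_domain")` lists any prompt folder that lacks one of the four inputs.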

Make sure the resolution of your foreground images is not too small.
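Because the only guidance here is that the foreground should not be too small, a rough check can read a PNG's width and height straight from its header using the standard library. The 256-pixel MIN_SIDE threshold below is a hypothetical value chosen for illustration, not a TF-ICON requirement.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 256  # hypothetical threshold; the guidance only says "not too small"


def png_size(header: bytes):
    """Return (width, height) parsed from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file (missing signature or IHDR chunk)")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def foreground_too_small(path, min_side=MIN_SIDE):
    """True if the PNG at `path` has a side shorter than `min_side` pixels."""
    with open(path, "rb") as f:
        width, height = png_size(f.read(24))
    return min(width, height) < min_side
```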

Image Composition

To execute image composition, choose between the cross-domain and same-domain modes and use the relevant command:

Cross-Domain Mode

python scripts/main_tf_icon.py --ckpt path/to/model.ckpt --root ./inputs/cross_domain --domain cross --dpm_steps 20 --dpm_order 2 --scale 5 --tau_a 0.4 --tau_b 0.8 --outdir ./outputs --gpu cuda:0 --seed 3407

Same-Domain Mode

python scripts/main_tf_icon.py --ckpt path/to/model.ckpt --root ./inputs/same_domain --domain same --dpm_steps 20 --dpm_order 2 --scale 2.5 --tau_a 0.4 --tau_b 0.8 --outdir ./outputs --gpu cuda:0 --seed 3407
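If you switch between the two modes often, you can generate these commands from Python. This sketch rebuilds the two commands above as argument lists, assuming inputs and outputs live in ./inputs and ./outputs under the repository root; `build_command` and `as_shell` are hypothetical helpers, and the checkpoint path is yours to supply.

```python
import shlex

# Settings copied from the two commands above; only --root, --domain and --scale differ.
SHARED = [("dpm_steps", 20), ("dpm_order", 2)]
TAIL = [("tau_a", 0.4), ("tau_b", 0.8), ("outdir", "./outputs"),
        ("gpu", "cuda:0"), ("seed", 3407)]
MODES = {
    "cross": {"root": "./inputs/cross_domain", "scale": 5},
    "same": {"root": "./inputs/same_domain", "scale": 2.5},
}


def build_command(ckpt, domain):
    """Return the argv list for scripts/main_tf_icon.py in 'cross' or 'same' mode."""
    if domain not in MODES:
        raise ValueError(f"domain must be one of {sorted(MODES)}, got {domain!r}")
    mode = MODES[domain]
    options = ([("ckpt", ckpt), ("root", mode["root"]), ("domain", domain)]
               + SHARED + [("scale", mode["scale"])] + TAIL)
    argv = ["python", "scripts/main_tf_icon.py"]
    for name, value in options:
        argv += [f"--{name}", str(value)]
    return argv


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command line."""
    return shlex.join(argv)
```

Run from the repository root, `subprocess.run(build_command("path/to/model.ckpt", "cross"), check=True)` would launch a cross-domain run; passing a list rather than one string sidesteps shell quoting.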

Understanding the Commands

To better grasp how these commands function, let’s use an analogy. Imagine you are a chef preparing a beautiful cake:

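  • --ckpt: The path to the Stable-Diffusion checkpoint you downloaded. Think of it as your recipe book, guiding the whole composition process.
  • --root: This is like your kitchen, the directory where all your ingredients (the prompt folders with their images and masks) are stored.
  • --domain: This is akin to deciding whether your cake and its decorations share one flavor or combine contrasting ones. Use same when the foreground and background share a visual domain (e.g., both photorealistic) and cross when they differ (e.g., a photographed object composed into an oil painting).
  • --dpm_steps: Just like the number of mixing and baking steps, this sets how many diffusion sampling steps the DPM-Solver sampler performs.
  • --scale: This is how strongly you season the cake. It is the classifier-free guidance scale inherited from Stable-Diffusion, which controls how closely sampling follows the text prompt; the example commands use 5 for cross-domain and 2.5 for same-domain composition.
  • --dpm_order, --outdir, --gpu, --seed: The solver order, the folder where results are written, the CUDA device to run on, and the random seed that makes a run reproducible.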

In short, each of these parameters shapes the resulting output as a recipe shapes a cake.
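In Stable-Diffusion-based scripts, --scale is usually the classifier-free guidance scale. Assuming TF-ICON's --scale plays the same role, each denoising step blends an unconditional and a text-conditioned noise prediction as sketched below; plain lists stand in for the model's tensors, and `guided_noise` is a hypothetical name.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond).

    Lists of floats stand in for the noise-prediction tensors a real model returns.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

A scale of 1 returns the conditional prediction unchanged; larger values push further along the text direction, so under this assumption the cross-domain command (5) follows the prompt more strongly than the same-domain one (2.5).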

TF-ICON Test Benchmark

The TF-ICON Test Benchmark, available through the TF-ICON GitHub repository, provides further input samples (backgrounds, foregrounds, and masks) that you can use for evaluation or adapt for your own compositions.

Troubleshooting

If you encounter issues while using TF-ICON, here are some troubleshooting tips:

  • Environment Issues: Make sure the tf-icon Conda environment is activated before running the script.
  • Loading Errors: Verify that the Stable-Diffusion weights downloaded completely, sit in the ./ckpt folder, and that --ckpt points to that file.
  • GPU Problems: Make sure your GPU meets the VRAM requirement listed in the TF-ICON repository and that it is correctly configured for CUDA (the device is selected with --gpu, e.g. cuda:0).
  • Image Formatting: Make sure your input images are PNG files named as in the directory structure above.
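For the image-formatting issue, note that renaming a JPEG to .png does not make it a PNG. Checking each file's 8-byte PNG signature catches such files. This is a minimal sketch with hypothetical helper names that scans every .png under an input root such as ./inputs.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every valid PNG file


def is_png_bytes(data: bytes) -> bool:
    """True if `data` begins with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


def non_png_files(root):
    """Return .png-named files under `root` whose contents are not actually PNG."""
    bad = []
    for path in sorted(Path(root).rglob("*.png")):
        with open(path, "rb") as f:
            if not is_png_bytes(f.read(8)):
                bad.append(path)
    return bad
```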

Additional Results

TF-ICON’s additional results span several visual domains, including:

  • Sketchy Painting
  • Oil Painting
  • Photorealism
  • Cartoon Style

Acknowledgments

We express our gratitude to the original creators and contributors of the frameworks that TF-ICON builds upon, particularly Stable-Diffusion and Prompt-to-Prompt.
