Your Diffusion Model is Secretly a Zero-Shot Classifier

Feb 16, 2024 | Data Science

This is the official implementation of the ICCV 2023 paper Your Diffusion Model is Secretly a Zero-Shot Classifier by Alexander Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, and Deepak Pathak.

Abstract

The advent of large-scale text-to-image diffusion models has revolutionized text-based image generation. These models generate realistic images for a wealth of prompts, showcasing impressive compositional generalization abilities. Traditionally, the usage of diffusion models has focused solely on sampling; however, they also provide conditional density estimates useful for various tasks beyond image generation. Our paper presents a novel application of these density estimates from diffusion models like Stable Diffusion for zero-shot classification, a process we term the Diffusion Classifier. This approach yields strong results across numerous benchmarks, outperforming traditional methods for leveraging knowledge from diffusion models.

Installation

To get started, create a conda environment using the following command:

conda env create -f environment.yml

If the installation takes longer than expected, expedite it by setting conda to use the libmamba solver:

conda config --set solver libmamba

Zero-shot Classification with Stable Diffusion

To perform zero-shot classification, run the following command:

python eval_prob_adaptive.py --dataset cifar10 --split test --n_trials 1 --to_keep 5 1 --n_samples 50 500 --loss l1 --prompt_path prompts/cifar10_prompts.csv

This command reads the class prompts from the CSV file and uses Stable Diffusion to evaluate the epsilon-prediction loss for each prompt; the image is classified as the prompt with the lowest loss. It runs on a range of GPUs, from a 2080 Ti to an A6000, and the losses for each test image are logged separately in the specified output directory.
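The decision rule behind this evaluation can be sketched in a few lines. Everything below is illustrative rather than the repository's actual implementation: the noise schedule is a toy one, and `eps_pred` is a stub standing in for the conditioned U-Net.

```python
import numpy as np

def diffusion_classify(x, prompts, eps_pred, n_samples=50, seed=0):
    """Toy sketch of the Diffusion Classifier: score each class prompt
    by the average L1 error of the model's noise prediction, then pick
    the class with the lowest error."""
    rng = np.random.default_rng(seed)
    # Draw one shared set of (timestep, noise) pairs so every class
    # is scored on identical corruptions.
    trials = [(rng.uniform(0.05, 0.95), rng.standard_normal(x.shape))
              for _ in range(n_samples)]
    errors = []
    for prompt in prompts:
        err = 0.0
        for t, eps in trials:
            x_t = np.sqrt(1 - t) * x + np.sqrt(t) * eps  # toy schedule
            err += np.abs(eps_pred(x_t, t, prompt) - eps).mean()  # L1 loss
        errors.append(err / n_samples)
    return int(np.argmin(errors))

# Stand-in "model": two flat class templates and an eps predictor that
# denoises toward whichever template the prompt names.
x_cat = np.ones((4, 4))
x_dog = -np.ones((4, 4))
templates = {"a photo of a cat": x_cat, "a photo of a dog": x_dog}

def toy_eps_pred(x_t, t, prompt):
    x0 = templates[prompt]
    return (x_t - np.sqrt(1 - t) * x0) / np.sqrt(t)

pred = diffusion_classify(x_cat, list(templates), toy_eps_pred)  # -> 0 ("a photo of a cat")
```

The key design choice mirrored here is that all classes are scored against the same noise draws, which makes the per-class losses directly comparable.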

Understanding Code Execution Through Analogy

Imagine the diffusion model as a skilled artist who paints from a short brief supplied by an art director (the prompt). Just as the artist adjusts brush strokes and palette to match the brief, the diffusion model conditions its denoising on the text prompt. Zero-shot classification exploits this: for a given image, the prompt under which the model denoises most accurately is taken as the predicted class. No additional training is required, because the model has already learned from its vast pretraining data how text descriptions relate to images.

Troubleshooting Ideas

If you experience delays in evaluation, consider the following options:

  • Parallelize evaluation across multiple workers using the --n_workers and --worker_idx flags.
  • Experiment with the evaluation strategy (e.g., adjust --n_samples and --to_keep).
  • Evaluate a smaller dataset subset, using the --subset_path flag for specific indices.
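For example, a sharded launch on a hypothetical 4-GPU machine could look like the following. It is shown as a dry run that only prints the commands; remove the echo to actually launch them.

```shell
# Hypothetical 4-GPU sharding: each worker scores a disjoint slice
# of the test set via --n_workers / --worker_idx.
N_WORKERS=4
for idx in 0 1 2 3; do
  echo CUDA_VISIBLE_DEVICES=$idx python eval_prob_adaptive.py \
    --dataset cifar10 --split test --n_trials 1 --to_keep 5 1 \
    --n_samples 50 500 --loss l1 \
    --prompt_path prompts/cifar10_prompts.csv \
    --n_workers $N_WORKERS --worker_idx $idx
done
```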


Evaluating on Your Own Dataset

  1. Create a CSV file containing the prompts for evaluation, ensuring they match the respective class labels. Check scripts/write_cifar10_prompts.py for guidance.
  2. Run your evaluation using the command mentioned previously, adjusting flags as necessary for your dataset.
  3. To reduce evaluation time, consider testing on a smaller subset initially.
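As a concrete example, a prompt CSV for a hypothetical three-class pet dataset could be written as below. The column names and file name here are illustrative; check scripts/write_cifar10_prompts.py for the exact layout the evaluation script expects.

```python
import csv

# Hypothetical three-class dataset; one prompt per class, in class-index order.
classes = ["cat", "dog", "rabbit"]
with open("pets_prompts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["classidx", "prompt"])  # illustrative header
    for idx, name in enumerate(classes):
        writer.writerow([idx, f"a photo of a {name}"])
```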

Standard ImageNet Classification with Class-Conditional Diffusion Models

Additional Installations

Within the diffusion-classifier folder, clone the DiT repository:

git clone git@github.com:facebookresearch/DiT.git

Running Diffusion Classifier

Follow the steps below to ensure the proper evaluation process:

  1. First, save a consistent set of noise (epsilon) to be used for all image-class pairs:

python scripts/save_noise.py --img_size 256

  2. Next, compute and save the epsilon-prediction error for each class:

python eval_prob_dit.py --dataset imagenet --split test --noise_path noise_256.pt --randomize_noise --batch_size 32 --cls CLS --t_interval 4 --extra dit256 --save_vb

For ImageNet, make sure to run this command for every class label from 0 to 999. Because a full sweep is time-consuming, consider using the --subset_path flag to evaluate on a smaller subset of images. An adaptive version that runs faster is in development.
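The sweep over all 1,000 labels can be scripted. The snippet below is a sketch rather than the repository's own launcher, shown as a dry run via echo; in practice you would remove the echo and shard the loop across GPUs or machines.

```shell
# Dry run: print one eval_prob_dit.py command per ImageNet class.
for cls in $(seq 0 999); do
  echo python eval_prob_dit.py --dataset imagenet --split test \
    --noise_path noise_256.pt --randomize_noise --batch_size 32 \
    --cls $cls --t_interval 4 --extra dit256 --save_vb
done
```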

Compositional Reasoning on Winoground with Stable Diffusion

To run the Diffusion Classifier on Winoground:

  1. Save a consistent set of noise to be used for all image-caption pairs:

python scripts/save_noise.py --img_size 512

  2. Then evaluate on Winoground with the following command:

python run_winoground.py --model sd --version 2-0 --t_interval 1 --batch_size 32 --noise_path noise_512.pt --randomize_noise --interpolation bicubic
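Winoground's "text score" metric can be summarized with a toy check: each example has two images and two captions, and the model is counted correct only when each image assigns its own caption the lower diffusion loss. The loss values below are made up for illustration.

```python
def text_score_correct(loss):
    """loss[i][j]: epsilon-prediction loss for image i paired with
    caption j; lower means a better match."""
    return loss[0][0] < loss[0][1] and loss[1][1] < loss[1][0]

# Made-up losses: each image prefers its own caption -> correct.
loss = [[0.12, 0.30],
        [0.28, 0.10]]
print(text_score_correct(loss))  # -> True
```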

To execute CLIP or OpenCLIP baselines, run:

python run_winoground.py --model clip --version ViT-L14
python run_winoground.py --model openclip --version ViT-H-14
