This is the official implementation of the ICCV 2023 paper Your Diffusion Model is Secretly a Zero-Shot Classifier by Alexander Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, and Deepak Pathak.
Abstract
The advent of large-scale text-to-image diffusion models has revolutionized text-based image generation. These models generate realistic images for a wealth of prompts, showcasing impressive compositional generalization abilities. Traditionally, the usage of diffusion models has focused solely on sampling; however, they also provide conditional density estimates useful for various tasks beyond image generation. Our paper presents a novel application of these density estimates from diffusion models like Stable Diffusion for zero-shot classification, a process we term the Diffusion Classifier. This approach yields strong results across numerous benchmarks, outperforming traditional methods for leveraging knowledge from diffusion models.
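The core decision rule is simple: add noise to the image, ask the model to predict that noise under each class prompt, and pick the class with the lowest average prediction error. A minimal sketch of that rule, with a toy stand-in for the diffusion model (the `eps_prediction_error` function and the hard-coded "true" class are illustrative, not the paper's code):

```python
import random

random.seed(0)

def eps_prediction_error(class_idx, noise):
    # Toy stand-in for the conditional noise predictor: the "true" class
    # (here class 2, an arbitrary choice for this demo) gets an unbiased
    # prediction, so its L1 error is smallest in expectation. In the real
    # method this is Stable Diffusion's epsilon prediction for the prompt.
    bias = 0.0 if class_idx == 2 else 0.5
    return sum(abs(x + bias) for x in noise) / len(noise)

def diffusion_classifier(n_classes, n_samples):
    # Monte Carlo estimate of the expected noise-prediction error per class;
    # the predicted label is the class with the lowest average error.
    totals = [0.0] * n_classes
    for _ in range(n_samples):
        noise = [random.gauss(0, 1) for _ in range(16)]  # shared noise sample
        for c in range(n_classes):
            totals[c] += eps_prediction_error(c, noise)
    return min(range(n_classes), key=lambda c: totals[c])

pred = diffusion_classifier(n_classes=5, n_samples=50)
```

Sharing the same noise samples across classes, as above, reduces the variance of the comparison between classes.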
Installation
To get started, create a conda environment using the following command:
conda env create -f environment.yml
If the installation takes longer than expected, expedite it by setting conda to use the libmamba solver:
conda config --set solver libmamba
Zero-shot Classification with Stable Diffusion
To perform zero-shot classification, run the following command:
python eval_prob_adaptive.py --dataset cifar10 --split test --n_trials 1 --to_keep 5 1 --n_samples 50 500 --loss l1 --prompt_path prompts/cifar10_prompts.csv
This command reads prompts from the CSV file and evaluates the epsilon-prediction loss for each prompt using Stable Diffusion. It runs on a range of GPUs, from a 2080 Ti to an A6000. Losses are saved separately for each test image in the specified output directory.
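The --to_keep 5 1 and --n_samples 50 500 flags describe a staged evaluation: score all classes with a few loss samples, keep only the best candidates, then spend more samples on the survivors. A rough sketch of that pruning loop, with a hypothetical `prompt_loss` standing in for the per-sample Stable Diffusion L1 loss:

```python
import random

random.seed(0)

def prompt_loss(class_idx):
    # Hypothetical per-sample loss; in the real script this would be the L1
    # error between the sampled noise and the model's prediction for this
    # class prompt. Class 3 is arbitrarily made the "true" class here.
    base = 0.1 if class_idx == 3 else 0.1 + 0.02 * (1 + class_idx % 4)
    return base + random.gauss(0, 0.01)

def adaptive_eval(n_classes, to_keep=(5, 1), n_samples=(50, 500)):
    # Mirrors the --to_keep / --n_samples flags: each stage tops candidates
    # up to n loss samples, then keeps only the lowest-loss classes.
    candidates = list(range(n_classes))
    totals = {c: 0.0 for c in candidates}
    counts = {c: 0 for c in candidates}
    for keep, n in zip(to_keep, n_samples):
        for c in candidates:
            for _ in range(n - counts[c]):  # reuse earlier samples
                totals[c] += prompt_loss(c)
                counts[c] += 1
        candidates.sort(key=lambda c: totals[c] / counts[c])
        candidates = candidates[:keep]
    return candidates[0]

pred = adaptive_eval(n_classes=10)
```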
Understanding Code Execution Through Analogy
Think of the diffusion model as a skilled artist who paints from a short brief (the prompt). To classify an image, the Diffusion Classifier effectively asks: which brief would this artist most plausibly have been following? Concretely, it adds noise to the image and checks which class prompt lets the model predict that noise most accurately. Because the model learned these image-text associations during pretraining, no additional training is required, which is what makes the classification zero-shot.
Troubleshooting Ideas
If you experience delays in evaluation, consider the following options:
- Parallelize evaluation across multiple workers using the --n_workers and --worker_idx flags.
- Experiment with the evaluation strategy (e.g., adjust --n_samples and --to_keep).
- Evaluate a smaller subset of the dataset by passing specific indices via the --subset_path flag.
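The worker flags split the test set across independent processes. One plausible interleaved split (the actual assignment in eval_prob_adaptive.py may differ; this is only a sketch):

```python
def worker_shard(n_items, n_workers, worker_idx):
    # Worker k evaluates items k, k + n_workers, k + 2*n_workers, ...
    # so all workers finish at roughly the same time.
    return list(range(worker_idx, n_items, n_workers))

# Four workers covering a 10-image dataset with no overlap.
shards = [worker_shard(10, 4, k) for k in range(4)]
```

Each worker writes its losses to the shared output directory, so results can be merged afterward.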
Evaluating on Your Own Dataset
- Create a CSV file containing a prompt for each class label; see scripts/write_cifar10_prompts.py for guidance.
- Run the evaluation command above, adjusting flags as necessary for your dataset.
- To reduce evaluation time, consider testing on a smaller subset initially.
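As an illustration, a prompt CSV might pair each class index with a natural-language prompt. The column names below are assumptions for the sketch; check scripts/write_cifar10_prompts.py for the exact format the evaluation code expects:

```python
import csv
import io

# Hypothetical prompt file: one row per class, written to an in-memory
# buffer here (use open("prompts.csv", "w", newline="") in practice).
classes = ["airplane", "automobile", "bird"]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["classidx", "prompt"])  # assumed header names
for idx, name in enumerate(classes):
    writer.writerow([idx, f"a photo of a {name}"])

csv_text = buf.getvalue()
```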
Standard ImageNet Classification with Class-Conditional Diffusion Models
Additional Installations
Within the diffusion-classifier folder, clone the DiT repository:
git clone git@github.com:facebookresearch/DiT.git
Running Diffusion Classifier
Follow the steps below to ensure the proper evaluation process:
- First, save a consistent set of noise (epsilon) for all image-class pairs:
python scripts/save_noise.py --img_size 256
- Then, evaluate the epsilon-prediction loss for each class:
python eval_prob_dit.py --dataset imagenet --split test --noise_path noise_256.pt --randomize_noise --batch_size 32 --cls CLS --t_interval 4 --extra dit256 --save_vb
For ImageNet, run this command for every class label CLS from 0 to 999. Since this is time-consuming, consider using the --subset_path flag to evaluate a smaller subset of the dataset. An adaptive version for faster evaluation is in progress.
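Sweeping CLS over all 1000 labels can be scripted. The sketch below only constructs the command lines (it does not execute them); in practice you would pass each list to subprocess.run, ideally sharding the class range across GPUs:

```python
def dit_command(cls):
    # Builds the eval_prob_dit.py invocation for one class label.
    return ["python", "eval_prob_dit.py", "--dataset", "imagenet",
            "--split", "test", "--noise_path", "noise_256.pt",
            "--randomize_noise", "--batch_size", "32",
            "--cls", str(cls), "--t_interval", "4",
            "--extra", "dit256", "--save_vb"]

# One command per ImageNet class label, 0 through 999.
commands = [dit_command(c) for c in range(1000)]
```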
Compositional Reasoning on Winoground with Stable Diffusion
To run the Diffusion Classifier on Winoground:
- Save a consistent set of noise (epsilon) to be used for all image-caption pairs:
python scripts/save_noise.py --img_size 512
- Then evaluate on Winoground:
python run_winoground.py --model sd --version 2-0 --t_interval 1 --batch_size 32 --noise_path noise_512.pt --randomize_noise --interpolation bicubic
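Each Winoground example pairs two images with two captions, and a model is scored on whether it prefers the correct pairing. A toy sketch of the standard text/image scores, under the assumption that a lower diffusion loss means a better image-caption match (the loss values below are made up for illustration):

```python
def text_score(loss):
    # For each image, the matching caption must have lower loss
    # than the swapped caption. loss[i][j] = loss(image i, caption j).
    return loss[0][0] < loss[0][1] and loss[1][1] < loss[1][0]

def image_score(loss):
    # For each caption, the matching image must have lower loss
    # than the swapped image.
    return loss[0][0] < loss[1][0] and loss[1][1] < loss[0][1]

# Hypothetical losses where the correct (diagonal) pairings win.
example = [[0.10, 0.30],
           [0.25, 0.15]]
```

An example counts toward the combined "group" score only when both conditions hold.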
To execute CLIP or OpenCLIP baselines, run:
python run_winoground.py --model clip --version ViT-L14
python run_winoground.py --model openclip --version ViT-H-14