Welcome to the world of CatVTON, an innovative virtual try-on diffusion model designed to simplify and enhance the fashion experience using AI! Whether you’re a developer looking to implement this solution or an enthusiast keen to learn more about its deployment, this guide will help you navigate through the setup and usage of CatVTON.
What is CatVTON?
CatVTON is a lightweight, efficient model specifically crafted for virtual try-ons. It boasts:
- Lightweight Network: A total of 899.06M parameters.
- Parameter-Efficient Training: Only 49.57M trainable parameters (a quick way to verify these counts is sketched after this list).
- Simplified Inference: Requires less than 8GB VRAM for resolutions of 1024×768.
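If you want to check the parameter figures yourself on a loaded model, a generic PyTorch count works. The sketch below is illustrative only: it assumes you have already constructed the CatVTON pipeline as a standard `torch.nn.Module` (the loading step is omitted).

```python
import torch

def count_parameters(model: torch.nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts for a PyTorch module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Hypothetical usage -- substitute however you load the CatVTON pipeline:
# total, trainable = count_parameters(model)
# print(f"Total: {total / 1e6:.2f}M | Trainable: {trainable / 1e6:.2f}M")
```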
Updates
Stay informed about the latest developments with CatVTON:
- **`2024/7/24`**: Our Paper on arXiv is now available 🥳!
- **`2024/7/22`**: Our App Code has been released, allowing you to deploy CatVTON on your own machine!
- **`2024/7/21`**: Our Inference Code and Weights are available.
- **`2024/7/11`**: Our Online Demo has been launched.
Installation
To get started with CatVTON, follow the Installation Guide to set up the required conda environment. You will need Detectron2 and DensePose for deploying the app, although they are not required for inference tasks.
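As a quick sanity check after installation, you can confirm which of these dependencies resolved in your environment. This is a minimal sketch, assuming the packages install under their usual top-level module names (`detectron2` and `densepose` are assumptions here):

```python
import importlib.util

# Detectron2 and DensePose are only needed for the Gradio app;
# plain inference works without them, so "missing" is not always fatal.
for name in ("torch", "diffusers", "detectron2", "densepose"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'installed' if found else 'missing'}")
```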
Deployment (Gradio App)
Deploying the Gradio App for CatVTON is remarkably simple. Just execute the following command to run it on your machine and automatically download the checkpoints from HuggingFace:
```bash
CUDA_VISIBLE_DEVICES=0 python app.py \
  --output_dir="resource/demo/output" \
  --mixed_precision="bf16" \
  --allow_tf32
```
Note that when using bf16 precision, generating results at a 1024×768 resolution requires less than 8GB of VRAM.
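To confirm your GPU clears that bar before launching the app, you can query total device memory with PyTorch. A minimal sketch, assuming PyTorch was installed with CUDA support:

```python
import torch

# Check total VRAM on cuda:0, matching CUDA_VISIBLE_DEVICES=0 above.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    print(f"{props.name}: {total_gb:.1f} GB VRAM")
    if total_gb < 8:
        print("Less than 8 GB -- try a lower resolution or keep bf16/fp16 enabled.")
else:
    print("No CUDA device detected.")
```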
Data Preparation for Inference
To perform inference, you will need either the VITON-HD or the DressCode dataset. Here's how your folder structure should look after downloading the datasets:
```
├── VITON-HD
│   ├── test_pairs_unpaired.txt
│   └── test
│       ├── image
│       │   └── [000006_00.jpg | 000008_00.jpg | ...]
│       ├── cloth
│       │   └── [000006_00.jpg | 000008_00.jpg | ...]
│       └── agnostic-mask
│           └── [000006_00_mask.png | 000008_00_mask.png | ...]
...
```
For the DressCode dataset, don’t forget to download and place the provided agnostic masks in the appropriate folder structure.
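Before starting a long run, it can pay to validate the layout programmatically. The helper below is hypothetical (its name and the `VITON-HD` root path are illustrative) and simply mirrors the tree shown above; adapt the expected paths for DressCode:

```python
from pathlib import Path

def check_vitonhd_layout(root: str) -> list[str]:
    """Return the expected VITON-HD test paths that are missing under `root`."""
    base = Path(root)
    expected = [
        base / "test_pairs_unpaired.txt",
        base / "test" / "image",
        base / "test" / "cloth",
        base / "test" / "agnostic-mask",
    ]
    return [str(p) for p in expected if not p.exists()]

missing = check_vitonhd_layout("VITON-HD")
print("Layout looks good." if not missing else f"Missing paths: {missing}")
```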
Running Inference on VITON-HD/DressCode
To run inference on either the DressCode or VITON-HD dataset, you can use the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python inference.py \
  --dataset [dresscode | vitonhd] \
  --data_root_path <path_to_dataset> \
  --output_dir <path_to_output> \
  --dataloader_num_workers 8 \
  --batch_size 8 \
  --seed 555 \
  --mixed_precision [no | fp16 | bf16] \
  --allow_tf32 \
  --repaint \
  --eval_pair
```
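If you want to script runs over both datasets, a thin Python wrapper around the same CLI works. This is a sketch only: the dataset roots and output paths are placeholders, and the flags simply mirror the command above.

```python
import os
import subprocess

# Placeholder dataset roots -- substitute your actual paths.
datasets = {"vitonhd": "data/VITON-HD", "dresscode": "data/DressCode"}

for name, root in datasets.items():
    cmd = [
        "python", "inference.py",
        "--dataset", name,
        "--data_root_path", root,
        "--output_dir", f"output/{name}",
        "--dataloader_num_workers", "8",
        "--batch_size", "8",
        "--seed", "555",
        "--mixed_precision", "bf16",
        "--allow_tf32",
    ]
    # Pin each run to GPU 0, mirroring the shell command above.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
    subprocess.run(cmd, check=True, env=env)
```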
Acknowledgements
Our work builds on contributions from the community, in particular Diffusers and the Stable Diffusion v1.5 inpainting model. We also use SCHP and DensePose for automatic mask generation.
Troubleshooting
In case you encounter any issues during installation or usage, consider the following troubleshooting measures:
- Double-check that all dependencies have been installed correctly.
- Ensure that your folder structures for datasets are consistent with the recommended layout.
- Verify that your VRAM allocation is sufficient for the resolution you are trying to process.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

