LaDI-VTON (Latent Diffusion Textual-Inversion Enhanced Virtual Try-On) brings latent diffusion models, enhanced with textual inversion, to the virtual fashion industry, generating realistic images of garments worn by virtual models. In this guide, we walk you through using LaDI-VTON, from installation to inference, along with troubleshooting tips.
Getting Started
The first step to harnessing the power of LaDI-VTON is setting up your environment. We recommend using the Anaconda package manager to avoid dependency issues.
Installation
- Clone the repository:

git clone https://github.com/miccunifi/ladi-vton

- Install Python dependencies by creating the conda environment from the provided file:

conda env create -n ladi-vton -f environment.yml
conda activate ladi-vton

- If you prefer, create a new conda environment manually and install the packages yourself:

conda create -n ladi-vton -y python=3.10
conda activate ladi-vton
pip install torch==2.0.1 torchvision==0.15.2 opencv-python==4.7.0.72 diffusers==0.14.0 transformers==4.27.3 accelerate==0.18.0 clean-fid==0.1.35 torchmetrics[image]==0.11.4 wandb==0.14.0 matplotlib==3.7.1 tqdm xformers
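Either way, a quick sanity check confirms the pinned packages resolved correctly; these one-liners simply echo the versions installed above:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"
python -c "import diffusers, transformers; print(diffusers.__version__, transformers.__version__)"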
Data Preparation
LaDI-VTON works with two major datasets: DressCode and VITON-HD. Follow the steps to set them up:
For DressCode:
- Download the DressCode dataset.
- For best results, use the in-shop garment images with a white background.
- Download the authors' pre-extracted masks and place them in the dataset folder alongside your images.
For VITON-HD:
- Download the VITON-HD dataset.
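Once both datasets are downloaded, point the dataroot flags used in the commands below at the top level of each dataset. As a rough orientation only (the exact folder names depend on the release you download, so treat this layout as an assumption):

DressCode/
├── upper_body/
├── lower_body/
└── dresses/
VITON-HD/
├── train/
└── test/

The three DressCode category folders match the values accepted by the --category flag in the commands below.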
Inference with Pre-trained Models
To run inference on either dataset, use the command below:
python src/inference.py --dataset [dresscode | vitonhd] --dresscode_dataroot <path> --vitonhd_dataroot <path> --output_dir <path> --test_order [paired | unpaired] --category [all | lower_body | upper_body | dresses] --mixed_precision [no | fp16 | bf16] --enable_xformers_memory_efficient_attention --use_png --compute_metrics
Be sure to replace the <path> placeholders with your actual data paths and pick one value from each bracketed option.
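For instance, a run on VITON-HD in unpaired mode might look like the following; the dataroot and output paths here are placeholders for illustration:

python src/inference.py --dataset vitonhd --vitonhd_dataroot /data/vitonhd --output_dir ./results --test_order unpaired --category all --mixed_precision fp16 --enable_xformers_memory_efficient_attention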
Training Your Model
Training the model consists of four steps, run in order; a combined script covering all of them is sketched after the list.
1. Train the warping module:

python src/train_tps.py --dataset [dresscode | vitonhd] --dresscode_dataroot <path> --vitonhd_dataroot <path> --checkpoints_dir <path> --exp_name <str>

2. Train the EMASC module:

python src/train_emasc.py --dataset [dresscode | vitonhd] --dresscode_dataroot <path> --vitonhd_dataroot <path> --output_dir <path>

3. Extract and pre-train the Inversion Adapter:

python src/train_inversion_adapter.py --dataset [dresscode | vitonhd] --dresscode_dataroot <path> --vitonhd_dataroot <path> --output_dir <path>

4. Finally, train the VTO model:

python src/train_vto.py --dataset [dresscode | vitonhd] --dresscode_dataroot <path> --vitonhd_dataroot <path> --output_dir <path>
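To keep paths and flags consistent across the four steps, the whole pipeline can be driven from one shell script. This is a minimal sketch assuming VITON-HD; the dataroot, output locations, and experiment name are placeholder assumptions, and each script accepts further options (run it with --help to list them):

#!/bin/bash
set -e  # stop at the first failing step

DATASET=vitonhd
DATAROOT=/data/vitonhd   # placeholder: your VITON-HD root
OUT=./checkpoints        # placeholder: where checkpoints land

# 1. Warping module
python src/train_tps.py --dataset $DATASET --vitonhd_dataroot $DATAROOT --checkpoints_dir $OUT/tps --exp_name ladi_tps
# 2. EMASC module
python src/train_emasc.py --dataset $DATASET --vitonhd_dataroot $DATAROOT --output_dir $OUT/emasc
# 3. Inversion adapter
python src/train_inversion_adapter.py --dataset $DATASET --vitonhd_dataroot $DATAROOT --output_dir $OUT/inversion_adapter
# 4. VTO model
python src/train_vto.py --dataset $DATASET --vitonhd_dataroot $DATAROOT --output_dir $OUT/vto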
Troubleshooting
If you encounter issues while getting started, consider the following troubleshooting tips:
- Ensure all dependencies are properly installed.
- Check if your data paths are correctly set.
- Verify that you have sufficient resources (memory, GPU capability) to run large models.
- If you encounter CUDA errors, ensure that your GPU drivers are up to date.
- For any unexpected errors, check the repository's GitHub issues page for existing reports and solutions.
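When debugging CUDA problems specifically, two quick checks help separate driver issues from PyTorch build issues (assuming an NVIDIA GPU):

nvidia-smi
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"

The first confirms the driver sees the GPU; the second confirms the installed PyTorch build was compiled against CUDA and can actually reach it.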
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
In Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations. Enjoy creating stunning virtual try-ons with LaDI-VTON!