Welcome to the fascinating world of AsyncDiff, where we empower diffusion models to work faster and more efficiently through asynchronous denoising. Whether you’re a seasoned coder or a curious newbie, this guide will walk you through the essentials of setting up AsyncDiff while unlocking the wonders of parallel processing within your models.
Understanding AsyncDiff
Imagine you’re trying to complete a massive jigsaw puzzle. If you work alone, it may take hours to fit each piece together sequentially. However, if you divide the puzzle among your friends, each person picks a section to tackle simultaneously. This is similar to what AsyncDiff does with its denoising model. AsyncDiff divides the noise prediction task into smaller, manageable parts and assigns these parts to different devices, allowing them to work at the same time, which speeds up the entire process without compromising the quality of the final image.
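The puzzle analogy above can be sketched in plain Python. The helper names below (`split_components`, `run_chunk`) are hypothetical stand-ins, not part of AsyncDiff's actual API; the point is simply how a sequence of denoiser sub-modules might be partitioned into `model_n` chunks and worked on concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def split_components(components, model_n):
    """Divide a sequence of denoiser sub-modules into model_n
    contiguous chunks, one per device (illustrative helper)."""
    k, r = divmod(len(components), model_n)
    chunks, start = [], 0
    for i in range(model_n):
        end = start + k + (1 if i < r else 0)
        chunks.append(components[start:end])
        start = end
    return chunks

def run_chunk(chunk):
    """Stand-in for one device running its share of noise prediction."""
    return [f"denoised:{name}" for name in chunk]

blocks = ["down1", "down2", "mid", "up1", "up2", "up3"]
chunks = split_components(blocks, model_n=3)

# Each "device" works on its own chunk at the same time.
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    results = list(pool.map(run_chunk, chunks))
```

In the real system each chunk would live on a separate GPU and exchange activations between steps, but the core idea is the same: divide the work, run the parts simultaneously, then reassemble the result.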
Quick Start Guide
Follow these steps to get started with AsyncDiff on your system.
Installation Steps
- Ensure you have an NVIDIA GPU with CUDA >= 12.0 and the corresponding cuDNN.

- Create an environment and install the necessary dependencies.
conda create -n asyncdiff python=3.10
conda activate asyncdiff
pip install -r requirements.txt
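After installing, a quick sanity check can confirm the environment is set up before you run anything heavier. The `setup_report` helper below is an illustrative snippet (not part of AsyncDiff) that reports the Python version and whether the key dependencies can be imported:

```python
import importlib.util
import sys

def setup_report():
    """Summarize the environment after installation: the Python
    version plus whether key dependencies can be imported.
    Illustrative helper, not part of AsyncDiff itself."""
    deps = ["torch", "diffusers"]
    lines = [f"python {sys.version_info.major}.{sys.version_info.minor}"]
    for name in deps:
        found = importlib.util.find_spec(name) is not None
        lines.append(f"{name}: {'installed' if found else 'MISSING'}")
    return lines

for line in setup_report():
    print(line)
```

If either dependency shows as MISSING, re-run `pip install -r requirements.txt` inside the activated conda environment.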
Usage Example
Adding support for async processing is as simple as a few lines of code:
import torch
import torch.distributed as dist
from diffusers import StableDiffusionPipeline
from asyncdiff.async_sd import AsyncDiff

pipeline = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    use_safetensors=True,
    low_cpu_mem_usage=True,
)

async_diff = AsyncDiff(pipeline, model_n=2, stride=1, time_shift=False)
async_diff.reset_state(warm_up=1)

prompt = "a photo of an astronaut riding a horse on mars"  # example prompt
image = pipeline(prompt).images[0]

# Only the rank-0 process saves the result.
if dist.get_rank() == 0:
    image.save("output.jpg")
Inference Acceleration
To accelerate inference, use the following commands for different model types:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --run-path examples/run_sdxl.py
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --run-path examples/run_sd.py
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_sd3.py
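Notice that `--nproc_per_node` must match the number of GPUs listed in `CUDA_VISIBLE_DEVICES` (4 processes for 4 devices, 2 for 2). A small hypothetical helper makes the relationship explicit; torchrun does not infer this for you:

```python
import os

def nproc_from_visible_devices(env=None):
    """Infer how many processes to launch from CUDA_VISIBLE_DEVICES.
    Illustrative helper; torchrun does not read this for you, so
    pass the result to --nproc_per_node yourself."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in visible.split(",") if d.strip()])

print(nproc_from_visible_devices({"CUDA_VISIBLE_DEVICES": "0,1,2,3"}))
```

A mismatch between the two values is a common cause of hangs or device-assignment errors at startup.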
Troubleshooting Tips
If you encounter issues during setup or execution, consider the following troubleshooting steps:
- Ensure your CUDA version is compatible with your GPU and PyTorch installations.
- Verify that your environment has all the required packages installed as specified in requirements.txt.
- Make sure you are running the correct command for your device configuration.
- Check for any typos in your code, especially when specifying model names or paths.
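For the last point, a cheap shape check can catch a typo before `from_pretrained` fails with a cryptic download error. The `looks_like_hub_id` function below is a heuristic sketch of such a guard, not anything AsyncDiff or diffusers provides:

```python
def looks_like_hub_id(model_id):
    """Cheap sanity check that a Hub model id has the "org/name"
    shape before handing it to from_pretrained. Heuristic only;
    it cannot confirm the model actually exists."""
    parts = model_id.split("/")
    return len(parts) == 2 and all(p and " " not in p for p in parts)

# e.g. looks_like_hub_id("stabilityai/stable-diffusion-2-1") passes,
# while an id with a stray space or missing slash does not.
```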
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
By leveraging AsyncDiff, you are stepping into a world of accelerated image generation that not only enhances performance but also maintains the integrity of your outputs. This coordinated dance of components working in harmony is what makes AsyncDiff a game-changer in the diffusion model landscape.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.