Welcome to this guide on Encoder-based Domain Tuning (E4T), a method for fast personalization of text-to-image models! We’ll walk through installation, pre-training, domain tuning, and inference, and finish with a few troubleshooting tips. Let’s dive in!
Installation
To start, you’ll need to set up the E4T-Diffusion environment. Follow these simple steps:
- Clone the repository and install its dependencies:
$ git clone https://github.com/mkshing/e4t-diffusion.git
$ cd e4t-diffusion
$ pip install -r requirements.txt
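Before moving on, it can help to confirm that the install actually landed in the Python environment you are using. The sketch below is only a sanity check: the package names in `ASSUMED_PACKAGES` are an assumption (the contents of requirements.txt are not shown here), so adjust the list to match the file in your checkout.

```python
import importlib.util

# ASSUMPTION: these names are guesses based on the commands used in this guide
# (accelerate launch, xformers attention); edit to match requirements.txt.
ASSUMED_PACKAGES = ["torch", "diffusers", "accelerate", "xformers"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(ASSUMED_PACKAGES)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All assumed packages are importable.")
```

The check only looks packages up and does not import them, so it runs quickly even when heavy libraries such as torch are present.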
Model Zoo
Once E4T-Diffusion is installed, you can explore the models available in the model zoo:
- e4t-diffusion-ffhq-celebahq-v1: A face model pre-trained on FFHQ+CelebA-HQ. It incorporates Stable unCLIP for improved results.
Pre-training
Before you can start domain tuning, you need an E4T model pre-trained on the domain your target image belongs to. For an artistic target image, for example, you can pre-train on WikiArt with the following command:
$ accelerate launch pretrain_e4t.py \
    --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
    --clip_model_name_or_path="ViT-H-14::laion2b_s32b_b79k" \
    --domain_class_token="art" \
    --placeholder_token="*s" \
    --prompt_template="art" \
    --save_sample_prompt="a photo of the *s,a photo of the *s in monet style" \
    --reg_lambda=0.01 \
    --domain_embed_scale=0.1 \
    --output_dir="pretrained-wikiart" \
    --train_image_dataset="Artificio/WikiArt" \
    --iterable_dataset \
    --resolution=512 \
    --train_batch_size=16 \
    --learning_rate=1e-6 \
    --scale_lr \
    --checkpointing_steps=10000 \
    --log_steps=1000 \
    --max_train_steps=100000 \
    --unfreeze_clip_vision \
    --mixed_precision="fp16" \
    --enable_xformers_memory_efficient_attention
Think of this stage as choosing a color palette before you paint: the model learns the general look of the domain before it ever sees your specific image.
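Values such as `*s` and the sample prompts contain characters the shell treats specially (spaces and the `*` glob), so they need quoting when typed by hand. One way to sidestep quoting mistakes is to assemble the command in Python and let `shlex.join` do the quoting. This is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and only a subset of the options from the pre-training command is shown.

```python
import shlex


def build_command(script, options, flags=()):
    """Build a shell-safe `accelerate launch` line (hypothetical helper)."""
    parts = ["accelerate", "launch", script]
    parts += [f"--{key}={value}" for key, value in options.items()]
    parts += [f"--{flag}" for flag in flags]
    # shlex.join quotes every argument that contains shell-special characters.
    return shlex.join(parts)


# A subset of the options used in the pre-training command in this guide.
options = {
    "pretrained_model_name_or_path": "CompVis/stable-diffusion-v1-4",
    "placeholder_token": "*s",
    "save_sample_prompt": "a photo of the *s,a photo of the *s in monet style",
    "output_dir": "pretrained-wikiart",
}
command = build_command(
    "pretrain_e4t.py", options, flags=["iterable_dataset", "scale_lr"]
)
print(command)
```

Arguments that are already shell-safe are left untouched, while the ones containing `*` or spaces are wrapped in single quotes, so the printed line can be pasted into a terminal as-is.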
Domain Tuning
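Once your model is pre-trained, you’re ready for domain tuning! This stage adapts the pre-trained model to your target image, passed via --train_image_path, in only a handful of steps (30 in the command below). It’s as if you’re adding the fine details that finish the painting.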
$ accelerate launch tuning_e4t.py \
    --pretrained_model_name_or_path="path/to/e4t-pretrained/model" \
    --prompt_template="a photo of {placeholder_token}" \
    --reg_lambda=0.1 \
    --output_dir="path-to-save-model" \
    --train_image_path="image_path_or_url" \
    --resolution=512 \
    --train_batch_size=16 \
    --learning_rate=1e-6 \
    --scale_lr \
    --max_train_steps=30 \
    --mixed_precision="fp16" \
    --enable_xformers_memory_efficient_attention
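To make the role of the prompt template concrete, here is a small sketch of how a template could be expanded into a training prompt. It assumes the template marks the slot with a `{placeholder_token}` field that is filled via Python string formatting; that is an assumption about the script rather than something confirmed here, and `fill_prompt_template` is a hypothetical helper.

```python
def fill_prompt_template(template, placeholder_token="*s"):
    """Substitute the placeholder token into a template (assumed mechanism)."""
    return template.format(placeholder_token=placeholder_token)


# With braces, the slot is replaced by the token.
print(fill_prompt_template("a photo of {placeholder_token}"))

# Without braces, there is nothing to substitute and the text is unchanged.
print(fill_prompt_template("a photo of placeholder_token"))
```

If the script works this way, a template written without the braces would train on the literal words "placeholder_token" instead of your token, so it is worth double-checking the template string before launching a run.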
Inference
Once domain tuning is complete, you can perform inference using your model. Here’s how:
$ python inference.py \
    --pretrained_model_name_or_path="path/to/e4t-pretrained/model" \
    --prompt="Times square in the style of *s" \
    --num_images_per_prompt=3 \
    --scheduler_type="ddim" \
    --image_path_or_url="image_path_or_url" \
    --num_inference_steps=50 \
    --guidance_scale=7.5
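This step generates --num_images_per_prompt images for the prompt, with *s standing in for the concept taken from the image passed via --image_path_or_url. The results blend your style with the input prompt, similar to bringing your painting to life!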
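If you are wondering what `--guidance_scale=7.5` does, the sketch below shows the standard classifier-free guidance rule used by Stable Diffusion-style samplers. It is assumed, not confirmed here, that inference.py applies the same rule. The real computation happens on noise-prediction tensors; plain Python lists are used to keep the example dependency-free, and `apply_guidance` is a hypothetical name.

```python
def apply_guidance(uncond, cond, guidance_scale=7.5):
    """Classifier-free guidance: push the prediction away from the
    unconditional estimate, toward the prompt-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction with an empty prompt
cond = [1.0, 1.0]    # toy prediction with the actual prompt

print(apply_guidance(uncond, cond, guidance_scale=1.0))  # same as cond
print(apply_guidance(uncond, cond, guidance_scale=7.5))  # difference amplified
```

A scale of 1.0 reproduces the conditioned prediction, while larger values amplify whatever the prompt changes. In practice, higher scales tend to follow the prompt more closely at the cost of variety.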
Troubleshooting
If you encounter any issues or need help during the installation or execution processes, consider these troubleshooting tips:
- Check if all necessary libraries are correctly installed by verifying your requirements.txt file.
- If the model doesn’t generate the expected results, ensure your dataset is appropriate for the desired outcome.
- For issues related to performance or memory, consider adjusting the batch size or the mixed precision settings.
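For the memory tip in particular, a common approach is to halve the batch size until a run fits. The sketch below captures that loop. `first_working_batch_size` and the `try_run` callback are hypothetical: in practice `try_run` might launch the training command with `--train_batch_size` set to the given value and report whether it exited cleanly.

```python
def first_working_batch_size(try_run, start=16, minimum=1):
    """Halve the batch size until try_run(batch_size) returns True.

    Returns the first size that worked, or None if none did.
    """
    minimum = max(1, minimum)  # guard against looping forever at zero
    batch_size = start
    while batch_size >= minimum:
        if try_run(batch_size):
            return batch_size
        batch_size //= 2
    return None


# Stand-in for a real launch: pretend only batches of 4 or fewer fit in memory.
def fake_run(batch_size):
    return batch_size <= 4


print(first_working_batch_size(fake_run))
```

Keep in mind that a smaller batch changes the training dynamics, so results may differ from a run at the original batch size.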

