In this guide, we’ll walk you through using the Lumina-Next-SFT model to generate images from text prompts. The model pairs a 2-billion-parameter Next-DiT backbone with the Gemma-2B text encoder to turn your descriptions into detailed, artistic visuals. Let’s dive in!
What You’ll Need
- Python (version 3.11)
- PyTorch (version 2.1.0)
- Diffusers library
- Flash Attention
- Access to a CUDA-enabled GPU for best performance
Installation Steps
Follow these steps to install the necessary environment and libraries:
1. Create a Conda Environment
First, you need to set up a new conda environment for your project. This helps you manage dependencies effectively.
conda create -n Lumina_T2X -y
conda activate Lumina_T2X
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
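Once the environment is created, you can sanity-check it from Python before going further. The helper below is a minimal sketch of our own (the function name `environment_ok` and its defaults are illustrative, not part of Lumina’s tooling):

```python
import importlib.util
import sys

def environment_ok(min_python=(3, 11), packages=("torch",)):
    """Return a list of problems found; an empty list means the env looks usable."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:2]))
        problems.append(f"Python {found} is older than required {min_python}")
    for name in packages:
        # find_spec returns None when a top-level package is not importable
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems

print(environment_ok())  # [] on a correctly prepared environment
```

Running this inside the activated `Lumina_T2X` environment should print an empty list.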
2. Install Required Dependencies
Next, install the required libraries for Lumina-Next-SFT to function properly.
pip install diffusers huggingface_hub
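To confirm the pip installs succeeded, you can query the installed versions with the standard library’s `importlib.metadata` (a small sketch of our own, not part of the Lumina toolchain):

```python
from importlib import metadata

def installed_versions(packages=("diffusers", "huggingface_hub")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print(installed_versions())
```

Any `None` value in the output means the corresponding `pip install` needs to be rerun.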
3. Install Flash Attention
Flash Attention speeds up the model’s attention computation. Install it with the following command:
pip install flash-attn --no-build-isolation
Running Inference
Now that everything is installed, let’s dive into generating some images!
1. Prepare the Pre-trained Model
First, you’ll want to download the pre-trained model. It’s recommended to use the huggingface-cli tool for optimal results:
huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-SFT-diffusers --local-dir /path/to/ckpt
2. Generate Images with Demo Code
Run the following Python code to use your model:
from diffusers import LuminaText2ImgPipeline
import torch

# Load the pipeline from the local checkpoint directory (bfloat16, on the GPU)
pipeline = LuminaText2ImgPipeline.from_pretrained("/path/to/ckpt/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")
# or let diffusers fetch the weights from the Hub directly:
# pipeline = LuminaText2ImgPipeline.from_pretrained("Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")

image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
image.save("lumina_output.png")  # write the generated image to disk
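The pipeline call above uses defaults for everything except the prompt. Standard diffusers arguments such as `num_inference_steps` and `guidance_scale` let you trade speed for quality; the helper below simply bundles them into a kwargs dict (the function name and the default values are illustrative guesses on our part, not tuned settings):

```python
def generation_kwargs(steps=30, guidance=4.0, height=1024, width=1024):
    """Bundle common diffusers text-to-image arguments into a kwargs dict."""
    return {
        "num_inference_steps": steps,  # more steps: slower, often finer detail
        "guidance_scale": guidance,    # how strongly the image follows the prompt
        "height": height,
        "width": width,
    }

# Usage with the pipeline from the demo code:
# image = pipeline(prompt="...", **generation_kwargs(steps=40)).images[0]
```

Keeping the generation settings in one place makes it easy to sweep over step counts or guidance scales when comparing outputs.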
Understanding the Code: An Analogy
Think of the process of generating images from text prompts like a chef following a recipe:
- The chef is the computer program that executes the commands.
- The recipe is the model (Lumina-Next-SFT) that defines how to combine ingredients (text prompts) to create a dish (generated image).
- The ingredients are the parameters and data that the model uses to understand and create the image.
- Once the chef follows the recipe with precise measurements, a delicious dish (the final image) is created!
Troubleshooting Issues
If you encounter any issues during the installation or while running the model, consider the following tips:
- CUDA Errors: Ensure the CUDA version you installed (pytorch-cuda=12.1 in the conda command above) is supported by your NVIDIA driver, and update the driver if it is too old.
- Missing Dependencies: Double-check that all dependencies are properly installed, especially the flash-attn library.
- Model Loading Errors: Verify the path to your downloaded model is correct and that the download was successful.
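For the last point, a quick programmatic check can catch an incomplete download: diffusers pipelines ship a model_index.json file at the checkpoint root. The helper below is a heuristic sketch of our own, not an official validation routine:

```python
from pathlib import Path

def checkpoint_looks_valid(ckpt_dir):
    """Heuristic: a diffusers checkpoint dir should exist and hold model_index.json."""
    root = Path(ckpt_dir)
    return root.is_dir() and (root / "model_index.json").is_file()

# Example: checkpoint_looks_valid("/path/to/ckpt/Lumina-Next-SFT-diffusers")
```

If this returns False, rerun the huggingface-cli download command with --resume-download before retrying `from_pretrained`.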
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

