Stable Diffusion is a model for generating images from text prompts. This guide explores several ways to serve it, focusing on the keras-cv implementation and how to set it up on different platforms.
Deployment Methods Overview
This repository covers multiple approaches to deploying Stable Diffusion, including:
- All in One Endpoint
- Three Separate Endpoints
- One Endpoint with Two Local APIs
- On-Device Deployment
1. All in One Endpoint
This method allows you to deploy Stable Diffusion as a single endpoint encapsulating all components (encoder, diffusion model, and decoder). Think of this as a pre-packaged meal where all the ingredients are combined to deliver a complete dish in one serving.
Here’s how to set it up:
- Hugging Face Endpoint: Create a custom handler that loads the model and processes incoming requests.
- FastAPI Endpoint: Serve the model through FastAPI. For resources, refer to the standalone codebase.
- Docker Image: Utilize a pre-built Docker image for deployment.
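
As a rough illustration of the custom handler option, the sketch below follows the `EndpointHandler` convention used by Hugging Face Inference Endpoints (a class with `__init__(path)` and `__call__(data)`). It is not the handler shipped with the repository. The `model` argument, the `batch_size` request key, and the response layout (base64-encoded raw bytes plus a shape) are assumptions made for this example; the 512x512 image size is likewise only an illustrative choice.

```python
import base64
from typing import Any, Dict


class EndpointHandler:
    """All-in-one handler: encoder, diffusion model, and decoder behind one call."""

    def __init__(self, path: str = "", model: Any = None):
        # `model` is a hypothetical injection point added for this sketch,
        # so the handler can be exercised without downloading weights.
        if model is None:
            # Imported lazily so this module loads without TensorFlow installed.
            from keras_cv.models import StableDiffusion

            model = StableDiffusion(img_width=512, img_height=512)
        self.model = model

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        prompt = data["inputs"]
        # "batch_size" is an assumed request field, not a documented one.
        batch_size = int(data.get("batch_size", 1))
        # KerasCV returns a uint8 array shaped (batch, height, width, 3).
        images = self.model.text_to_image(prompt, batch_size=batch_size)
        return {
            "images": base64.b64encode(images.tobytes()).decode("ascii"),
            "shape": list(images.shape),
        }
```

A client would decode the `images` field from base64 and reshape the bytes using `shape` to recover the pixel array.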

2. Three Separate Endpoints
In this scenario, we break down Stable Diffusion into three individual endpoints. This means you can customize each endpoint, akin to ordering individual dishes instead of a combo meal.
To achieve this setup:
- Use the provided notebook to split components.
- Check the Hugging Face and FastAPI resources for each part: Text Encoder, Diffusion Model, and Decoder.
- Consider using Docker images for easier management.
- Set up TF Serving separately, based on the saved models.
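
To show how a client might chain the three endpoints, here is a minimal sketch using only the Python standard library. The URL keys, the `{"inputs": ...}` payload shape, and the idea that each endpoint accepts the previous response unchanged are all assumptions for illustration; the real endpoints may expect different fields.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

JsonDict = Dict[str, Any]


def post_json(url: str, payload: JsonDict, timeout: float = 60.0) -> JsonDict:
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def generate(
    prompt: str,
    urls: Dict[str, str],
    send: Callable[[str, JsonDict], JsonDict] = post_json,
) -> JsonDict:
    """Chain the endpoints: text encoder -> diffusion model -> decoder."""
    encoded = send(urls["text_encoder"], {"inputs": prompt})
    latent = send(urls["diffusion_model"], {"inputs": encoded})
    return send(urls["decoder"], {"inputs": latent})
```

Passing `send` as a parameter keeps the transport replaceable, so the same chaining logic can target Hugging Face, FastAPI, or TF Serving endpoints with a different sender function.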

3. One Endpoint with Two Local APIs
Here, specific parts of Stable Diffusion can run locally while the diffusion model operates in the cloud. This setup allows flexibility to swap out models easily, similar to choosing a different side dish while keeping your main course intact.
Examples include:
- Combine Hugging Face endpoints with local Python clients or web/mobile integrations.
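
The swap-out flexibility described above can be sketched as a small container that holds the two local stages and the remote stage as plain callables. The class name `HybridPipeline` and its method names are hypothetical, chosen for this example only.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class HybridPipeline:
    encode_text: Callable[[str], Any]  # runs locally
    diffuse: Callable[[Any], Any]      # calls the cloud endpoint
    decode: Callable[[Any], Any]       # runs locally

    def generate(self, prompt: str) -> Any:
        context = self.encode_text(prompt)
        latent = self.diffuse(context)
        return self.decode(latent)

    def with_diffusion(self, diffuse: Callable[[Any], Any]) -> "HybridPipeline":
        """Return a copy that uses a different diffusion backend."""
        return replace(self, diffuse=diffuse)
```

Because the pipeline is immutable, `with_diffusion` leaves the original untouched, which makes it straightforward to compare two diffusion models on the same prompt.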
4. On-Device Deployment (with TFLite)
This method reduces resource consumption by running the models directly on the device. TFLite allows models to run smoothly on hardware with limited capabilities, like having a mini version of your favorite dish you can prepare at home.
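
For orientation, running a single TFLite model from Python might look like the sketch below. It uses the standard `tf.lite.Interpreter` calls (`allocate_tensors`, `get_input_details`, `set_tensor`, `invoke`, `get_tensor`); the helper names and the assumption that inputs are supplied in declaration order are ours, and the model path is whatever `.tflite` file you have obtained.

```python
from typing import Any, List


def load_interpreter(model_path: str) -> Any:
    """Load a .tflite file; TensorFlow is imported lazily."""
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def run_tflite(interpreter: Any, inputs: List[Any]) -> List[Any]:
    """Feed inputs in declaration order, run once, and collect every output."""
    input_details = interpreter.get_input_details()
    if len(inputs) != len(input_details):
        raise ValueError(
            f"expected {len(input_details)} inputs, got {len(inputs)}"
        )
    for detail, value in zip(input_details, inputs):
        interpreter.set_tensor(detail["index"], value)
    interpreter.invoke()
    return [
        interpreter.get_tensor(detail["index"])
        for detail in interpreter.get_output_details()
    ]
```

Each of the three components (text encoder, diffusion model, decoder) would get its own interpreter, with the outputs of one passed as inputs to the next.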
Explore the available TFLite models:
Troubleshooting
If you encounter issues during deployment, consider the following:
- Check the size limit of your payload, as different platforms have varying restrictions. For instance, Vertex AI allows a maximum request size of 1.5 MB.
- Ensure that the model versions you deploy are compatible with each other and with the serving code.
- Explore using different Docker images to manage variations in input and output formats.
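
A payload size check can be done before sending a request. The sketch below measures the JSON-encoded size of a payload against a limit. Reading 1.5 MB as 1,500,000 bytes is an assumption (the conservative interpretation), and the function names are hypothetical.

```python
import json
from typing import Any

# Assumed limit: 1.5 MB read as 1,500,000 bytes (the conservative reading).
MAX_REQUEST_BYTES = 1_500_000


def payload_size(payload: Any) -> int:
    """Size in bytes of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_limit(payload: Any, limit: int = MAX_REQUEST_BYTES) -> bool:
    """True when the serialized payload is within the size limit."""
    return payload_size(payload) <= limit
```

Keep in mind that base64 encoding inflates binary data by about a third. For example, one raw 512x512 RGB image is 786,432 bytes and becomes 1,048,576 bytes once base64-encoded, so a single image fits under this limit but two in the same request do not.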
Final Thoughts
This guide has outlined four strategies for deploying Stable Diffusion: a single all-in-one endpoint, three separate endpoints, one cloud endpoint combined with two local APIs, and on-device deployment with TFLite. Using notebooks, Docker, and web frameworks such as FastAPI, developers can choose the method that best fits the needs of their application.

