How to Build and Deploy AI Model Inference APIs with BentoML

Feb 23, 2022 | Data Science

Creating APIs for your AI models has never been easier with BentoML, a Python library designed to help you build online serving systems optimized for model inference. This guide walks you through the steps to get started, along with troubleshooting tips in case things don't go as planned.

What is BentoML?

BentoML is a powerful tool that simplifies the process of creating and deploying machine learning applications. You can easily build APIs for any AI/ML model, manage dependencies and environments, and maximize CPU/GPU utilization with built-in optimization features.

Getting Started with BentoML

Follow these steps to kick off your BentoML journey:

Step 1: Install BentoML

  • Make sure you have Python version 3.9 or higher installed.
  • Run the following command in your terminal:
pip install -U bentoml
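  • Optionally, verify the installation:
bentoml --version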

Step 2: Define APIs

Create a file named service.py and define your API as follows:

from __future__ import annotations
import bentoml


# Declare the class as a BentoML Service and reserve 4 CPU cores for it.
@bentoml.service(resources={"cpu": "4"})
class Summarization:
    def __init__(self) -> None:
        # Import heavy dependencies inside __init__ so they load
        # only when a service worker starts.
        import torch
        from transformers import pipeline

        # Use a GPU when available, otherwise fall back to CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipeline = pipeline("summarization", device=device)

    # batchable=True lets BentoML merge concurrent requests into a
    # single pipeline call (adaptive batching).
    @bentoml.api(batchable=True)
    def summarize(self, texts: list[str]) -> list[str]:
        results = self.pipeline(texts)
        return [item["summary_text"] for item in results]

Step 3: Run the Service Locally

  • Install the additional dependencies the service needs:
pip install torch transformers
  • Then start the service locally:
bentoml serve service.py:Summarization

Your service will be running at http://localhost:3000.
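
Once the service is up, you can send it a test request. Below is a minimal sketch using BentoML's Python HTTP client; the input text is just an arbitrary example:

import bentoml

# Connect to the locally running service and call its summarize endpoint.
client = bentoml.SyncHTTPClient("http://localhost:3000")
summaries = client.summarize(
    texts=["BentoML is a Python library for building online serving systems optimized for model inference."]
)
print(summaries[0])
client.close()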

Step 4: Deploying Your Bento

To deploy your BentoML service, you need to create a bentofile.yaml file:

service: "service:Summarization" # Entry service import path
include:
  - "*.py" # Include all .py files in the current directory
python:
  packages: # Python dependencies to include
    - torch
    - transformers
docker:
  python_version: "3.11"

Building and Running Docker Container

Run the following command to package your service into a Bento:

bentoml build
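
Each successful build creates a versioned Bento in your local Bento store. You can list all available Bentos with:

bentoml list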

Make sure Docker is running and then create a Docker container image:

bentoml containerize summarization:latest

Finally, run the generated image:

docker run --rm -p 3000:3000 summarization:latest
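
The containerized service exposes the same HTTP interface on port 3000, so the client snippet from Step 3 can be reused to verify it.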

Use Cases

BentoML can be applied to a wide range of model-serving scenarios, for example:

  • Text tasks such as summarization, translation, and LLM-backed APIs
  • Image generation and computer vision services
  • Embedding endpoints for semantic search and retrieval
  • Speech recognition and other audio pipelines

Troubleshooting and Tips

If you encounter issues while following these steps, here are some troubleshooting ideas:

  • Ensure that your Python version is properly configured and compatible.
  • Check for internet connectivity when installing packages and dependencies.
  • Verify that Docker is running before attempting to build or run images.
  • If you hit a specific error, consider searching the BentoML GitHub Issues page for similar reports.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
