How to Serve Deep Learning Models with Multi Model Server

May 6, 2022 | Educational

Deep learning is transforming industries with its remarkable ability to automate and improve decision-making processes. But how do we serve these powerful models? Enter the Multi Model Server (MMS), a flexible and easy-to-use tool designed to serve deep learning models trained with various ML/DL frameworks. In this article, we’ll walk you through the steps to set up MMS and cover some common troubleshooting tips.

Quick Start

Before diving into the installation, ensure you have the following prerequisites:

  • Operating System: Ubuntu, CentOS, or macOS. (Windows support is experimental)
  • Python installed (Python 2.7 or 3.6)
  • Pip – the Python package management system
  • Java 8

Now, let’s jump into the installation!

Installing Multi Model Server with pip

Step 1: Set Up a Virtual Environment

It’s best practice to run Multi Model Server in a virtual environment, which makes managing dependencies much easier. Let’s create one (adjust the interpreter path and name below if you are using Python 3.6 instead of 2.7):

pip install virtualenv
virtualenv -p /usr/local/bin/python2.7 tmppyenv2
source tmppyenv2/bin/activate

Step 2: Install MXNet

If you don’t have the MXNet engine installed yet, install one of the recommended packages; mxnet-mkl is the CPU-optimized build:

pip install mxnet-mkl

Step 3: Install or Upgrade MMS

pip install multi-model-server

Serving a Model

With MMS installed, starting up a model server is a breeze. You can start serving a model with a single command:

multi-model-server --start --models squeezenet=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar

This command starts the server and begins listening for inference requests with the specified model. Note that the first requests may take a moment while MMS scales its backend workers to match the available hardware resources.
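Before sending requests, you can confirm the server is up by polling its health endpoint. A minimal Python sketch, assuming MMS exposes its standard GET /ping health check on the inference port (8080); the function name and defaults are illustrative:

```python
import json
import urllib.error
import urllib.request

def server_is_healthy(base_url="http://127.0.0.1:8080", timeout=2.0):
    """Return True if the server answers GET /ping with {"status": "Healthy"}."""
    try:
        with urllib.request.urlopen(base_url + "/ping", timeout=timeout) as resp:
            return json.load(resp).get("status") == "Healthy"
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, unreachable, or returned something unexpected.
        return False

if __name__ == "__main__":
    print("server healthy:", server_is_healthy())
```

Polling in a loop with this helper is handy in deployment scripts, since worker startup is not instantaneous.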

Testing the Model Server

To see the model in action, you can download an image (like a cute kitten) and make a prediction with the following commands:

curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
curl -X POST http://127.0.0.1:8080/predictions/squeezenet -T kitten.jpg

Upon making this request, you’ll receive a JSON response listing the predicted classes and their probabilities.
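The same request can be made from Python with only the standard library. A minimal sketch: the predict helper and the sample response below are illustrative, with field names following the list of {"class", "probability"} entries that MMS image classifiers typically return:

```python
import json
import urllib.request

def predict(image_path, model="squeezenet", host="http://127.0.0.1:8080"):
    """POST raw image bytes to the MMS predictions endpoint and return parsed JSON."""
    with open(image_path, "rb") as f:
        req = urllib.request.Request(
            host + "/predictions/" + model, data=f.read(), method="POST"
        )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def top_prediction(results):
    """Pick the entry with the highest probability from the response list."""
    return max(results, key=lambda r: r["probability"])

# Hypothetical response shape, for illustration; real labels come from the model.
sample = [
    {"class": "n02123045 tabby", "probability": 0.42},
    {"class": "n02124075 Egyptian cat", "probability": 0.58},
]
print(top_prediction(sample)["class"])
```

With the server running, `top_prediction(predict("kitten.jpg"))` would give you the most likely class in one line.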

Stopping the Server

When you’re done testing, you can stop the model server with the following command:

multi-model-server --stop

Troubleshooting

If you encounter any issues while setting up or using the Multi Model Server, consider the following troubleshooting steps:

  • Ensure that the correct version of Python is installed and activated within your virtual environment.
  • Check that Java 8 is correctly installed and configured.
  • If the model does not start, verify that the model file URLs specified in the start command are correct and accessible.
  • Monitor server logs for any runtime errors related to worker scaling or model inference.
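Several of these checks can be scripted. A small diagnostic sketch (the version thresholds simply mirror the prerequisites listed at the top of this article; the function is illustrative):

```python
import shutil
import sys

def check_prerequisites():
    """Print whether the Python interpreter and Java runtime MMS needs are visible."""
    py_ok = sys.version_info[:2] == (2, 7) or sys.version_info[:2] >= (3, 6)
    print("Python %s: %s" % (sys.version.split()[0], "ok" if py_ok else "unsupported"))
    java = shutil.which("java")  # Java 8 must be on PATH for the MMS frontend
    print("java on PATH: %s" % (java or "NOT FOUND - install Java 8"))
    return py_ok and java is not None

check_prerequisites()
```

Run this inside your activated virtual environment so it reports on the interpreter MMS will actually use.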

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

In just a few steps, you can serve your deep learning models with the Multi Model Server, enabling fast and efficient inference capabilities. With the scalability of MMS, you’re ready to tackle even more complex applications.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Explore Further

For additional features, configurations, or to dive deeper into model packaging, check the latest version docs and explore the rich ecosystem that MMS offers.
