How to Easily Self-Host LLMs with OpenLLM

Dec 25, 2022 | Educational

Are you ready to leap into the world of self-hosting language models? OpenLLM is here to turn that complex task into a seamless experience. This guide walks you through running open-source LLMs and exposing OpenAI-compatible APIs with a single command. Let’s dive in!

Getting Started

Follow these simple commands to install OpenLLM and explore it interactively:

pip install openllm  # or pip3 install openllm
openllm hello

Supported Models

OpenLLM supports a wealth of incredibly powerful open-source LLMs, like Llama 3.1, Qwen2, and Phi3. Additionally, you can add a model repository to run custom models. Here’s a quick glance at some of the models you can use:

  • Llama 3.1 (8B): no quantization required; needs a 24 GB GPU.
  • Mistral (7B): no quantization required; needs a 24 GB GPU.
  • Qwen2 (1.5B): no quantization required; needs a 12 GB GPU.
  • Phi3 (3.8B): no quantization required; needs a 12 GB GPU.

For a complete list of models, check out the OpenLLM models repository.
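
Not sure how much GPU memory your machine has? Here is a minimal sketch (assuming PyTorch is installed, which it typically is alongside OpenLLM’s inference backend) that prints what is available:

import torch

# Print the name and total memory of each visible CUDA GPU.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA-capable GPU detected.")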

Starting Your LLM Server

Once you have chosen a model, it’s time to launch your local LLM server. You can do this with the openllm serve command followed by the model name and version:

openllm serve llama3:8b

Your server will be available at http://localhost:3000, exposing OpenAI-compatible APIs that any OpenAI client can talk to.
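
To confirm the server is up, you can ask it which models it exposes. Here is a minimal sketch using the OpenAI Python client (assuming the default port of 3000; the API key is just a placeholder):

from openai import OpenAI

# Point the client at the local OpenLLM server.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

# List the model IDs the server exposes via the OpenAI-compatible /v1/models route.
for model in client.models.list().data:
    print(model.id)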

Interacting with Your LLM

To interact with the LLM, you’ll typically need to specify:

  • API Host Address: Defaults to http://localhost:3000.
  • Model Name: Depends on the model you are serving; the example below uses meta-llama/Meta-Llama-3-8B-Instruct.
  • API Key: Optional; it’s only needed if you enable client authentication, though many clients still expect a placeholder value (the example uses "na").

Here’s an example of using the OpenAI Python client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
chat_completion = client.chat.completions.create(
    model="meta-llamaMeta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Explain superconductors like I'm five years old"}],
    stream=True,
)

for chunk in chat_completion:
    print(chunk.choices[0].delta.content or '', end='')
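
The example above streams tokens as they arrive (stream=True). If you would rather wait for the complete reply, a non-streaming variant of the same call (reusing the client from the snippet above) looks like this:

# Non-streaming variant: omit stream=True and read the whole reply at once.
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Explain superconductors like I'm five years old"}],
)
print(response.choices[0].message.content)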

The Chat UI

OpenLLM also provides a chat UI at the chat endpoint of your LLM server, http://localhost:3000/chat, so you can try out the model right from your browser.

Setting Up Custom Models and Repositories

You can enrich your experience by adding your own custom models. Here’s how:

  • Create a Bentos directory to store your custom LLMs, following the BentoML guidelines.
  • Register your custom model repository with OpenLLM:

openllm repo add <repo-name> <repo-url>

Remember, OpenLLM currently only supports public repositories.

Deploying to BentoCloud

If you want to take your deployment to the cloud, you can leverage BentoCloud; its auto-scaling and orchestration features take the operational burden off your shoulders. With a BentoCloud account and an active login, run this to deploy your model:

openllm deploy llama3:8b

Troubleshooting

If you encounter any issues while setting up or running your server, consider the following troubleshooting tips:

  • Make sure your GPU meets the requirements specified for the model you are trying to use.
  • Check the API host address and ensure that your server is running.
  • For dependency or installation issues, ensure that your package manager is up to date.


Conclusion

By following these steps, you can successfully bring powerful language models into your development ecosystem with OpenLLM. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
