Welcome to the wonderful world of LLMs (Large Language Models) with the LLM-API project! Whether you're a developer, a researcher, or just a curious enthusiast, this post walks you step by step through setting up LLM-API so you can harness the capabilities of these powerful models with ease.
Getting Started with LLM-API
The LLM-API allows you to seamlessly run various LLMs on different hardware configurations. You can opt to run your models in Docker containers or directly on your local machine—whatever suits your needs best!
Step 1: Create a Configuration File
The first step is creating a configuration file called config.yaml, which holds the settings required to run your desired model. Here's an analogy: think of this file as a recipe that guides how to prepare a dish! In this case, the "dish" is the setup for running your chosen LLM. The recipe has four main ingredients (a sample file follows this list):
- models_dir: the directory where downloaded model files are stored
- model_family: the family of the model to run (llama, gptq_llama, or huggingface)
- setup_params: key-value pairs that tell LLM-API how to fetch your model, such as which Hugging Face repository to pull from
- model_params: additional model-specific runtime configurations
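Putting those ingredients together, here is a minimal sketch of a config.yaml for a llama-family model. The repo_id and filename values are placeholders for the Hugging Face repository and model file you actually want, and the exact setup_params and model_params keys vary by model family, so check the project's README for your case:

models_dir: /models              # where downloaded models are stored
model_family: llama              # or gptq_llama / huggingface
setup_params:
  repo_id: your-user/your-model  # placeholder: Hugging Face repo to download from
  filename: model.bin            # placeholder: model file within that repo
model_params:
  n_ctx: 512                     # example llama-family parameter; tune for your model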
You can also override any configuration value with an environment variable prefixed with LLM_API_, for example: LLM_API_MODELS_DIR=models2.
Step 2: Run LLM-API Using Docker
Next, launch LLM-API in Docker using that configuration. Open your terminal and run the following command:
docker run -v $PWD/models/:/models:rw -v $PWD/config.yaml:/llm-api/config.yaml:ro -p 8000:8000 --ulimit memlock=16000000000 1b5d/llm-api
This command launches a Docker container, mounts your models directory and configuration file, and exposes the API on port 8000.
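The LLM_API_ environment-variable overrides from Step 1 can be passed with Docker's standard -e flag. A minimal sketch (the value here simply mirrors the mount target, to show the mechanism):

# override models_dir at launch instead of editing config.yaml
docker run -e LLM_API_MODELS_DIR=/models -v $PWD/models/:/models:rw -v $PWD/config.yaml:/llm-api/config.yaml:ro -p 8000:8000 --ulimit memlock=16000000000 1b5d/llm-api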
Alternatively, Use Docker Compose
You can also use Docker Compose to simplify the process. Just run:
docker compose up
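If you want to write the compose file yourself, a minimal sketch mirroring the docker run flags above could look like the following; treat it as a starting point rather than the project's exact shipped file:

services:
  llm-api:
    image: 1b5d/llm-api           # same image as the docker run example
    ports:
      - "8000:8000"
    volumes:
      - "./models:/models:rw"
      - "./config.yaml:/llm-api/config.yaml:ro"
    ulimits:
      memlock: 16000000000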
On the first run, LLM-API will download the specified model from Hugging Face based on your setup parameters, and store it for future use. Think of this initial run as preheating your kitchen before cooking!
Exploring Endpoints
After setting up LLM-API, you interact with your model through a small set of standardized HTTP endpoints. Here are the primary ones:
- Generate Text – POST /generate. Example request body: {"prompt": "What is the capital of France?"}. Generates text based on a given prompt, with customization options.
- Async Text Generation – POST /agenerate. Example request body: {"prompt": "What is the capital of France?"}. Initiates text generation tasks in the background.
- Text Embeddings – POST /embeddings. Example request body: {"text": "What is the capital of France?"}. Obtains embeddings for text for various NLP tasks.
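Once the container is running, you can exercise these endpoints straight from the terminal. A quick sketch using curl against the port mapped above (the responses, of course, depend on your model):

# synchronous text generation
curl -X POST http://localhost:8000/generate -H "Content-Type: application/json" -d '{"prompt": "What is the capital of France?"}'

# embeddings for a piece of text
curl -X POST http://localhost:8000/embeddings -H "Content-Type: application/json" -d '{"text": "What is the capital of France?"}'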
Troubleshooting
As with any software project, you might encounter some bumps in the road. Here are some tips for troubleshooting common issues (a log-checking sketch follows the list):
- Ensure Docker is installed and functioning properly.
- Double-check your config.yaml file for any typos or incorrect paths.
- Verify that the necessary models are correctly specified in your configuration.
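When something does go wrong, the container's own logs usually reveal whether it was the model download, the config parsing, or something else. A quick sketch using standard Docker commands; <container-id> is a placeholder for your actual container:

docker ps                        # find the llm-api container's ID
docker logs -f <container-id>    # follow startup and model-download output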
For more insights and updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

