Overview
After the Poolside platform is initialized, deploy the AI models by installing the `inference-stack` chart provided in the bundle. The chart deploys both inference models, the Envoy proxy, and the external processor.
Prerequisites
- Images: Ensure that the inference image such as `atlas`, the Envoy proxy image `envoyproxy/envoy`, and the Envoy gateway image `envoyproxy/gateway` are available in your registry.
- S3 credentials: Ensure that a secret with AWS credentials such as `aws-credentials` exists in the `poolside-models` namespace.
- Model checkpoints: Upload checkpoints to the S3 bucket before you install the inference chart. See Install on OpenShift → Step 4: Upload model checkpoints.
- API key authentication (optional): To require an API key for the vLLM inference servers, create a secret containing the key in `poolside-models`:
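For example, a minimal sketch assuming the secret is named `vllm-api-key` and stores the key under an `api-key` field (both names are assumptions; use whatever names your values file references):

```bash
# Create the optional API key secret for the vLLM inference servers.
kubectl create secret generic vllm-api-key \
  --namespace poolside-models \
  --from-literal=api-key='<your-api-key>'

# Confirm the prerequisite AWS credentials secret also exists.
kubectl get secret aws-credentials --namespace poolside-models
```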
Configure the inference values file
Edit `inference_values.yaml` (created during the platform install) and set the fields that apply to your environment:
- Proxy: To deploy without the Envoy proxy stack, set `tags.proxy: false`.
- For the INT4 Malibu model on 2x RTX 6000 Ada Pro GPUs with 48 GB each, limit the context length and batch size (see the sketch after this list).
- Note that the emptyDir volume is wiped on each restart.
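A hedged sketch of how these settings might look in `inference_values.yaml`. Only `tags.proxy` is taken from the list above; the model-level keys and the numbers shown are assumptions standing in for whatever the chart exposes for vLLM's `--max-model-len` and `--max-num-seqs`, so check the chart's values schema before using them:

```yaml
# Sketch only: verify key names against the inference-stack chart's values.yaml.
tags:
  proxy: false            # deploy without the Envoy proxy stack

inference:
  malibu:                 # assumed section for the INT4 Malibu model
    maxModelLen: 16384    # assumed key mapping to vLLM --max-model-len
    maxNumSeqs: 8         # assumed key mapping to vLLM --max-num-seqs
```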
Install the inference stack
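A typical installation, assuming the chart ships in the bundle as a local directory named `inference-stack` (the release name and chart path are assumptions; adjust them to match your bundle):

```bash
# Install or upgrade the inference stack with the values file edited above.
helm upgrade --install inference-stack ./inference-stack \
  --namespace poolside-models \
  --values inference_values.yaml

# Watch the inference, Envoy, and external processor pods come up.
kubectl get pods --namespace poolside-models --watch
```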
Register the model
After the inference server is running, register each model in the Poolside Console. For step-by-step instructions, see Connect a model. Use the following deployment-specific values when you fill out the form:
- Model Name: the served inference model name. Retrieve it with the command shown after this list.
- Base URL: the in-cluster inference service endpoint. With the Envoy proxy stack, use the Envoy service:
  - http://inference-envoy-internal.poolside-models.svc.cluster.local/v0/models/inference-laguna/v1
  - http://inference-envoy-internal.poolside-models.svc.cluster.local/v0/models/inference-point/v1
  Without the Envoy proxy (`tags.proxy: false`), use the inference services directly:
  - http://inference-laguna.poolside-models.svc.cluster.local/v1
  - http://inference-point.poolside-models.svc.cluster.local/v1
  The `/v1` suffix is required because the platform appends `/chat/completions` or `/completions` to the base URL, and vLLM serves these endpoints under the `/v1/` prefix.
- Custom header: If you enabled API key authentication, add an `Authorization: Bearer <vllm-api-key>` custom header to the model so the platform sends the Authorization header with inference requests.
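As a hedged example, the served model name can be retrieved from vLLM's OpenAI-compatible `/v1/models` endpoint. The commands below assume the direct service URL for inference-laguna from the list above, that requests are issued from a pod inside the cluster, and that `jq` is available; drop the Authorization header if no API key was configured:

```bash
# List served model IDs; enter one of these as the Model Name in the Console.
curl -s -H "Authorization: Bearer <vllm-api-key>" \
  http://inference-laguna.poolside-models.svc.cluster.local/v1/models | jq -r '.data[].id'

# Smoke test of the base URL: the platform appends /chat/completions to it.
curl -s -H "Authorization: Bearer <vllm-api-key>" -H "Content-Type: application/json" \
  -d '{"model": "<served-model-name>", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}' \
  http://inference-laguna.poolside-models.svc.cluster.local/v1/chat/completions
```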