Overview
After you initialize the Poolside platform, you can serve Poolside models from GPU workloads inside your Amazon EKS cluster, or connect the platform to external OpenAI-compatible API endpoints. Most production deployments serve models locally for performance and data-locality reasons. Local inference uses the bundled inference-stack Helm chart, which deploys an Envoy proxy and one model deployment per enabled subchart. The Envoy proxy URL becomes the in-cluster base URL you register in the Poolside Console.
Deploy local Poolside models
Prerequisites
- A GPU node group in your cluster. The supported minimum instance type is p5e.48xlarge. The reference architecture provisions this node group, the NVIDIA GPU Operator, and the supporting IAM/IRSA wiring.
- An S3 bucket for model checkpoints. The reference architecture creates a <deployment>-models bucket and grants the inference IRSA role read access to it.
- For self-assembled deployments, choose any bucket name and follow the layout described in the model checkpoints guide.
- The Poolside deployment bundle, extracted on the workstation that runs helm.
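Before continuing, you can sanity-check that the GPU node group and GPU Operator are in place. This sketch assumes the operator runs in its default gpu-operator namespace and that your nodes carry the standard Kubernetes instance-type label; adjust both for your cluster:

```shell
# List GPU nodes by instance type (p5e.48xlarge is the supported minimum)
kubectl get nodes -l node.kubernetes.io/instance-type=p5e.48xlarge

# Confirm the NVIDIA GPU Operator pods are running (namespace is an assumption)
kubectl get pods -n gpu-operator
```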
Stage model checkpoints in S3
Each enabled inference subchart loads its weights from S3 at startup. The reference architecture lays checkpoints out under <deployment>-models by default; for self-assembled deployments, use whatever bucket you provisioned. For the upload mechanics, including the streaming uploader and the bring-your-own-bucket alternative, see the model checkpoints guide in the reference architecture repository.
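As an illustrative sketch of the upload step, you can sync a local checkpoint directory into the models bucket with the AWS CLI. The local path and the per-model prefix shown here are placeholders; the model checkpoints guide is the authoritative reference for the expected layout:

```shell
# Sync a local checkpoint directory into the models bucket
# (replace <deployment> and the model prefix with your own values)
aws s3 sync ./checkpoints/malibu s3://<deployment>-models/malibu/
```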
The inference IRSA role must have s3:GetObject and s3:ListBucket on the models bucket. The reference architecture scopes this policy automatically.
Install the inference-stack chart
The bundle ships an inference-stack chart under charts/inference-stack/. Each model is a subchart (for example, inference-malibu, inference-point) and is enabled per-deployment in your values file.
Refer to the chart’s values.yaml and the reference architecture customization guide for the full set of supported overrides, including per-model GPU counts and model identifiers. The reference architecture’s poolside-values module composes these values automatically from the Terraform outputs.
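Putting the install together, a minimal invocation might look like the following. The values keys (the per-subchart enabled flags) and the inference namespace are assumptions for illustration; consult the chart's values.yaml for the actual override names:

```shell
# Enable the models you want to serve, then install from the extracted bundle
cat > inference-values.yaml <<'EOF'
inference-malibu:
  enabled: true
inference-point:
  enabled: false
EOF

helm upgrade --install inference-stack ./charts/inference-stack \
  --namespace inference --create-namespace \
  -f inference-values.yaml
```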
Register the model in the Poolside Console
Whether you serve models locally or connect to an external API, you register them in the Poolside Console. For the full procedure, see Connect a model. Use the following deployment-specific values:
- Base URL: The model API endpoint.
  - For locally hosted Poolside models, use the in-cluster Envoy proxy service URL.
  - For external API endpoints, use the provider's public URL.
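For illustration, an in-cluster Envoy base URL typically follows the standard Kubernetes service DNS form. The service name, namespace, and port below are placeholders; check the deployed Envoy Service for the real values:

```
http://inference-stack-envoy.inference.svc.cluster.local:8080/v1
```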
Connect external OpenAI-compatible models
If you do not run local inference, you can point the platform at any OpenAI-compatible model API. Skip the inference-stack install and register the external endpoint as the model's base URL when you connect a model.
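Before registering an external endpoint, you can verify that it actually speaks the OpenAI-compatible API. In this sketch the endpoint URL and API key are placeholders; a JSON model list in the response confirms compatibility:

```shell
# List the models exposed by the endpoint (OpenAI-compatible /v1/models route)
curl -s https://api.example.com/v1/models \
  -H "Authorization: Bearer $API_KEY"
```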