Overview
After you initialize the Poolside platform, you can serve Poolside models from GPU workloads inside your Amazon EKS cluster, or connect the platform to external OpenAI-compatible API endpoints. Most production deployments serve models locally for performance and data-locality reasons. Local inference uses the bundled inference-stack Helm chart, which deploys an Envoy proxy and one model deployment per enabled subchart. The Envoy proxy URL becomes the in-cluster base URL you register in the Poolside Console.
Prerequisites
- GPU node group: Use a GPU node group in your cluster, with p5e.48xlarge as the supported minimum instance type. The reference architecture provisions this node group and the supporting IAM/IRSA wiring.
- GPU software stack: Install NVIDIA GPU Operator 26.3.0 in the cluster, with the following component versions:
  - NVIDIA driver 580.126.20
  - NVIDIA Container Toolkit 1.19.0
- Model checkpoints bucket: Provide an S3 bucket for model checkpoints. The reference architecture creates a <deployment>-models bucket and grants the inference IRSA role read access to it.
  - For self-assembled deployments, choose any bucket name and follow the layout described in the model checkpoints guide.
- Deployment bundle: Extract the Poolside deployment bundle on the workstation that runs helm.
Stage model checkpoints in S3
Each enabled inference subchart loads its weights from S3 at startup. The reference architecture stores checkpoints in the <deployment>-models bucket by default; for self-assembled deployments, use whatever bucket you provisioned. For the checkpoint layout and the upload mechanics, including the streaming uploader and the bring-your-own-bucket alternative, see the model checkpoints guide in the reference architecture repository.
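For self-assembled deployments, read access to the models bucket comes down to s3:ListBucket on the bucket and s3:GetObject on its objects. The following is a minimal sketch of such a policy document, built in Python so it can be printed as JSON. The function name and the example-models bucket name are illustrative, and the arn:aws partition assumes a standard commercial AWS region; this is not the policy the reference architecture generates.

```python
import json


def models_bucket_read_policy(bucket: str) -> dict:
    """Build an IAM policy document granting read access to one S3 bucket."""
    # Assumption: standard "aws" partition (not GovCloud or China regions).
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # s3:GetObject is an object-level action, so it targets keys in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-models" is a placeholder bucket name.
    print(json.dumps(models_bucket_read_policy("example-models"), indent=2))
```

Splitting the two actions into separate statements keeps each one scoped to the resource type it applies to: the bucket ARN for listing, and the object ARN pattern for reads.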
The inference IRSA role must have s3:GetObject and s3:ListBucket on the models bucket. The reference architecture scopes this policy automatically.
Install the inference-stack chart
The bundle ships an inference-stack chart under charts/inference-stack/. Each model is a subchart (for example, inference-laguna, inference-point) and is enabled per-deployment in your values file.
Refer to the chart’s values.yaml and the reference architecture customization guide for the full set of supported overrides, including per-model GPU counts and model identifiers. The reference architecture’s poolside-values module composes these values automatically from the Terraform outputs.
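As a rough sketch of how a per-deployment values file and the install command might fit together, the Python below writes a values override and assembles a helm invocation. Several details are assumptions rather than the chart's documented interface: that each subchart is toggled by an enabled flag under a key matching its name, and the release name, namespace, and file name shown in the usage note. Check the chart's values.yaml for the real keys before using anything like this.

```python
import json
from pathlib import Path


def build_values(enabled_models: list[str]) -> dict:
    """Return a values override that enables the named subcharts."""
    # Assumption: each subchart exposes an `enabled` flag under its own name.
    return {name: {"enabled": True} for name in enabled_models}


def write_values(values: dict, path: Path) -> Path:
    """Write the override as JSON, which Helm can read as a values file
    because JSON is a subset of YAML."""
    path.write_text(json.dumps(values, indent=2) + "\n")
    return path


def helm_command(release: str, namespace: str, values_file: str) -> list[str]:
    """Assemble (but do not run) a helm upgrade --install command."""
    return [
        "helm", "upgrade", "--install", release, "charts/inference-stack",
        "--namespace", namespace,
        "--values", values_file,
    ]
```

For example, `helm_command("inference", "poolside", "inference-values.json")` returns an argument list you could pass to `subprocess.run` from the extracted bundle directory; the release and namespace names here are placeholders.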
Register the model
Whether you serve models locally or connect to an external API, you register them in the Poolside Console. For the full procedure, see Connect a model. Use the following deployment-specific values:
- Base URL: The model API endpoint.
  - For locally hosted Poolside models, use the in-cluster Envoy proxy service URL.
  - For external API endpoints, use the provider’s public URL.
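In-cluster service URLs generally follow the Kubernetes service DNS pattern of service name, namespace, and cluster domain. The helper below is a sketch of that general pattern only: the service name, namespace, port, and http scheme are placeholders rather than the chart's actual values, and cluster.local is the default cluster domain, which your cluster may override. Look up the real Envoy service with kubectl before registering a URL.

```python
def service_base_url(
    service: str,
    namespace: str,
    port: int,
    scheme: str = "http",
    cluster_domain: str = "cluster.local",
) -> str:
    """Build a URL from the standard Kubernetes service DNS name:
    <service>.<namespace>.svc.<cluster-domain>."""
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}"


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders.
    print(service_base_url("example-envoy", "example-namespace", 8080))
```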
Connect external OpenAI-compatible models
If you do not run local inference, you can point the platform at any OpenAI-compatible model API. Skip the inference-stack install and register the external endpoint as the model’s base URL when you connect a model.
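Before registering an external endpoint, it can help to confirm that it answers the OpenAI-style model listing call. The sketch below uses only the Python standard library and assumes that the base URL already includes any version prefix (such as /v1), that the provider serves a models route under it, and that it accepts a bearer token. The function names and example URL are illustrative; whether the Poolside Console expects the same base URL form is not confirmed here.

```python
import json
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style model listing route."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


def list_model_ids(base_url: str, api_key: str, timeout: float = 10.0) -> list[str]:
    """Send the request and return the model identifiers the endpoint reports."""
    request = models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-compatible servers return {"object": "list", "data": [{"id": ...}, ...]}.
    return [model["id"] for model in payload.get("data", [])]
```

Calling `list_model_ids("https://api.example.com/v1", api_key)` against a reachable endpoint returns the identifiers you can choose from when you connect a model; an HTTP error or an empty list suggests the base URL or credentials need another look.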