
Overview

This guide describes how to install the Poolside platform on Amazon Web Services (AWS) using either Amazon Bedrock or Amazon Elastic Kubernetes Service (EKS) with provisioned GPU nodes for inference. The installation process is organized into the following phases:
  1. Create the base infrastructure: Create a VPC (or use an existing one), an EKS cluster, an RDS database, and optional ECR repositories, IAM roles, and KMS keys.
  2. Provision node groups and add-ons: Create CPU and GPU node groups and install EKS add-ons (AWS CNI, EBS CSI).
  3. Upload container images and checkpoints: Upload Poolside and third-party container images to your ECR registries.
  4. Deploy the application: Deploy Poolside workloads and configure ingress.
  5. Complete post-deployment setup: Configure DNS, OIDC, and model loading.
After you complete these phases, the deployment is ready for use and aligned with the best practices described in the Poolside reference architecture.
In standard AWS Commercial accounts, IAM/OIDC trust for EKS IRSA is handled by Terraform automatically (no manual OIDC steps required).

Deployment bundle

The bundle contains Terraform containers, example variable files, and wrapper scripts:
|- aws
|- binaries
|- c2e
|- containers -- Terraform containers for each deployment step
|  |- poolside-aws-iam---<tag>
|  |- poolside-container-uploader-full---<tag>
|  |- poolside-self-managed-deployment---<tag>
|  |- poolside-self-managed-infra-phase-1---<tag>
|  |_ poolside-self-managed-infra-phase-2---<tag>
|- installation_steps
   |_ aws -- Example variable files and wrapper scripts
      |- 0_aws_iam -- Used for the optional manual IAM path
      |  |_ terraform.tfvars
      |- 1_infra_phase_1
      |  |- remote.tf
      |  |- run_terraform.sh
      |  |_ terraform.tfvars
      |- 2_infra_phase_2
      |  |- remote.tf      
      |  |- run_terraform.sh
      |  |_ terraform.tfvars
      |- 2.5_container_upload
      |  |_ run.sh
      |- 3_deployment
      |  |- remote.tf      
      |  |- run_terraform.sh
      |  |_ terraform.tfvars
|_ terraform
The installation_steps/aws/* directories include example terraform.tfvars files and wrapper scripts (run_terraform.sh or run.sh) for each step. Container directories are named <container_name>---<container_tag>, so you can infer the container name and tag from the directory name. These directories include multi-architecture images for amd64 and arm64.
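The `<container_name>---<container_tag>` naming convention can be split with shell parameter expansion, which is handy when scripting the sync step later. A small sketch (the directory name is an example):

```shell
# Split a bundle directory name into container name and tag.
# The "---" separator is fixed by the bundle layout described above.
dir="poolside-self-managed-deployment---v1.2.3"   # example directory name
name="${dir%%---*}"   # strip the longest suffix starting at "---"
tag="${dir##*---}"    # strip the longest prefix ending at "---"
echo "$name $tag"
```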

Prerequisites

  • A bastion host or local machine that can access the AWS environment.
  • Tools installed on the host: at minimum the AWS CLI, kubectl, podman or docker, skopeo, and jq, which are used by commands throughout this guide.
  • An AWS account with permissions to create VPC, EKS, RDS, IAM roles and policies, and ECR.
  • If you are using Amazon Bedrock for inference, your AWS account must be allow-listed to access Poolside models (contact Poolside).
  • A DNS hosted zone for the deployment hostname.
  • An OCI-compatible container registry (Amazon ECR recommended). Terraform creates additional ECR repositories automatically in Step 1: Create the base infrastructure.
  • Optional: A Terraform remote state S3 bucket with versioning. If you use the S3 backend and multiple users or locations run Terraform, enable DynamoDB state locking. If you run Terraform from a single host, you can use local state instead.
  • If you are providing your own VPC, ensure it meets the requirements in the AWS VPC deployment guide.
If you use aws-load-balancer-controller (ALB), you must provide an ACM certificate for your deployment domain in the same AWS region as the cluster. If you use ingress-nginx (private or air-gapped), TLS terminates inside the cluster and Terraform expects Base64-encoded certificate and key values to create the Kubernetes TLS secret. If you need to import or request an ACM certificate, see the AWS Certificate Manager documentation.
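For the ingress-nginx path, the Base64 values can be produced as sketched below. The filenames tls.crt and tls.key are placeholders for your actual certificate and key; the dummy file contents exist only so the commands are self-contained:

```shell
# Placeholder certificate and key; replace with your real PEM files.
printf -- '-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n' > tls.crt
printf -- '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n' > tls.key

# -w0 disables line wrapping so the value is a single line
# (GNU coreutils; plain `base64` on macOS already emits one line).
TLS_CRT_B64=$(base64 -w0 < tls.crt)
TLS_KEY_B64=$(base64 -w0 < tls.key)
echo "$TLS_CRT_B64"
```

Paste the resulting values into the Terraform variables that expect the Base64-encoded certificate and key.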

Environment configuration

Set the following shell variables before starting. The bash commands throughout this guide reference these variables so you can copy and paste them directly. Provided by you:
export DEPLOYMENT_NAME=""        # Unique name (for example, production, dev-team-name). Used in DNS, S3, and resource naming.
export AWS_REGION=""             # AWS region (for example, us-east-1, us-east-2)
export AWS_PROFILE=""            # AWS CLI profile (for example, default, work-profile)
export ACCOUNT_ID=""             # AWS account ID (for example, 123456789012)
export CONTAINER_REGISTRY_URI="" # Registry URI (for example, 123456789012.dkr.ecr.us-east-2.amazonaws.com)
export DOMAIN_NAME=""            # DNS name for your deployment (for example, poolside.yourcompany.com)
export KEY_FILE=""               # SSH private key if using a bastion (for example, ~/.ssh/id_ed25519)
Optional: Set these if they apply to your environment:
export TERRAFORM_STATE_BUCKET_NAME=""       # S3 bucket for Terraform remote state
export TERRAFORM_STATE_LOCK_TABLE_NAME=""   # DynamoDB table for Terraform state locking
Provided by Poolside (contact Poolside if you do not have these):
export POOLSIDE_SHARED_BUCKET=""            # S3 bucket where Poolside release assets are stored
export POOLSIDE_RELEASE_KEY=""              # S3 key/prefix of the release
export POOLSIDE_RELEASE_VERSION=""          # Release version string
export POOLSIDE_VERSION_TAG=""              # Version tag for Terraform containers
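Before continuing, you may want to confirm that nothing was left empty. A minimal POSIX-shell check, assuming the variable names above (trim the list to the ones that apply to your environment):

```shell
# Report any variable from the list that is unset or empty.
require_vars() {
  missing=0
  for v in "$@"; do
    eval "val=\${$v:-}"          # indirect lookup, POSIX-compatible
    if [ -z "$val" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}

require_vars DEPLOYMENT_NAME AWS_REGION AWS_PROFILE ACCOUNT_ID \
  CONTAINER_REGISTRY_URI DOMAIN_NAME \
  POOLSIDE_SHARED_BUCKET POOLSIDE_RELEASE_VERSION POOLSIDE_VERSION_TAG \
  || echo "Set the variables above before continuing."
```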

Preparation

Step A: Download the deployment bundle

Download the Poolside bundle from the provided S3 location:
aws s3 cp \
  s3://$POOLSIDE_SHARED_BUCKET/bundles/$POOLSIDE_RELEASE_VERSION \
  ./poolside-bundle --recursive --region us-east-2
Set BUNDLE_DIR to the bundle root (the directory that contains containers/ and installation_steps/):
export BUNDLE_DIR=./poolside-bundle/<release-directory>
Change into the bundle root before running the remaining steps:
cd $BUNDLE_DIR

Step B (optional): Create container registry repositories

If you do not already have a container registry, create repositories in Amazon ECR for the deployment containers:
aws ecr create-repository --repository-name poolsideai/poolside-self-managed-infra-phase-1 --region $AWS_REGION
aws ecr create-repository --repository-name poolsideai/poolside-self-managed-infra-phase-2 --region $AWS_REGION
aws ecr create-repository --repository-name poolsideai/poolside-self-managed-deployment --region $AWS_REGION
aws ecr create-repository --repository-name poolsideai/poolside-container-uploader-full --region $AWS_REGION
Optional: If you are using manual IAM, create a repository for importing the IAM container image:
aws ecr create-repository --repository-name poolsideai/poolside-aws-iam --region $AWS_REGION
If you are using a different container registry, follow your internal guidance to create repositories.
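If you use ECR, the create calls above can also be expressed as one loop, since the repository names differ only in the last path segment. A sketch; include poolside-aws-iam only if you use the manual IAM path, and note that `|| true` keeps re-runs from aborting when a repository already exists (drop it if you prefer hard failures):

```shell
repos="poolside-self-managed-infra-phase-1 poolside-self-managed-infra-phase-2 \
  poolside-self-managed-deployment poolside-container-uploader-full poolside-aws-iam"

for repo in $repos; do
  echo "ensuring poolsideai/$repo"
  aws ecr create-repository --repository-name "poolsideai/$repo" \
    --region "$AWS_REGION" >/dev/null 2>&1 || true   # ignore "already exists"
done
```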

Step C: Sync Terraform containers to your registry

  1. Authenticate to your container registry before syncing the Terraform containers:
    podman login $CONTAINER_REGISTRY_URI
    
  2. Import the Terraform container images from the bundle into your registry using skopeo. Use the dir: transport to reference the extracted bundle directories.
    This step syncs only the deployment container images. Application container images are synced in Step 2.5: Upload containers and model checkpoints.
    There are four deployment container images to synchronize:
    • poolside-self-managed-infra-phase-1: Creates VPC, EKS control plane, RDS, and ECR.
    • poolside-self-managed-infra-phase-2: Creates CPU/GPU node groups and installs EKS add-ons.
    • poolside-self-managed-deployment: Deploys the Poolside application.
    • poolside-container-uploader-full: Contains all required application images and pushes them to your registries.
    Copy each container image to your registry:
    If your registry does not support multi-architecture images, copy a single architecture and omit the --multi-arch all flag.
    # Sync infra-phase-1 container image
    skopeo copy --multi-arch all \
      dir:$BUNDLE_DIR/containers/poolside-self-managed-infra-phase-1---$POOLSIDE_VERSION_TAG \
      docker://$CONTAINER_REGISTRY_URI/poolsideai/poolside-self-managed-infra-phase-1:$POOLSIDE_VERSION_TAG
    
    # Sync infra-phase-2 container image
    skopeo copy --multi-arch all \
      dir:$BUNDLE_DIR/containers/poolside-self-managed-infra-phase-2---$POOLSIDE_VERSION_TAG \
      docker://$CONTAINER_REGISTRY_URI/poolsideai/poolside-self-managed-infra-phase-2:$POOLSIDE_VERSION_TAG
    
    # Sync deployment container image
    skopeo copy --multi-arch all \
      dir:$BUNDLE_DIR/containers/poolside-self-managed-deployment---$POOLSIDE_VERSION_TAG \
      docker://$CONTAINER_REGISTRY_URI/poolsideai/poolside-self-managed-deployment:$POOLSIDE_VERSION_TAG
    
    # Sync 'container uploader' image (used in the container upload step)
    skopeo copy --multi-arch all \
      dir:$BUNDLE_DIR/containers/poolside-container-uploader-full---$POOLSIDE_VERSION_TAG \
      docker://$CONTAINER_REGISTRY_URI/poolsideai/poolside-container-uploader-full:$POOLSIDE_VERSION_TAG
    
    For more details on the skopeo copy command, see the skopeo-copy documentation.
  3. Optional: If you are using manual IAM, also import poolside-aws-iam container image into your registry.
    # Sync poolside-aws-iam container image
    skopeo copy --multi-arch all \
      dir:$BUNDLE_DIR/containers/poolside-aws-iam---$POOLSIDE_VERSION_TAG \
      docker://$CONTAINER_REGISTRY_URI/poolsideai/poolside-aws-iam:$POOLSIDE_VERSION_TAG
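The copies above differ only in the image name, so a loop can generate the commands for review before you run them. A sketch; add poolside-aws-iam to the list if you use the manual IAM path:

```shell
images="poolside-self-managed-infra-phase-1 poolside-self-managed-infra-phase-2 \
  poolside-self-managed-deployment poolside-container-uploader-full"

# Emit one skopeo copy command per image into a file you can inspect first.
for img in $images; do
  src="dir:$BUNDLE_DIR/containers/${img}---$POOLSIDE_VERSION_TAG"
  dst="docker://$CONTAINER_REGISTRY_URI/poolsideai/${img}:$POOLSIDE_VERSION_TAG"
  echo "skopeo copy --multi-arch all $src $dst"
done > sync_commands.sh

cat sync_commands.sh   # review, then run: sh sync_commands.sh
```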
    

Step D: Configure remote state

Each installation step must have its own unique Terraform state file. All steps can share the same S3 bucket and DynamoDB lock table, but each must use a distinct key value in remote.tf to isolate its state file. Example remote.tf for Step 1: Create the base infrastructure:
terraform {
  backend "s3" {
    bucket         = "<TERRAFORM_STATE_BUCKET_NAME>"
    key            = "<DEPLOYMENT_NAME>/infra-phase-1.tfstate"
    region         = "<AWS_REGION>"
    encrypt        = true
    dynamodb_table = "<TERRAFORM_STATE_LOCK_TABLE_NAME>"
  }
}
For steps 2 and 3, copy this remote.tf file into the 2_infra_phase_2 and 3_deployment directories and update the key to a unique value (for example, infra-phase-2.tfstate, deploy.tfstate).
Do not reuse the same key across steps. Each step manages different AWS resources and must remain isolated.

Step E: Set AWS credentials

Each Terraform container requires AWS credentials to deploy resources. You can pass these credentials in several ways depending on your organization’s security model:
  1. Environment variables (recommended for most users):
    export AWS_ACCESS_KEY_ID=<your-access-key>
    export AWS_SECRET_ACCESS_KEY=<your-secret-key>
    export AWS_SESSION_TOKEN=<your-session-token>
    
  2. Shared credentials volume (default for AWS CLI users): Mount your ~/.aws directory into the container:
    -v ~/.aws:/poolside/.aws -e AWS_PROFILE=$AWS_PROFILE
    
  3. Alternative authentication: If using IAM roles, SSO, or STS federation, follow your internal process for credential injection. Poolside supports all standard AWS authentication mechanisms.

Step F (optional): Manage IAM roles and policies

This section applies only if you are not using the recommended default installation, which creates IAM roles and policies automatically. If you need to manage IAM roles externally, use the optional poolside-aws-iam container and the example file in installation_steps/aws/0_aws_iam/terraform.tfvars. You must also apply OIDC trust relationships after Step 1: Create the base infrastructure. A copy of the Terraform files is available in the bundle at terraform/terraform-poolside-aws-iam if you need to create the roles through your own process. If you choose the external IAM path, update the infra terraform.tfvars files to disable IAM creation and provide the role ARNs your deployment should use.
  1. Configure installation_steps/aws/0_aws_iam/terraform.tfvars:
    deployment_name = "<DEPLOYMENT_NAME>"
    region          = "<AWS_REGION>"
    
    kms_keys = {
      eks_key_arn       = "arn:aws:kms:us-east-2:xxxxxxxxxxxx:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
      s3_bucket_key_arn = "arn:aws:kms:us-east-2:xxxxxxxxxxxx:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
      poolside_key_arn  = "arn:aws:kms:us-east-2:xxxxxxxxxxxx:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
    
    # These values are known after the base infrastructure apply from the EKS creation. Once populated,
    # re-run this module to complete the OIDC trust relationships on the IAM roles.
    #
    # eks_oidc = {
    #   oidc_provider_arn = "arn:aws:iam::xxx"
    #   oidc_provider     = "oidc.eks.xxx"
    # }
    
    The deployment_name, region, and kms_keys values must align with your chosen deployment settings. The KMS keys must already exist.
  2. Run the poolside-aws-iam container:
    docker run -it \
      -v ${PWD}/remote.tf:/poolside/aws_iam/remote.tf \
      -v ${PWD}/terraform.tfvars:/poolside/aws_iam/terraform.tfvars \
      $CONTAINER_REGISTRY_URI/poolsideai/poolside-aws-iam:$POOLSIDE_VERSION_TAG
    
  3. Inside the container, run Terraform commands:
    terraform init
    terraform plan
    terraform apply
    
  4. After you complete Step 1, create the EKS OIDC provider using the irsa_requirements.eks_oidc_url output and set the audience to sts.amazonaws.com. Then apply IRSA trust relationships to the IAM roles listed in irsa_requirements.role_arns. See Step 1.1: OIDC and IRSA trust relationships below.

Installation

Step 1: Create the base infrastructure

Create the VPC (or use an existing one), EKS control plane, RDS, and optional ECR repositories. This step typically takes 15 to 30 minutes.
If you have chosen to create the IAM roles externally, they must be created before running this step.
  1. Update installation_steps/aws/1_infra_phase_1/terraform.tfvars:
    deployment_name = "<your-deployment-name>"
    region          = "<your-region>"
    
    # Provide a compatible AWS VPC, or alternatively create one
    aws_vpc = {
      # Use this if you want to use an existing compatible VPC
      # provide = {
      #   vpc_id                   = "vpc-xxx"
      #   private_subnet_ids       = ["subnet-xxx", "subnet-yyy"]
      #   control_plane_subnet_ids = ["subnet-xxx", "subnet-yyy"]
      #   #public_subnet_ids       = ["subnet-xxx", "subnet-yyy"] # optional
      # }
    
      # Use this configuration to have the deployment create a VPC for you
      create = {
        vpc_cidr           = "10.10.0.0/21"
        availability_zones = ["a", "b"]
        use_nat_gateway    = true
      }
    }
    
    # Choose your inference path (uncomment one):
    # Bedrock:
    # inference_config = {
    #   enable_bedrock         = true
    #   enable_local_inference = false
    #   device_plugin_only     = true
    #   # bedrock_region = ""  # Set if models are not available in the top-level region
    # }
    # Local GPU:
    # inference_config = {
    #   enable_bedrock         = false
    #   enable_local_inference = true
    #   device_plugin_only     = true
    # }
    
    # Optional: Custom tags for all created AWS resources
    # custom_tags = {
    #   "name": "value"
    # }
    
    # Optional: Bastion host
    # bastion = {
    #   ssh_public_key = "ssh-ed25519 <your-public-key>"
    #   # ssh_key_name   = "your-key-pair-name"
    #   # access_cidrs   = ["<your-ip>/32"]
    #   # public_access  = true
    # }
    
    # Create ECR repositories for app images
    # ecr_create_repositories = true
    
    # Required for Bedrock deployments
    # use_aws_ssm_parameter_store = true
    
    # Additional admin ARNs for cluster access
    eks_cluster_additional_admin_arns         = []
    eks_cluster_additional_security_group_ids = []
    
  2. Set inference_config for your inference path:
    • Bedrock: enable_bedrock = true, enable_local_inference = false.
    • Local GPU: enable_bedrock = false, enable_local_inference = true.
  3. Run the 1_infra_phase_1 container. You can use the included wrapper script or run the container manually: Using the wrapper script:
    cd installation_steps/aws/1_infra_phase_1/
    ./run_terraform.sh
    
    The wrapper script uses podman by default. Change it to docker if needed. Replace <container_repository>/poolside-self-managed-infra-phase-1:<container_version_tag> with your full container image URL.
    Or run manually:
    podman run -it \
      -v $(pwd)/remote.tf:/poolside/infra/remote.tf \
      -v $(pwd)/terraform.tfvars:/poolside/infra/terraform.tfvars \
      -v ~/.aws:/poolside/.aws \
      -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
      -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
      -e AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} \
      $CONTAINER_REGISTRY_URI/poolsideai/poolside-self-managed-infra-phase-1:$POOLSIDE_VERSION_TAG
    
  4. Inside the container, run Terraform commands:
    terraform init
    terraform plan
    terraform apply
    
  5. Record the following outputs for use in later steps:
    terraform output
    
    • cluster_name (for example, poolside-<your-deployment-name>)
    • models_bucket (S3 bucket, for example, poolside-<your-deployment-name>)
    • bastion_public_ip (if a bastion host was created)
  6. If you are using manual IAM, capture the irsa_requirements output and complete Step 1.1: OIDC and IRSA trust relationships after this step and before Step 2: Provision node groups and add-ons:
    irsa_requirements = {
      "eks_oidc_url" = "https://oidc.eks.us-east-2.amazonaws.com/id/AC1FAD5XXXXXXXXB5A016995E65314B1"
      "role_arns" = [
        "arn:aws:iam::xxxxxxxxxxxx:role/poolside-deploy-aws-ebs-csi",
        "arn:aws:iam::xxxxxxxxxxxx:role/poolside-deploy-cni-pod-role",
        "arn:aws:iam::xxxxxxxxxxxx:role/poolside-deploy-nginx-pod-role",
        "arn:aws:iam::xxxxxxxxxxxx:role/poolside-deploy-eks-pod-role",
      ]
    }
    
  7. Exit the container.
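As a quick sanity check before moving on, you can confirm the control plane reports ACTIVE. A sketch that falls back to UNKNOWN if the call fails or credentials are missing:

```shell
cluster="poolside-$DEPLOYMENT_NAME"

# Query the EKS control plane status; expect ACTIVE after a successful apply.
status=$(aws eks describe-cluster --name "$cluster" --region "$AWS_REGION" \
  --query 'cluster.status' --output text 2>/dev/null || echo UNKNOWN)
echo "cluster $cluster status: $status"
```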

Step 1.1: OIDC and IRSA trust relationships

This section only applies if you are using manual IAM role management. If you are using the default IAM path, skip this section.
  1. Create the IAM OIDC provider for the EKS cluster using the irsa_requirements.eks_oidc_url output. Set the audience to sts.amazonaws.com.
  2. Note the ARN and provider of the OIDC provider you created. For example:
    • ARN: arn:aws:iam::xxxxxxxxxxxx:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/AC1FAD5XXXXXXXXB5A016995E65314B1
    • Provider: oidc.eks.us-east-2.amazonaws.com/id/AC1FAD5XXXXXXXXB5A016995E65314B1
  3. If you used the poolside-aws-iam container for manual IAM, update its terraform.tfvars with the eks_oidc values and re-run terraform apply to apply the trust relationships automatically.
  4. If you are managing IAM outside of the provided Terraform, add the following trust policy statements to each role. Replace <ARN> and <PROVIDER> with the OIDC provider ARN and provider you created above. poolside-deploy-aws-ebs-csi:
    [
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:aud": "sts.amazonaws.com",
            "<PROVIDER>:sub": "system:serviceaccount:poolside:core-api-irsa"
          }
        }
      },
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:aud": "sts.amazonaws.com",
            "<PROVIDER>:sub": "system:serviceaccount:poolside-models:inference-irsa"
          }
        }
      },
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:aud": "sts.amazonaws.com",
            "<PROVIDER>:sub": "system:serviceaccount:log-collector:log-collector-irsa"
          }
        }
      }
    ]
    
    poolside-deploy-cni-pod-role:
    [
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:sub": "system:serviceaccount:kube-system:aws-node",
            "<PROVIDER>:aud": "sts.amazonaws.com"
          }
        }
      }
    ]
    
    poolside-deploy-nginx-pod-role:
    [
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:sub": "system:serviceaccount:poolside-nginx-ingress:ingress-nginx",
            "<PROVIDER>:aud": "sts.amazonaws.com"
          }
        }
      }
    ]
    
    poolside-deploy-eks-pod-role:
    [
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:sub": "system:serviceaccount:poolside:core-api-irsa",
            "<PROVIDER>:aud": "sts.amazonaws.com"
          }
        }
      },
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:sub": "system:serviceaccount:poolside-models:inference-irsa",
            "<PROVIDER>:aud": "sts.amazonaws.com"
          }
        }
      },
      {
        "Effect": "Allow",
        "Principal": { "Federated": "<ARN>" },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "<PROVIDER>:sub": "system:serviceaccount:log-collector:log-collector-irsa",
            "<PROVIDER>:aud": "sts.amazonaws.com"
          }
        }
      }
    ]
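Step 1 of this section can be done from the CLI. A hedged sketch that derives the provider string from the URL (the URL shown is an example; substitute your irsa_requirements.eks_oidc_url output). Recent AWS CLI versions no longer require a thumbprint for EKS OIDC providers; older versions may also need --thumbprint-list:

```shell
# Value from the irsa_requirements.eks_oidc_url Terraform output (example shown).
OIDC_URL="https://oidc.eks.us-east-2.amazonaws.com/id/AC1FAD5XXXXXXXXB5A016995E65314B1"

# The "provider" form used in trust policies is the URL without the scheme.
OIDC_PROVIDER="${OIDC_URL#https://}"
echo "$OIDC_PROVIDER"

# || true keeps re-runs from aborting on EntityAlreadyExists.
aws iam create-open-id-connect-provider \
  --url "$OIDC_URL" \
  --client-id-list sts.amazonaws.com 2>/dev/null || true
```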
    

Step 2: Provision node groups and add-ons

Create CPU/GPU node groups and install EKS add-ons. This step typically takes 5 to 15 minutes (mostly dependent on GPU nodes if using local inference).
  1. Update installation_steps/aws/2_infra_phase_2/terraform.tfvars:
    # Must match the deployment name from the 1_infra_phase_1 step
    deployment_name = "<your-deployment-name>"
    region          = "<your-region>"
    
    # Override the default GPU instance type (p5.48xlarge) if needed
    # p5 instances use H100 GPUs; p5e and p5en use H200 GPUs
    # gpu_node_group_instance_type = "p5e.48xlarge"
    
    # gpu_node_group_desired_size = 2  # default; override only if scaling differently
    
    # Optional: SSH Keys
    # cpu_node_group_ssh_key_name  = "SSH key name"
    # gpu_node_group_ssh_key_name  = "SSH key name"
    # SSH Keys must exist in AWS console under EC2 -> Key Pair
    
    # Optional: Capacity reservation for GPU resources
    # gpu_node_group_capacity_reservation = {
    #   id = "<your-capacity-reservation-id>"
    #   # or:
    #   # group_arn = "arn:aws:ec2:<region>:<account-id>:capacity-reservation-group/<group-id>"
    # }
    
  2. If using remote state, copy remote.tf from step 1 and adjust the key field:
    key = "<your-deployment-name>/infra-phase-2.tfstate"
    
  3. Run the 2_infra_phase_2 Terraform container: Using the wrapper script:
    cd installation_steps/aws/2_infra_phase_2/
    ./run_terraform.sh
    
    The wrapper script uses podman by default. Change it to docker if needed. Replace <container_registry>/poolside-self-managed-infra-phase-2:<container_image_tag> with your full container image URL.
    Or run manually:
    podman run -it \
      -v $(pwd)/remote.tf:/poolside/infra-phase-2/remote.tf \
      -v $(pwd)/terraform.tfvars:/poolside/infra-phase-2/terraform.tfvars \
      -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
      -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
      -e AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} \
      $CONTAINER_REGISTRY_URI/poolsideai/poolside-self-managed-infra-phase-2:$POOLSIDE_VERSION_TAG
    
  4. Inside the container, run Terraform commands:
    terraform init
    terraform plan
    terraform apply
    
  5. Exit the container.
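To confirm the node groups came up, a sketch that lists each node group and its status (expect ACTIVE for each):

```shell
cluster="poolside-$DEPLOYMENT_NAME"

# List node groups in the cluster and report each one's status.
for ng in $(aws eks list-nodegroups --cluster-name "$cluster" --region "$AWS_REGION" \
    --query 'nodegroups[]' --output text 2>/dev/null); do
  status=$(aws eks describe-nodegroup --cluster-name "$cluster" --nodegroup-name "$ng" \
    --region "$AWS_REGION" --query 'nodegroup.status' --output text 2>/dev/null)
  echo "$ng: $status"
done
```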

Step 2.5: Upload containers and model checkpoints

Upload Poolside and third-party images to your registries and place model checkpoints in S3. You can run this in parallel with Step 2: Provision node groups and add-ons.
If you enabled ecr_create_repositories = true in Step 1: Create the base infrastructure, the required ECR repositories for application images were created automatically and will be used as targets in this step.
Poolside provides a container image in the bundle at containers/poolside-container-uploader-full---<tag>. It includes the scripts and all container images required for fully air-gapped deployments.
  1. Run the container uploader: Using the wrapper script:
    cd installation_steps/aws/2.5_container_upload/
    ./run.sh
    
    The wrapper script uses podman by default. Change it to docker if needed. Replace <container_repository>/poolside-container-uploader-full:<container_version_tag> with your full container image URL. Edit the run.sh file to pass the AWS_REGION environment variable using the -e flag: -e AWS_REGION=${AWS_REGION}
    Or run manually:
    podman run -it \
      -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
      -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
      -e AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} \
      -e AWS_REGION=${AWS_REGION} \
      $CONTAINER_REGISTRY_URI/poolsideai/poolside-container-uploader-full:$POOLSIDE_VERSION_TAG
    
  2. Inside the container, run upload-containers.sh to populate ECR. The script reads registry targets from AWS Systems Manager Parameter Store (SSM) (populated by Step 1: Create the base infrastructure if you enabled ECR creation) and uploads the required Poolside and third-party images.
    ./upload-containers.sh --deployment-name $DEPLOYMENT_NAME
    
  3. Copy model checkpoints to the S3 bucket created in Step 1: Create the base infrastructure. The bucket is typically named poolside-<your-deployment-name>:
    aws s3 cp ./checkpoints s3://poolside-$DEPLOYMENT_NAME/checkpoints --recursive --region $AWS_REGION
    
    Poolside will provide the checkpoint details before you start this process. This can run in the background and does not block system availability.

Step 3: Deploy the application

Deploy the Poolside workloads, services, and ingress to the EKS cluster.
If you are using a bastion host, connect to it before running this step:
ssh -i $KEY_FILE ubuntu@<bastion-host-ip>
  1. Update installation_steps/aws/3_deployment/terraform.tfvars:
    # Must match the deployment name from previous steps
    deployment_name = "<your-deployment-name>"
    region          = "<your-region>"
    
    ingress = {
      # The DNS hostname that will be applied to the NLB via a CNAME record
      hostname    = "<your-domain-name>"
      # "internal" or "external" depending upon needs
      access_type = "internal"
    }
    
    # Optional: Enable AWS Cognito for SSO (default = false)
    # use_aws_cognito = true
    
  2. If using remote state, copy remote.tf and adjust the key field:
    key = "<your-deployment-name>/deploy.tfstate"
    
  3. Run the deployment container: Using the wrapper script:
    cd installation_steps/aws/3_deployment/
    ./run_terraform.sh
    
    Or run manually:
    podman run -it \
      -v $(pwd)/remote.tf:/poolside/deployment/remote.tf \
      -v $(pwd)/terraform.tfvars:/poolside/deployment/terraform.tfvars \
      -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
      -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
      -e AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} \
      $CONTAINER_REGISTRY_URI/poolsideai/poolside-self-managed-deployment:$POOLSIDE_VERSION_TAG
    
  4. Inside the container, run Terraform commands:
    terraform init
    terraform plan
    terraform apply
    
    A successful apply produces output similar to:
    Apply complete! Resources: 25 added, 0 changed, 0 destroyed.
    
    Outputs:
    
    inference_config = <sensitive>
    ingress = {
      "hostname" = "<hostname>.us-east-2.elb.amazonaws.com"
      "ip"       = ""
    }
    poolside_deployment_config = <sensitive>
    
  5. Record the value of ingress.hostname. You will use it to create your DNS record in Step 4: Complete post-deployment setup.

Step 4: Complete post-deployment setup

  1. Configure DNS. Create a DNS CNAME (or an ALIAS record in Route 53) that points your domain name to the ingress.hostname output from Step 3: Deploy the application. For Route 53 instructions, see the Route 53 resource record set creation guide.
    The load balancer validates the request hostname. If the DNS CNAME does not match the ingress.hostname, requests are rejected.
    If you use Cloudflare for DNS, disable proxying for the deployment hostname. Cloudflare proxying overrides the provided certificate and breaks TLS.
  2. Complete the initial Poolside setup. After DNS propagation completes, open a browser and navigate to:
    https://<your-domain-name>
    
    Follow the on-screen setup to bind your OIDC provider:
    • If you use AWS Cognito, use the values output by Terraform.
    • If you use an enterprise IdP, follow your organization’s OIDC registration flow.
    For more information, see OIDC authentication. To retrieve configuration values from Terraform (from within the deployment container), run:
    terraform output poolside_deployment_config
    
  3. Load Poolside models. Load models through the Poolside Console after deployment and after you upload the checkpoints to S3. For more information, see Models.
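For Route 53, the CNAME described in step 1 above can be created with a change batch. A sketch assuming a hypothetical hosted zone ID; substitute your own zone ID, and use the ingress.hostname value you recorded in Step 3 (the change batch is written to a file so you can review it first):

```shell
# Hypothetical values; substitute your hosted zone ID and real NLB hostname.
HOSTED_ZONE_ID="Z0EXAMPLE"
NLB_HOSTNAME="<hostname>.us-east-2.elb.amazonaws.com"   # ingress.hostname from Step 3

# Write the change batch for review before applying it.
cat > change-batch.json <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$DOMAIN_NAME",
      "Type": "CNAME",
      "TTL": 300,
      "ResourceRecords": [{ "Value": "$NLB_HOSTNAME" }]
    }
  }]
}
EOF

aws route53 change-resource-record-sets \
  --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch file://change-batch.json 2>/dev/null || true
```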

Verification

Verify that your Poolside deployment is running successfully.
You can also view this information in the AWS Management Console using the EKS Kubernetes resources viewing guide.
Update the context to your EKS cluster:
aws eks update-kubeconfig --region $AWS_REGION --name poolside-$DEPLOYMENT_NAME
Verify all Kubernetes resources are running:
kubectl get pods -n poolside
Example output (pod suffixes will vary):
NAMESPACE   NAME                                           READY   STATUS    RESTARTS   AGE
poolside    core-api-58fb46cb65-h7zrk                      1/1     Running   0          6m6s
poolside    core-api-58fb46cb65-x8t58                      1/1     Running   0          6m6s
poolside    core-api-58fb46cb65-z2t6b                      1/1     Running   0          6m6s
poolside    core-api-models-reconciliation-loop-0          1/1     Running   0          6m6s
poolside    web-assistant-6f5977c886-hcttp                 1/1     Running   0          6m6s
poolside    web-assistant-6f5977c886-jnrjt                 1/1     Running   0          6m6s
poolside    web-assistant-6f5977c886-tb6rm                 1/1     Running   0          6m6s

Bedrock model detection

If you are using Bedrock for inference, verify that the Poolside models are being detected:
kubectl logs -n poolside -l=app.kubernetes.io/name=core-api-models-reconciliation-loop
Example output:
{"time":"...","level":"INFO","msg":"Added model for Bedrock model arn:aws:bedrock:<region>::foundation-model/poolside.malibu-v1:0","model":{"ModelID":"poolside.malibu-v1:0","ModelName":"poolside malibu",...}}
You can also verify Bedrock models are available in your region:
aws --region $AWS_REGION \
    bedrock list-foundation-models \
    --by-provider poolside \
  | jq -r '.modelSummaries[] | "\(.modelArn) : \(.modelName)"'

Local GPU model verification

If you are using local GPU inference, verify that the model pods are running:
kubectl get pods -n poolside-models
kubectl logs <inference-pod-name> -n poolside-models | tail -n 20
Look for logs indicating successful model load and readiness.

Troubleshooting

DNS verification

Ensure your DNS record resolves to the correct AWS load balancer:
dig $DOMAIN_NAME
Expected: a CNAME to the load balancer hostname or the corresponding IP.

Registry authentication errors

If an image push or pull fails, re-authenticate:
aws ecr get-login-password --region $AWS_REGION | podman login --username AWS --password-stdin $CONTAINER_REGISTRY_URI

EKS access issues

If kubectl cannot connect, refresh your credentials:
aws eks update-kubeconfig --region $AWS_REGION --name poolside-$DEPLOYMENT_NAME

Bedrock region

If Bedrock models are not detected, verify they are available in your selected region:
aws --region $AWS_REGION \
    bedrock list-foundation-models \
    --by-provider poolside \
  | jq -r '.modelSummaries[] | "\(.modelArn) : \(.modelName)"'
If no models are returned, try us-east-1 or us-west-2, or set the bedrock_region variable in your terraform.tfvars.
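To find a region where the models are visible, you can probe the candidates in one pass. A sketch; the region list is illustrative, so adjust it to the regions your account can use:

```shell
regions="us-east-1 us-east-2 us-west-2"

# Query each candidate region for Poolside foundation models.
for r in $regions; do
  echo "== $r"
  aws --region "$r" bedrock list-foundation-models --by-provider poolside \
    --query 'modelSummaries[].modelId' --output text 2>/dev/null || echo "(none or no access)"
done
```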