
Introduction

This section is for Poolside administrators operating in on-premises environments where access and support may be limited.

Helpful aliases

Either set these per bash session, or add them to your ~/.bashrc file to reduce repetitive keystrokes and commands.
# Shorthand kubectl, for example, k get pods, k logs
alias k=kubectl

# Namespace context switching, for example kcs poolside, or kcs poolside-services
alias kcs='kubectl config set-context --current --namespace '

# Alias repetitive 'get' commands
alias kgp='kubectl get pods'
alias kgd='kubectl get deployments'
alias kl='kubectl logs'

# Alias describe
alias kd='kubectl describe'
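To make the aliases persist across sessions, they can be appended to ~/.bashrc in one step. A minimal sketch (the kcs alias is omitted here; add whichever subset you use):

```shell
# Append the aliases to ~/.bashrc so they persist across sessions
cat >> ~/.bashrc <<'EOF'
alias k=kubectl
alias kgp='kubectl get pods'
alias kgd='kubectl get deployments'
alias kl='kubectl logs'
alias kd='kubectl describe'
EOF

# reload in the current session:
# source ~/.bashrc
```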

Check namespaces

The Poolside application uses several namespaces:
  • poolside
  • poolside-models
  • poolside-services
  • kube-system
kubectl get namespaces 
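As a quick sanity check, the expected namespaces can be verified with a small shell function. This is a sketch: it takes the output of `kubectl get namespaces -o name` on stdin and prints a line for anything missing.

```shell
# Print MISSING for any expected Poolside namespace absent from the input
check_namespaces() {
  # stdin: one "namespace/<name>" per line, as produced by `kubectl get namespaces -o name`
  input=$(cat)
  for ns in poolside poolside-models poolside-services kube-system; do
    printf '%s\n' "$input" | grep -qx "namespace/$ns" || echo "MISSING: $ns"
  done
}

# usage against a live cluster:
# kubectl get namespaces -o name | check_namespaces
```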

Confirm deployments for each namespace

All deployments and pods should be in either a 1/1 Ready state or, for jobs that have not yet been removed, a 0/1 Completed state.

poolside

Hosts the core application services: the front-end UI and the API.
  • core-api
  • web-assistant
kcs poolside
kubectl get deployments,ingress,svc,pods
poolside-models

Should have one deployment per model that has been deployed.
Deployment identifiers use a randomly generated UUID and appear in the format inference-<uuid>.
If you have not created any models using Splash or the UI, this namespace remains empty.
kcs poolside-models
k get deployments
poolside-services

Hosts the secondary backend services necessary for an on-prem Poolside deployment. Postgres is the database for Poolside. Keycloak provides the OIDC client used to configure authentication and authorization to Poolside during setup. SeaweedFS provides object storage that emulates the S3 API for model checkpoints.
  • postgres
  • keycloak
  • seaweedfs-admin
  • seaweedfs-filer
  • seaweedfs-master
  • seaweedfs-volume
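To confirm the Ready/Completed rule above across a namespace, the following sketch filters `kubectl get pods` output down to pods that are neither fully Ready nor Completed:

```shell
# Print pods that are neither n/n Ready nor Completed
not_ready() {
  # stdin: output of `kubectl get pods` (NAME READY STATUS RESTARTS AGE)
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2] && $3 != "Completed") print $1, $2, $3 }'
}

# usage:
# kubectl get pods -n poolside-services | not_ready
```

An empty result means the namespace matches the expected state.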

Shutdown

Shutting down Poolside in on-prem environments involves scaling down all resources and deployed objects in the correct order. If you are shutting down to facilitate moving to a new network, review the steps in the Network relocation guide instead.

Scale down Poolside deployments

# Scale all poolside-models deployments to 0. This will take a minute to complete. 
kubectl scale deployment inference-<MALIBU> -n poolside-models --replicas=0
kubectl scale deployment inference-<POINT> -n poolside-models --replicas=0
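With many models, the UUIDs do not need to be copied by hand. A sketch that prints one scale-down command per deployment in the namespace, so you can review the commands before executing them:

```shell
# Emit one scale-down command per deployment name on stdin
scale_down_all() {
  while read -r dep; do
    echo "kubectl scale $dep -n poolside-models --replicas=0"
  done
}

# usage (review the output, then append `| sh` to execute):
# kubectl get deployments -n poolside-models -o name | scale_down_all
```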

# Scale all poolside deployments to 0
kubectl scale deployment core-api -n poolside --replicas=0
kubectl scale deployment web-assistant -n poolside --replicas=0

# Scale poolside-services to 0 
kubectl scale deployment keycloak -n poolside-services --replicas=0
kubectl scale statefulset seaweedfs-admin -n poolside-services --replicas=0
kubectl scale statefulset seaweedfs-filer -n poolside-services --replicas=0
kubectl scale statefulset seaweedfs-master -n poolside-services --replicas=0
kubectl scale statefulset seaweedfs-volume -n poolside-services --replicas=0
kubectl scale statefulset postgres -n poolside-services --replicas=0

Shut down RKE2

You can shut down the system after this step.
sudo systemctl stop rke2-server

Startup

Starting up Poolside after a shutdown on the same network involves bringing services back up in the correct order. RKE2 is configured to start automatically on boot. Poolside recommends waiting ~5 minutes for RKE2 to initialize and become healthy after a shutdown or reboot before scaling Poolside services back up.
# Scale poolside-services to 1
kubectl scale deployment keycloak -n poolside-services --replicas=1
kubectl scale statefulset seaweedfs-admin -n poolside-services --replicas=1
kubectl scale statefulset seaweedfs-filer -n poolside-services --replicas=1
kubectl scale statefulset seaweedfs-master -n poolside-services --replicas=1
kubectl scale statefulset seaweedfs-volume -n poolside-services --replicas=1
kubectl scale statefulset postgres -n poolside-services --replicas=1

# Scale poolside deployments to 3
kubectl scale deployment core-api -n poolside --replicas=3
kubectl scale deployment web-assistant -n poolside --replicas=3

# Scale all poolside-models deployments to 1.
# Scale each deployment separately, and wait for it to become healthy before scaling up the next.
kubectl scale deployment inference-<MALIBU> -n poolside-models --replicas=1
kubectl scale deployment inference-<POINT> -n poolside-models --replicas=1
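One way to honor the wait-between-models rule is a small helper that blocks on the rollout before returning. A sketch; the deployment names below are placeholders:

```shell
# Scale a single model deployment up and wait until its rollout is healthy
scale_up_and_wait() {
  # $1: deployment name, e.g. inference-<uuid>
  kubectl scale deployment "$1" -n poolside-models --replicas=1 || return 1
  kubectl rollout status deployment "$1" -n poolside-models --timeout=10m
}

# usage, one model at a time:
# scale_up_and_wait inference-<MALIBU>
# scale_up_and_wait inference-<POINT>
```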

Troubleshooting steps

Almost all troubleshooting for an on-prem deployment relies on the Network tab in your browser’s developer tools and the kubectl logs command.
The following steps are generalized and error dependent.
For any browser-based errors during the use of Poolside, open the developer tools, select the Network tab, and identify the failing request and its response. All requests pass through core-api, so its logs are the best place to look for initial indicators.
# Use a label selector to retrieve logs for all core-api pods, filtering out good (200) responses.
kcs poolside
kubectl logs -l 'app.kubernetes.io/name=core-api' | grep -v 'msg="200'
For login or authentication related issues, review both core-api and keycloak.
kcs poolside
kubectl logs -l 'app.kubernetes.io/name=core-api' | grep -v 'msg="200'

kcs poolside-services
kgp | grep keycloak
kubectl logs keycloak-<pod hash from above command>
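Instead of copying the pod hash, kubectl can also resolve a deployment's pod directly via the deployment/<name> reference. A sketch wrapping that for Keycloak:

```shell
# Tail Keycloak logs without looking up the pod name first
klogs_keycloak() {
  kubectl logs deployment/keycloak -n poolside-services --tail=200 "$@"
}

# usage:
# klogs_keycloak        # last 200 lines
# klogs_keycloak -f     # follow
```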
For model issues, review the logs in the inference pods.
kcs poolside-models
k get pods
k logs inference-<hash from pod>
To check general environment details, navigate to https://poolside.poolside.local/v0/environment.

Checking for TLS certificates

Poolside uses cert-manager to issue and renew certificates. The self-signed CA certificate is typically named poolside-self-signed-ca. If certificates are missing, expired, or stuck in a non-ready state, verify cert-manager resources and events before troubleshooting application pods.
kcs poolside
k get certificates
k describe certificate <name from previous command>

kcs poolside-services 
k get certificates
k describe certificate <name from previous command>

# Confirm the self-signed CA certificate exists
kubectl get certificates -A | grep poolside-self-signed-ca

# Debug cert-manager if certificates are not generating
kubectl get certificates,issuers,clusterissuers -A
kubectl -n cert-manager get pods
kubectl -n cert-manager get events --sort-by=.lastTimestamp
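To check whether a certificate has actually expired, the issued TLS secret can be decoded with openssl. A sketch; the secret name is an assumption, take it from the Certificate resource's spec.secretName:

```shell
# Print the subject and validity window of a TLS secret's certificate
cert_dates() {
  # $1: namespace, $2: TLS secret name (assumed to be a standard kubernetes.io/tls secret)
  kubectl get secret "$2" -n "$1" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -subject -dates
}

# usage:
# cert_dates poolside-services <secret name from the Certificate's spec.secretName>
```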

SSL / x509 errors

The cluster CA certificate signs all self-signed certificates used by SeaweedFS, Poolside, Keycloak, and PostgreSQL. TLS is enforced for database connections; other services terminate TLS at the ingress, and in-cluster traffic is otherwise plaintext. This is an area planned for improvement.
Terraform installs the Poolside CA certificate into the host OS trust store using /usr/local/... paths and runs the OS update command. Avoid placing Poolside CA certificates under /etc paths.
You must import the CA certificate on each host that interacts with the platform, whether through the CLI, a browser, or an IDE. After you import the certificate, restart the application or browser, or start a new private session, so the change takes effect.

Importing on Windows

Double-click the certificate file (.crt or .pem)
Click "Install Certificate..."
Choose "Local Machine" and click "Next"
Select "Place all certificates in the following store"
Click "Browse" → Select "Trusted Root Certification Authorities" → "OK"
Click "Next" → "Finish"

Importing on macOS

Double-click the certificate file
This opens Keychain Access and the certificate appears in the list
Double-click the imported certificate
Expand "Trust" section
Set "When using this certificate" to "Always Trust"
Close the window and enter your password when prompted

# Alternatively, import to system keychain
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ca-poolside.crt

Importing on Ubuntu and Debian

sudo cp ca-poolside.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

Importing on Red Hat Enterprise Linux and Fedora

sudo mkdir -p /usr/share/pki/ca-trust-source/anchors
sudo cp ca-poolside.crt /usr/share/pki/ca-trust-source/anchors/
sudo update-ca-trust
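After importing on any host, trust can be verified from the command line. A sketch; curl exits with error 60 if the CA is still untrusted:

```shell
# Succeeds only if the OS trust store validates the Poolside endpoint's certificate
verify_trust() {
  curl -sS -o /dev/null https://poolside.poolside.local/v0/environment
}

# usage:
# verify_trust && echo "CA trusted" || echo "CA not trusted yet"
```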

Certificate distribution

Some common mechanisms for distributing the Poolside CA TLS certificate include:
  • Group Policy for all domain-joined machines to a specific Certificate Store
  • Configuration Management Tools such as Ansible or Puppet can be used for mixed environments
  • MDM such as Intune to push certificates via Configuration profiles
  • Browser Enterprise policies can push certificates
  • Direct sharing, email, fileshare, browser hosting
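For small fleets without MDM or configuration management, a plain SSH loop can cover the direct-sharing case for Debian and Ubuntu hosts. A sketch; hostnames are placeholders and passwordless sudo is assumed:

```shell
# Copy the CA to each host and refresh its trust store
distribute_ca() {
  # arguments: one or more SSH-reachable hostnames
  for host in "$@"; do
    scp ca-poolside.crt "$host":/tmp/ \
      && ssh "$host" 'sudo cp /tmp/ca-poolside.crt /usr/local/share/ca-certificates/ && sudo update-ca-certificates'
  done
}

# usage:
# distribute_ca host1.example.local host2.example.local
```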