Overview

Use this guide to relocate an on-premises Poolside server to a different physical network. This guide covers server relocation only. It does not cover version upgrades or migrations between releases, such as V1 to V2 or r20250403 to r20250527. Because RKE2 nodes use static IP addresses, relocating the server requires you to stop Poolside services cleanly, update network-dependent configuration, reset the cluster node IP, and bring services back online. For additional background, see the RKE2 discussion on changing the primary IP address of a node.

Shut down the server

Shut the server down cleanly before you move it. During system shutdown, the poolside-services unit drains workloads, scales each deployment and StatefulSet to 0, and saves the original replica count as an annotation on the resource. rke2-server then stops through the shutdown.target ordering.
sudo shutdown -h now
For details on stop methods, expected timing, and verification, see Server and service maintenance.

Start the server

Update /etc/hosts

This guide assumes your primary IP address has changed and you need to update the node IP.
In /etc/hosts, replace the old IP address in the <OLD IP> <node-hostname> entry with the new IP address.
This ensures that the RKE2 master node resolves its hostname to the new IP address.
The default hostname in current on-premises installations is poolside-server. The hostname is also used as the Kubernetes node name, and local PVs are tied to that node name. Avoid changing the hostname during relocation unless you also update local PV node affinity.
# Example of old entry - ${IP} ${node-hostname} ${ingress}
# 192.168.1.30 poolside-server poolside.poolside.local

# After updating with new IP
192.168.1.40 poolside-server poolside.poolside.local
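The edit above can also be scripted. The following is a minimal sketch, not part of the Poolside tooling: `update_hosts_ip` is a hypothetical helper name, and it assumes the hostname is the second field of the entry, as in the example above. Test it against a copy of the file before you point it at /etc/hosts.

```shell
# Hypothetical helper (not shipped with Poolside): replace the IP address of
# the entry whose second field is the given hostname.
# Usage: update_hosts_ip FILE HOSTNAME NEW_IP
update_hosts_ip() {
  local file="$1" host="$2" new_ip="$3" tmp
  tmp=$(mktemp)
  # Rewrite only uncommented lines whose second field matches the hostname.
  # Assigning $1 makes awk rejoin the fields of that line with single spaces.
  awk -v host="$host" -v ip="$new_ip" '
    !/^[[:space:]]*#/ && $2 == host { $1 = ip }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `update_hosts_ip hosts.copy poolside-server 192.168.1.40` rewrites the entry in a working copy named hosts.copy (an illustrative file name) and leaves commented lines untouched.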

Update the system DNS resolver

If the DNS servers changed during the move, update /etc/systemd/resolved.conf to point at the new servers.
sudo vim /etc/systemd/resolved.conf
Set the DNS= line to the new DNS servers, separated by spaces:
DNS=<new-dns-server-1> <new-dns-server-2>
Restart systemd-resolved and confirm the new servers are active:
sudo systemctl restart systemd-resolved
resolvectl status

Reset master node IP

For additional details, see the RKE2 discussion on changing the primary IP address of a node.
# The installer disables UFW for RKE2, but if it is still enabled, disable it (Ubuntu only)
sudo systemctl disable --now ufw

# Disable AppArmor (Ubuntu only)
sudo systemctl disable --now apparmor.service

# On RHEL, check firewalld instead
sudo systemctl disable --now firewalld

# Stop all RKE2 services - exists in path at /usr/local/bin/rke2-killall.sh
sudo rke2-killall.sh

# Reset the cluster configuration to update Master IP
sudo rke2 server --cluster-reset

# Start the service
sudo systemctl start rke2-server 

# Confirm the service is in a running state
sudo systemctl status rke2-server

# Confirm all Nodes Ready
kubectl get nodes
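If you want to script the final check, the sketch below reads the output of kubectl get nodes --no-headers and succeeds only when every node reports Ready. `all_nodes_ready` is a hypothetical helper name. It assumes the default kubectl column layout, where STATUS is the second column. A cordoned node reports Ready,SchedulingDisabled, which this sketch treats as Ready.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin.
# Exit 0 only if at least one node is listed and every node is Ready.
all_nodes_ready() {
  awk '
    NF {
      total++
      # STATUS can be a comma-separated list, for example
      # "Ready,SchedulingDisabled" on a cordoned node.
      split($2, parts, ",")
      if (parts[1] != "Ready") bad++
    }
    END { exit ((total > 0 && bad == 0) ? 0 : 1) }
  '
}

# Usage (requires a working kubeconfig):
#   kubectl get nodes --no-headers | all_nodes_ready && echo "all nodes Ready"
```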

Update CoreDNS for Keycloak

After the external interface or ingress IP has changed, update the Keycloak entry in the CoreDNS ConfigMap. Use the Terraform path when you have access to the install bundle. Use the manual path as a fallback.

Rerun Terraform (preferred)

Rerun Step 3: Install supporting infrastructure services. Terraform updates the CoreDNS entries for Keycloak as part of that phase. Terraform applies the ConfigMap change automatically and does not require a manual restart.

Edit the ConfigMap manually

If you cannot rerun Terraform, edit the ConfigMap directly.
# Find the CoreDNS ConfigMap
kubectl get configmap -n kube-system | grep dns

# Edit the ConfigMap
kubectl edit configmap <coredns-configmap> -n kube-system
Find the Keycloak block and update the IP:
keycloak.poolside.local:53 {
  hosts {
    <new-ingress-ip> keycloak.poolside.local
    fallthrough
  }
}
Restart CoreDNS so it picks up the change:
# Find the CoreDNS deployment
kubectl get deployment -n kube-system | grep dns

# Restart CoreDNS
kubectl -n kube-system rollout restart deployment <coredns-deployment>
Confirm CoreDNS pods come up without errors:
kubectl logs -f -l app.kubernetes.io/name=rke2-coredns -n kube-system
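To confirm which IP the ConfigMap now maps to Keycloak, you can parse the hosts block. This is a minimal sketch: `corefile_host_ip` is a hypothetical helper name, and the usage line assumes the ConfigMap stores its configuration under a data key named Corefile, which is the common CoreDNS convention but is not confirmed by this guide.

```shell
# Hypothetical helper: read CoreDNS configuration text on stdin and print the
# first IP address mapped to the given hostname in a hosts block.
# Usage: corefile_host_ip HOSTNAME
corefile_host_ip() {
  local name="$1"
  # A hosts entry has the form "<ip> <hostname>"; match the hostname in the
  # second field and require the first field to look like an IP address.
  awk -v name="$name" '$2 == name && $1 ~ /^[0-9a-fA-F.:]+$/ { print $1; exit }'
}

# Usage (the Corefile key name is an assumption):
#   kubectl get configmap <coredns-configmap> -n kube-system \
#     -o jsonpath='{.data.Corefile}' | corefile_host_ip keycloak.poolside.local
```

The printed address should match the new ingress IP.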

Start Poolside workloads

After you update CoreDNS, start the poolside-services unit. The unit removes the node cordon and restores deployments and StatefulSets to their pre-shutdown replica counts from the annotations saved during shutdown.
sudo systemctl start poolside-services
For details on start methods, expected timing, and verification, see Server and service maintenance.

Validate the relocation

Confirm that the models are running (1/1 Ready). After the models are running, log in to https://poolside.poolside.local and test chat completion. Then set the API URL in your Poolside extension to https://poolside.poolside.local and confirm that code completion works.
kubectl get deployments -n poolside-models
kubectl get pods -n poolside-models
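The readiness check can be scripted as a rough sketch. `not_ready_deployments` is a hypothetical helper name. It assumes the default kubectl column layout, where READY is the second column in ready/desired form, and it also flags deployments that are still scaled to 0.

```shell
# Hypothetical helper: read `kubectl get deployments --no-headers` output on
# stdin and print the name of each deployment that is not fully ready.
not_ready_deployments() {
  awk '
    NF {
      # READY has the form "<ready>/<desired>", for example "1/1".
      split($2, r, "/")
      if (r[1] != r[2] || r[2] == 0) print $1
    }
  '
}

# Usage (requires a working kubeconfig):
#   kubectl get deployments -n poolside-models --no-headers | not_ready_deployments
```

Empty output means every listed deployment has all of its desired replicas ready.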

Switch the active interface without a reboot

If both the old and new network interfaces are connected to the server and you want to switch IP addresses without shutting down, replace the network configuration instead of following the Shut down the server flow earlier in this page. After the IP change, you still need to update /etc/hosts, run Reset master node IP, and rerun Step 3: Install supporting infrastructure services to refresh CoreDNS for Keycloak.

Replace the network configuration

Copy the existing /etc/netplan/90-NM-<id>.yaml file for the active interface and edit the copy to target the new interface and address. Move the original out of /etc/netplan/ so only the new file is loaded.
network:
  version: 2
  ethernets:
    NM-<id>:
      renderer: NetworkManager
      match:
        name: "<new-interface>"
      addresses:
        - "<new-ip>/<prefix>"
      nameservers:
        addresses:
          - <new-dns-server-1>
          - <new-dns-server-2>
      dhcp6: true
      wakeonlan: true
      networkmanager:
        name: "Wired connection 4"
        passthrough:
          connection.autoconnect-priority: "-999"
          ethernet._: ""
          ipv4.address1: "<new-ip>/<prefix>,<new-gateway>"
          ipv4.method: "manual"
          ipv6.addr-gen-mode: "default"
          ipv6.ip6-privacy: "-1"
          proxy._: ""
Apply the configuration:
sudo netplan try
sudo netplan apply
If the server has multiple NICs available after the switch, edit /etc/rancher/rke2/config.yaml to force RKE2 to advertise on the specific IP you want.
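This guide does not name the configuration key to set. One candidate is the RKE2 node-ip option, so the sketch below rests on that assumption. `set_rke2_node_ip` is a hypothetical helper name, and it handles only a simple top-level node-ip: line. Back up the file and test against a copy first.

```shell
# Hypothetical helper: set a top-level "node-ip" entry in an RKE2 config file.
# Assumes node-ip is the option you need; confirm against the RKE2 docs.
# Usage: set_rke2_node_ip FILE IP
set_rke2_node_ip() {
  local file="$1" ip="$2" tmp
  tmp=$(mktemp)
  # Keep every line except an existing node-ip entry. grep exits non-zero
  # when nothing is left to print, so ignore its exit status.
  grep -v '^node-ip:' "$file" > "$tmp" || true
  printf 'node-ip: %s\n' "$ip" >> "$tmp"
  # Write back in place so file ownership and permissions are unchanged.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

A change to this file takes effect only after the rke2-server service restarts.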

Verify the new interface

sudo ip a
hostname -i
ssh <new-ip>
ss -tulpn | grep 22
sudo journalctl -u ssh -f
In some cases, you must unplug the cable from the old NIC before the server accepts connections on the new IP.

Verify DNS

Confirm UDP 53 is allowed from the new server location to the new DNS servers.
# If dig times out, UDP 53 is blocked
dig @<new-dns-server> <test-hostname>

# Test name resolution
resolvectl query <test-hostname>

# Test UDP 53 connectivity to the new DNS server
nc -zvu <new-dns-server> 53
Confirm CoreDNS is using the expected upstream DNS servers:
# Use the first CoreDNS pod; the label selector can match more than one
COREDNS_POD=$(kubectl get pods -l app.kubernetes.io/name=rke2-coredns -n kube-system -o name | head -n 1)
kubectl exec -it "$COREDNS_POD" -n kube-system -- cat /etc/resolv.conf

Common errors

A model that you created is stuck in an error or CrashLoopBackOff state. Logs show:
    raise RuntimeError("Failed to infer device type")
RuntimeError: Failed to infer device type
This error is common when the inference pod cannot detect GPUs. It typically occurs when the model is created on custom hardware without a preset and is not assigned the “none” preset. It can also occur when the underlying hardware has GPU issues. Confirm the model was created with preset: none.
To check, go to https://<web-hostname>/console/models, select the model, and then select Edit Model.
Also check the gpu-operator namespace and confirm that all pods are healthy: each pod should be Running or Completed.
# A valid, running configuration
$ kubectl get pods -n gpu-operator
NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-mkhbf                                   1/1     Running     0          2d4h
gpu-operator-666bbffcd-hhqfc                                  1/1     Running     0          2d4h
gpu-operator-node-feature-discovery-gc-7c7f68d5f4-sz64r       1/1     Running     0          2d4h
gpu-operator-node-feature-discovery-master-58588c6967-gm9gt   1/1     Running     0          2d4h
gpu-operator-node-feature-discovery-worker-5hqq6              1/1     Running     0          2d4h
nvidia-container-toolkit-daemonset-44rsg                      1/1     Running     0          2d4h
nvidia-cuda-validator-pvv2n                                   0/1     Completed   0          2d4h
nvidia-dcgm-exporter-mm2vc                                    1/1     Running     0          2d4h
nvidia-device-plugin-daemonset-tj4x2                          1/1     Running     0          2d4h
nvidia-mig-manager-cbsv2                                      1/1     Running     0          2d4h
nvidia-operator-validator-qg27q                               1/1     Running     0          2d4h


# An invalid, problematic deployment
$ kubectl get pods -n gpu-operator
NAME                                               READY   STATUS      RESTARTS      AGE
gpu-feature-discovery-7frhq                        0/1     Init:0/1    5             16h
gpu-operator-node-feature-discovery-worker-rdvc5   1/1     Running     6 (15h ago)   16h
nvidia-container-toolkit-daemonset-zsdt6           0/1     Init:0/1    5             16h
nvidia-cuda-validator-s2k95                        0/1     Completed   0             4d3h
nvidia-dcgm-exporter-bhmrn                         0/1     Init:0/1    5             16h
nvidia-device-plugin-daemonset-mzsvb               0/1     Init:0/1    5             16h
nvidia-operator-validator-5shw6                    0/1     Init:0/4    5             16h
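To pick out the problem pods from a listing like the ones above, you could filter the output. This is a minimal sketch: `unhealthy_pods` is a hypothetical helper name, and it assumes the default kubectl column layout (NAME, READY, STATUS). It treats Completed pods as healthy and flags Running pods whose containers are not all ready.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the name and status of each pod that looks unhealthy.
unhealthy_pods() {
  awk '
    NF {
      if ($3 == "Completed") next        # finished jobs are expected
      split($2, r, "/")                  # READY is "<ready>/<total>"
      if ($3 != "Running" || r[1] != r[2]) print $1, $3
    }
  '
}

# Usage (requires a working kubeconfig):
#   kubectl get pods -n gpu-operator --no-headers | unhealthy_pods
```

Empty output means every pod in the namespace is either Running with all containers ready or Completed.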

Poolside logins return a 500 error. Confirm that the Keycloak entry in the CoreDNS ConfigMap has the new IP address, and then restart the CoreDNS deployment.