# Overview
Use this page to size a Poolside deployment for on-premises and cloud environments. It explains what affects capacity, lists Poolside’s measured concurrent-agent capacity for each supported hardware tier, and shows how to translate those numbers into a developer seat count. For hardware specifications and supported configurations, see Supported configurations.

## What affects capacity
Real-world capacity depends on factors that vary by team and workload:

- Average step latency for the model and hardware combination
- Number of steps each agent task takes to complete
- Average task complexity
- Average context window utilization per request
- Mix of agent, chat, and completion workloads
- Time-of-day concurrency patterns and burst behavior
## How Poolside measures capacity
Poolside publishes capacity numbers measured under deliberately conservative conditions:

- Step time threshold: Average step time stays under five seconds across the measured agent population.
- Quantization: Laguna numbers reflect FP8 model weights with an FP8 KV cache. Malibu 2.2 INT4 numbers reflect INT4 weights.
- Concurrency unit: Each concurrent agent is an active agent task occupying a model-serving slot, not a logged-in developer.
## Concurrent-agent capacity by hardware
The following table lists the maximum number of concurrent agents each hardware tier supports while staying under the step time threshold. Values prefixed with a tilde (~) are extrapolated from measured numbers on similar configurations; all other values are measured.

| Hardware tier | Total GPU memory | Laguna M.1 | Laguna XS.2 | Malibu 2.2 INT4 |
|---|---|---|---|---|
| 8× H200 (HGX rack or BYO) | 1128 GB | 41 | 80 | 38 |
| 4× H200 (BYO minimum) | 564 GB | ~20 | ~40 | Untested |
| 8× RTX 6000 Blackwell (rack) | 768 GB | ~32 | ~112 | ~12 |
| 4× RTX 6000 Blackwell (tower) | 384 GB | 16 | 56 | 6 |
Tail latency varies across configurations. On 4× RTX 6000 Blackwell, Laguna
M.1 has a p99 step time of around 21 seconds, higher than other supported
combinations. For latency-sensitive interactive workloads, prefer Laguna XS.2
on this tier or move to an H200 configuration.
## Translate concurrent agents into developer seats
Concurrent-agent capacity is not the same as the number of developers a deployment supports. A developer running an agent task occupies a slot for the duration of that task; outside of an active task, the developer does not consume capacity. To estimate supported seats, divide concurrent-agent capacity by the fraction of seats actively running an agent at peak.

### Recommended planning ratio
Use a planning range of 25 to 40 percent, with 40 percent as the conservative default for initial sizing. Laguna agent tasks typically take two to three minutes to complete. Each active agent occupies a slot for that full duration, so the instantaneous concurrency ratio for agent workloads runs higher than for chat-style models. Without real-world telemetry from your deployment, plan against the higher end of the range.

Pick the lower end (25 percent) when:

- Your deployment has telemetry showing light concurrency
- The workload mixes agents with chat and completion in roughly equal share
- You are sizing a pilot or limited rollout

Pick the higher end (40 percent) when:

- This is a first-time sizing without observed concurrency
- The team works in an agent-first culture
- Latency degradation is particularly disruptive in your environment
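The seat estimate above reduces to a single division. As a minimal sketch (the function name and the example tier figures are illustrative, taken from the capacity table in this page, not a Poolside API):

```python
def estimated_seats(concurrent_agents: float, peak_active_fraction: float) -> float:
    """Estimate how many developer seats a deployment supports.

    concurrent_agents: measured concurrent-agent capacity for a tier/model pair.
    peak_active_fraction: fraction of seats running an agent task at peak
    (0.40 conservative default, 0.25 lower bound).
    """
    return concurrent_agents / peak_active_fraction

# 8x H200 running Laguna M.1: 41 measured concurrent agents.
print(estimated_seats(41, 0.40))  # 102.5 -> rounded to ~100 in the worked examples
print(estimated_seats(41, 0.25))  # 164.0 -> rounded to ~165
```

Because the relationship is linear, halving the hardware (for example, 4× H200 at ~20 agents) roughly halves the seat count at the same planning ratio.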
## Worked examples
The following table shows supported seat counts at both the 40 percent conservative default and the 25 percent lower bound. Actual capacity depends on your observed concurrency.

| Hardware tier | Model | Concurrent agents | Seats at 40% | Seats at 25% |
|---|---|---|---|---|
| 8× H200 | Laguna M.1 | 41 | ~100 | ~165 |
| 8× H200 | Laguna XS.2 | 80 | ~200 | ~320 |
| 4× H200 | Laguna XS.2 | ~40 | ~100 | ~160 |
| 8× RTX 6000 Blackwell | Laguna XS.2 | ~112 | ~280 | ~450 |
| 4× RTX 6000 Blackwell | Laguna M.1 | 16 | ~40 | ~65 |
| 4× RTX 6000 Blackwell | Laguna XS.2 | 56 | ~140 | ~225 |
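The table can also be read in the other direction: starting from a known team size, the required concurrent-agent capacity is the seat count multiplied by the planning ratio. A hedged sketch (the helper name and the 120-seat team are illustrative assumptions):

```python
def required_concurrent_agents(seats: int, peak_active_fraction: float) -> float:
    """Concurrent-agent capacity needed to serve `seats` developers
    when `peak_active_fraction` of them run an agent task at peak."""
    return seats * peak_active_fraction

# A 120-developer team at the 40 percent conservative default:
need = required_concurrent_agents(120, 0.40)
print(need)  # 48.0 -> fits Laguna XS.2 on 4x RTX 6000 Blackwell (56) or 8x H200 (80)
```

Comparing the result against the concurrent-agent column picks out the smallest tier that fits.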
## Choose a model for your deployment
For full model details, see Supported models.

| Model | When to choose it |
|---|---|
| Laguna XS.2 | Moderate-complexity use cases where concurrent-agent throughput is the priority, or where high-performance GPU availability is constrained. Strong fit for agent workloads with acceptable tail latency. |
| Laguna M.1 | Choose when agent quality matters more than raw throughput. Best fit on 8× H200 hardware. Viable on 4× RTX 6000 Blackwell for small teams, with reduced concurrency and higher tail latency. |
| Malibu 2.2 | Choose for existing deployments, dense-model preferences, or when an INT4 quantization path is required. New deployments on RTX 6000 Blackwell hardware should consider Laguna XS.2 first. |
| Point | Choose for editor completion alongside an agent model. |