Overview
Use this page to estimate how many concurrent agents a Poolside deployment can support and how that capacity translates into developer seats. The planner models Laguna deployments across hardware, model, context, and latency assumptions. For supported deployment paths and minimum hardware requirements, see Supported configurations. The planner can model configurations that are useful for comparison, but it does not make an unsupported configuration supported.

Capacity estimates are planning inputs, not guarantees. Validate final sizing with your Poolside account team before you commit to production hardware or a large rollout.
Estimate capacity
Use the planner to estimate the maximum number of active agent tasks your deployment can sustain under the selected assumptions. Set the inputs to match the deployment you are planning:
- Hardware: Select the GPU type you want to model.
- Model: Select the Laguna model you plan to deploy.
- Number of GPUs: Select the number of GPUs assigned to the model-serving node.
- Average context per task: Select how large the agent’s context window grows by the end of a typical task. Use a higher value for longer tasks, larger codebases, or workflows that read many files. Use a lower value for short, focused tasks.
- Step-latency SLO: Select the p50 latency target per agent turn. A stricter SLO lowers the number of concurrent agents the deployment can serve.

The planner then reports the following estimates:
- Concurrent agents: The estimated number of active agent tasks that can occupy model-serving slots at the same time.
- Seats at 40%: A conservative developer-seat estimate for first-time sizing or agent-heavy usage.
- Seats at 25%: A lighter-concurrency estimate for pilots, mixed workloads, or deployments with telemetry that shows lower peak activity.
Interpret the estimate
Concurrent-agent capacity is not the same as the number of developers a deployment supports. A developer consumes a model-serving slot only while an agent task is actively running. Outside of an active task, the developer does not consume agent capacity. To estimate supported seats, divide concurrent-agent capacity by the fraction of seats actively running an agent at peak:

Seat estimate = concurrent agents ÷ fraction of seats active at peak

Use the 25% estimate when:
- Your deployment has telemetry showing light concurrency
- Your workload mixes agent, chat, and completion usage
- You are sizing a pilot or limited rollout

Use the 40% estimate when:
- This is a first-time sizing without observed concurrency
- The team works in an agent-first culture
- Latency degradation is particularly disruptive in your environment
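
The seat calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the planner's implementation: the function names are hypothetical, the concurrent-agent figure of 40 is a made-up input rather than a measured result, and rounding seats down (and required capacity up) is an assumption chosen to keep the estimate conservative. Percentages are passed as integers so the arithmetic stays exact.

```python
def seat_estimate(concurrent_agents: int, peak_active_percent: int) -> int:
    """Seats supported = concurrent agents / fraction of seats active at peak.

    peak_active_percent is an integer percentage (for example 40 or 25).
    The result is rounded down, which is an assumption of this sketch.
    """
    if concurrent_agents < 0:
        raise ValueError("concurrent_agents must not be negative")
    if not 0 < peak_active_percent <= 100:
        raise ValueError("peak_active_percent must be between 1 and 100")
    return concurrent_agents * 100 // peak_active_percent


def required_concurrent_agents(seats: int, peak_active_percent: int) -> int:
    """Inverse calculation: capacity needed for a given seat count, rounded up."""
    if seats < 0:
        raise ValueError("seats must not be negative")
    if not 0 < peak_active_percent <= 100:
        raise ValueError("peak_active_percent must be between 1 and 100")
    return (seats * peak_active_percent + 99) // 100


# Hypothetical planner output: 40 concurrent agents.
agents = 40
print("Seats at 40%:", seat_estimate(agents, 40))
print("Seats at 25%:", seat_estimate(agents, 25))
print("Agents needed for 150 seats at 25%:", required_concurrent_agents(150, 25))
```

With these illustrative inputs, the same deployment maps to 100 seats at 40% and 160 seats at 25%, which shows how strongly the seat count depends on the peak-activity assumption.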
Understand calibration confidence
The planner uses an analytical inference simulator calibrated against measured Poolside benchmarks. The confidence badge in the planner indicates how closely the selected configuration matches measured data:
- Calibrated: Direct measurement exists for the selected model, GPU, GPU count, and precision.
- Same arch: Measurement exists for the same model on the same GPU architecture.
- Factorized, partial signal, or arch median: The estimate depends more heavily on extrapolation.
What affects capacity
Real-world capacity depends on your deployment shape and workload:
- Model choice
- GPU type and GPU count
- Weight and key-value cache precision
- Average context size per trajectory
- Step-latency target
- Number of steps each agent task takes
- Mix of agent, chat, and completion workloads
- Peak-time concurrency and burst behavior
Choose a model for your deployment
For full model details, see Supported models.

| Model | When to choose it |
|---|---|
| Laguna XS.2 | Use when concurrent-agent throughput is the priority, you have limited GPU availability, or you need a strong default for most agent workloads. |
| Laguna M.1 | Use when agent quality matters more than raw throughput. It is the best fit on 8× H200 hardware and can serve smaller teams on RTX 6000 Blackwell when your concurrency needs are lower. |
| Point | Use for editor completion alongside an agent model. |