Chamber’s scheduling engine decides where and when your GPU workloads run — enforcing capacity guarantees, hierarchical bursting, and preemption automatically.

How a Workload Gets Scheduled

  1. Submit: You submit a workload via the API, CLI, or dashboard. Chamber records it and marks it Pending.
  2. Allocate: The scheduler assigns an allocation — picking the best fit based on your team’s reservations and available capacity. The workload moves to Queued.
  3. Dispatch: The workload is dispatched to the Chamber Agent running in your Kubernetes cluster.
  4. Admit: The in-cluster scheduler admits the workload based on your team’s quota and current demand. When GPUs are available, it places the pods. The workload moves to Starting, then Running.
  5. Report: The agent reports status back — you see real-time updates as your workload progresses through to Completed, or is Preempted / Failed if something changes.
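The lifecycle above can be sketched as a small state machine. This is purely illustrative: the transition set mirrors the steps and statuses described on this page, not any actual Chamber internals.

```python
# Illustrative state machine for the workload lifecycle described above.
# States and transitions mirror the documentation, not a Chamber API.
TRANSITIONS = {
    "Pending":  {"Queued"},                         # allocation assigned
    "Queued":   {"Starting", "Preempted"},          # admitted, or reclaimed before start
    "Starting": {"Running", "Failed", "Preempted"}, # pods created
    "Running":  {"Completed", "Failed", "Preempted"},
}

def can_transition(current: str, target: str) -> bool:
    """Return True if the documented lifecycle allows `current` -> `target`."""
    return target in TRANSITIONS.get(current, set())
```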

Reserved vs Elastic Workloads

Every workload you submit is either Reserved or Elastic. This determines how it accesses GPU capacity and whether it can be preempted.
| | Reserved | Elastic |
| --- | --- | --- |
| What it means | Guaranteed capacity backed by your team’s reservation | Uses spare GPUs when available |
| Can it be preempted? | Never | Yes — when reserved work needs the GPUs |
| Priority level | Non-preemptible (priority ≥ 100) | Preemptible (priority < 100) |
| Best for | Production inference, critical training runs | Experiments, hyperparameter sweeps, batch jobs |
Use Reserved for work that can’t be interrupted. Use Elastic to take advantage of idle GPUs across the cluster — you get free capacity, but accept the trade-off that it can be reclaimed.
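The priority boundary above can be expressed directly. This sketch is based solely on the documented threshold (priority ≥ 100 is non-preemptible); the constant name is ours, not Chamber’s:

```python
# Assumed from the table above: priority >= 100 means non-preemptible (Reserved).
RESERVED_PRIORITY_THRESHOLD = 100

def is_preemptible(priority: int) -> bool:
    """Elastic workloads (priority < 100) can be reclaimed; reserved ones cannot."""
    return priority < RESERVED_PRIORITY_THRESHOLD
```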

How Scheduling Decisions Are Made

When you submit a workload:
  1. Chamber picks the best allocation for your workload based on available capacity across your team’s reservations.
  2. Reserved workloads are admitted first. If your team’s allocation has room, the workload starts. Reserved capacity is guaranteed and protected.
  3. Elastic workloads use surplus capacity. They run on idle GPUs — either from your team’s unused reservation or from spare capacity elsewhere in the pool. Elastic workloads can be reclaimed at any time when reserved work needs the GPUs.
  4. The scheduler evaluates continuously. As workloads complete and GPUs free up, waiting workloads are admitted automatically.
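The decision order above can be sketched as a single admission pass: reserved work is admitted against guaranteed capacity first, then elastic work fills whatever surplus remains. The tuple shape and function are illustrative models, not Chamber’s scheduler:

```python
def admit(workloads, reserved_capacity, total_capacity):
    """Illustrative admission pass following the documented decision order.

    `workloads` is a list of (name, kind, gpus) tuples, kind being
    "reserved" or "elastic".
    """
    admitted, used = [], 0
    # 1. Reserved workloads are admitted first, against guaranteed capacity.
    for name, kind, gpus in workloads:
        if kind == "reserved" and used + gpus <= reserved_capacity:
            admitted.append(name)
            used += gpus
    # 2. Elastic workloads fill the remaining surplus in the pool.
    for name, kind, gpus in workloads:
        if kind == "elastic" and used + gpus <= total_capacity:
            admitted.append(name)
            used += gpus
    return admitted
```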

Key Terms

| Term | What It Means |
| --- | --- |
| Bursting | When a team’s elastic workloads use GPUs beyond their reserved allocation. Your team “bursts” into idle capacity from the pool. |
| Burst limit | The maximum amount a team can burst beyond its reservation. For example, a 50% burst limit on a 32-GPU reservation means the team can use up to 48 GPUs total (32 reserved + 16 burst). Set to 0 to disable bursting entirely. |
| Bursting priority | A weight (1–10) that controls how surplus GPUs are divided when multiple teams are bursting at the same time. Higher priority means a larger share of the available surplus. |
| Preemption | When the scheduler stops an elastic workload to free GPUs for higher-priority work. Only elastic workloads can be preempted — reserved workloads are always protected. |
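The burst-limit arithmetic can be checked with a one-liner. The function name and the percentage convention are ours, inferred from the worked example above (a 50% limit on 32 reserved GPUs allows 16 extra, 48 total):

```python
def burst_ceiling(reserved_gpus: int, burst_limit_pct: float) -> int:
    """Maximum total GPUs a team may use: its reservation plus its burst allowance.

    A limit of 0 disables bursting, capping the team at its reservation.
    """
    return reserved_gpus + int(reserved_gpus * burst_limit_pct / 100)
```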

Hierarchical Bursting

Your organization’s teams are arranged in a tree. Chamber respects this hierarchy when distributing surplus capacity:
Acme Corp
├── Research (bursting priority: 3)
│   ├── NLP Team (bursting priority: 2)
│   └── Vision Team (bursting priority: 1)
└── Production (bursting priority: 2)
    ├── Inference (bursting priority: 1)
    └── Training (bursting priority: 1)
  • Each team has a bursting priority that controls its share of surplus GPUs when multiple teams are bursting.
  • Surplus capacity flows down the tree proportionally — Research gets 60% of excess (priority 3 out of 5), Production gets 40% (priority 2 out of 5).
  • Within Research, NLP gets twice the surplus share of Vision (priority 2 vs 1).
  • Teams don’t compete in a flat pool. The hierarchy ensures organizational priorities are respected even when capacity is tight.
  • Each team can also have a burst limit that caps how far beyond its reservation it can go — preventing any single team from consuming the entire pool.
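The proportional split described above can be sketched as follows. Applied level by level down the tree, each team’s share is divided again among its children. The function is an illustration of the math, not a Chamber API:

```python
def split_surplus(surplus_gpus: float, priorities: dict) -> dict:
    """Divide surplus GPUs among sibling teams in proportion to bursting priority."""
    total = sum(priorities.values())
    return {team: surplus_gpus * p / total for team, p in priorities.items()}
```

Using the tree above: with 10 surplus GPUs, Research (priority 3) gets 6 and Production (priority 2) gets 4; within Research, NLP (priority 2) then gets twice Vision’s share of those 6.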

Example: Hierarchical Bursting in Action

A pool has 100 GPUs. Research has 48 reserved, Production has 32 reserved. Research is only using 20.
| Team | Reservation | In Use | Idle | What Happens |
| --- | --- | --- | --- | --- |
| Research | 48 | 20 | 28 | 28 idle GPUs available for bursting |
| Production | 32 | 32 | 0 | Elastic workloads can burst into Research’s idle capacity |
| Pool total | 100 | 52 | 48 | Surplus GPUs distributed by bursting priority |
Production’s elastic workloads can temporarily burst into Research’s idle GPUs. But the moment Research submits reserved work that needs those GPUs back, Production’s elastic workloads are preempted.

Preemption

Preemption is how Chamber reclaims GPUs for higher-priority work. It’s designed to be predictable and minimize disruption.

Rules

  1. Reserved workloads are never preempted. The scheduler will not reclaim them under any circumstances.
  2. Elastic workloads are always reclaimable. The scheduler can reclaim their resources when reserved workloads need capacity.
  3. Higher-priority elastic workloads displace lower-priority ones. Among elastic workloads, priority determines who stays when the pool is full.
  4. Cheapest workloads are reclaimed first. The scheduler picks victims in this order:
    • Queued workloads (haven’t started — zero cost)
    • Starting workloads (pods being created — minimal cost)
    • Running workloads, lowest priority first (most expensive, last resort)
  5. Only enough workloads are reclaimed to free the capacity needed — no more.
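The victim-selection rules can be sketched as a sort followed by a greedy pass. The tuple shape is ours; this models the documented ordering (Queued, then Starting, then Running by lowest priority), not Chamber’s actual scheduler:

```python
# Victim states in order of increasing reclaim cost, per rule 4 above.
VICTIM_STATE_ORDER = {"Queued": 0, "Starting": 1, "Running": 2}

def pick_victims(elastic_workloads, gpus_needed):
    """Reclaim the cheapest elastic workloads first, and only as many as needed.

    `elastic_workloads` is a list of (name, state, priority, gpus) tuples.
    """
    ordered = sorted(
        elastic_workloads,
        key=lambda w: (VICTIM_STATE_ORDER[w[1]], w[2]),  # cheapest state, then lowest priority
    )
    victims, freed = [], 0
    for name, state, priority, gpus in ordered:
        if freed >= gpus_needed:  # rule 5: stop once enough capacity is freed
            break
        victims.append(name)
        freed += gpus
    return victims
```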

Preemption Examples

| Scenario | Outcome |
| --- | --- |
| NLP Team has a 32-GPU reservation. They’re running 20 GPUs of reserved inference and 12 GPUs of elastic experiments — fully utilizing their allocation. A researcher submits a 12-GPU reserved training workload. | The 12 GPUs of elastic experiments are preempted to make room. Reserved always wins over elastic within the same allocation. |
| The pool is fully utilized. Team A is running an 8-GPU elastic sweep at priority 50. Team B submits an 8-GPU elastic workload at priority 75. | Both are elastic, but priority determines who stays. The lower-priority sweep is reclaimed. |
| Vision Team has a 16-GPU reservation but is only using 4 GPUs. A researcher submits an 8-GPU elastic workload. | Plenty of idle capacity — the elastic workload starts immediately with no preemption. If reserved work later needs those GPUs, this elastic workload would be the first to go. |
| Research has 48 GPUs reserved but is only using 20. Production’s elastic workloads have expanded into 28 of Research’s idle GPUs. A Research team member submits a 24-GPU reserved workload. | Research’s reservation can support the new workload (48 − 20 = 28 free slots ≥ 24 needed). The scheduler preempts just enough of Production’s elastic workloads to free 24 GPUs. Production’s remaining 4 burst GPUs and all reserved workloads are untouched. |

Fractional GPUs

Not every workload needs a full GPU. You can request fractions in 0.25 increments:
| Request | Use Case |
| --- | --- |
| 0.25 GPU | Lightweight inference, notebooks, debugging |
| 0.5 GPU | Moderate inference, small model fine-tuning |
| 0.75 GPU | Heavier single-GPU tasks |
| 1.0+ GPU | Standard training and inference |
Multiple fractional workloads share a single physical GPU via memory-based isolation and time-slicing. You can specify either a GPU fraction (e.g., 0.5) or an explicit GPU memory limit in MiB.
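Assuming requests are validated as positive multiples of 0.25, as the table implies, a client-side check might look like this sketch (the function is ours, not part of any Chamber SDK):

```python
def valid_gpu_request(fraction: float) -> bool:
    """Check that a fractional GPU request is a positive multiple of 0.25."""
    # Work in quarter-GPU units to sidestep floating-point drift.
    quarters = round(fraction * 4)
    return fraction > 0 and abs(quarters - fraction * 4) < 1e-9
```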

Distributed Training

For multi-GPU workloads that span multiple pods, Chamber supports two scheduling modes.

Gang Scheduling

All pods must start together or none do. No partial starts, no stragglers. This is the default behavior.
Example: You submit an 8-node distributed training workload. The cluster only has 6 GPUs free. Chamber waits until all 8 are available, then starts them simultaneously. Your training framework sees all workers from the first step.
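The all-or-nothing rule in the example can be sketched in a few lines. This models the documented behavior only; it is not a Chamber interface:

```python
def gang_admit(pods_needed: int, gpus_free: int, gpus_per_pod: int = 1) -> int:
    """All-or-nothing admission: start every pod together, or none at all.

    Returns the number of pods started (0 or `pods_needed`).
    """
    if gpus_free >= pods_needed * gpus_per_pod:
        return pods_needed  # all workers start simultaneously
    return 0                # wait: no partial starts, no stragglers
```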

Topology-Aware Placement

For latency-sensitive distributed workloads, Chamber co-locates pods within network topology boundaries:
  • Required placement — All pods must land within a specific topology boundary (e.g., same zone). If placement isn’t possible, the workload waits.
  • Preferred placement — Pods are placed within a tighter boundary (e.g., same rack) when possible, but the constraint is relaxed if needed.
This is useful for distributed training workloads that benefit from NVLink or high-bandwidth interconnects between nodes.
Topology-aware placement requires that your cluster nodes are labeled with the appropriate topology keys (e.g., topology.kubernetes.io/zone, rack labels). A topology definition must also be configured on the cluster. Contact your cluster administrator to ensure nodes are properly tagged before using this feature.

Workload Status Reference

| Status | What It Means | What You Should Do |
| --- | --- | --- |
| Pending | Submitted, waiting for allocation assignment | Wait — Chamber is finding the best allocation |
| Queued | Dispatched to cluster, waiting for GPUs | Wait — the scheduler will admit when capacity is available |
| Starting | Admitted by the scheduler, pods being created | Wait — images are pulling, almost there |
| Running | All pods are running | Monitor your workload |
| Completed | Finished successfully | Collect results |
| Failed | Something went wrong | Check the failure reason |
| Preempted | Reclaimed for a higher-priority workload | Resubmit — or switch to Reserved if this keeps happening |
| Cancelled | You cancelled it | No action needed |
If your elastic workloads are frequently preempted, consider increasing their priority, switching critical work to Reserved, asking your admin to raise your team’s bursting priority, or scheduling during off-peak hours when more surplus capacity is available.