Chamber’s scheduling engine decides where and when your GPU workloads run — enforcing capacity guarantees, hierarchical bursting, and preemption automatically.

How a Workload Gets Scheduled

  1. Submit: You submit a workload via the API, CLI, or dashboard. Chamber records it and marks it Pending.
  2. Allocate: The scheduler assigns an allocation — picking the best fit based on your team’s reservations and available capacity. The workload moves to Queued.
  3. Dispatch: The workload is dispatched to the Chamber Agent running in your Kubernetes cluster.
  4. Admit: The in-cluster scheduler admits the workload based on your team’s quota and current demand. When GPUs are available, it places the pods. The workload moves to Starting, then Running.
  5. Report: The agent reports status back — you see real-time updates as your workload progresses through to Completed, or is Preempted / Failed if something changes.
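The lifecycle above can be sketched as a small state machine. This is purely illustrative: the transition set mirrors the steps and statuses described on this page, not any actual Chamber internals.

```python
# Illustrative state machine for the workload lifecycle described above.
# States and transitions mirror the documentation, not a Chamber API.
TRANSITIONS = {
    "Pending":  {"Queued"},                         # allocation assigned
    "Queued":   {"Starting", "Preempted"},          # admitted, or reclaimed before start
    "Starting": {"Running", "Failed", "Preempted"}, # pods created
    "Running":  {"Completed", "Failed", "Preempted"},
}

def can_transition(current: str, target: str) -> bool:
    """Return True if the documented lifecycle allows `current` -> `target`."""
    return target in TRANSITIONS.get(current, set())
```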

Reserved vs Elastic Workloads

Every workload you submit is either Reserved or Elastic. This determines how it accesses GPU capacity and whether it can be preempted.
| | Reserved | Elastic |
| --- | --- | --- |
| What it means | Guaranteed capacity backed by your team’s reservation | Uses spare GPUs when available |
| Can it be preempted? | Never | Yes — when reserved work needs the GPUs |
| Priority level | Non-preemptible (priority ≥ 100) | Preemptible (priority < 100) |
| Best for | Production inference, critical training runs | Experiments, hyperparameter sweeps, batch jobs |
Use Reserved for work that can’t be interrupted. Use Elastic to take advantage of idle GPUs across the cluster — you get free capacity, but accept the trade-off that it can be reclaimed.
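The priority boundary above can be expressed directly. This sketch is based solely on the documented threshold (priority ≥ 100 is non-preemptible); the constant name is ours, not Chamber’s:

```python
# Assumed from the table above: priority >= 100 means non-preemptible (Reserved).
RESERVED_PRIORITY_THRESHOLD = 100

def is_preemptible(priority: int) -> bool:
    """Elastic workloads (priority < 100) can be reclaimed; reserved ones cannot."""
    return priority < RESERVED_PRIORITY_THRESHOLD
```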

How Scheduling Decisions Are Made

When you submit a workload:
  1. Chamber picks the best allocation for your workload based on available capacity across your team’s reservations.
  2. Reserved workloads are admitted first. If your team’s allocation has room, the workload starts. Reserved capacity is guaranteed and protected.
  3. Elastic workloads use surplus capacity. They run on idle GPUs — either from your team’s unused reservation or from spare capacity elsewhere in the pool. Elastic workloads can be reclaimed at any time when reserved work needs the GPUs.
  4. The scheduler evaluates continuously. As workloads complete and GPUs free up, waiting workloads are admitted automatically.
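The decision order above can be sketched as a single admission pass: reserved work is admitted against guaranteed capacity first, then elastic work fills whatever surplus remains. The tuple shape and function are illustrative models, not Chamber’s scheduler:

```python
def admit(workloads, reserved_capacity, total_capacity):
    """Illustrative admission pass following the documented decision order.

    `workloads` is a list of (name, kind, gpus) tuples, kind being
    "reserved" or "elastic".
    """
    admitted, used = [], 0
    # 1. Reserved workloads are admitted first, against guaranteed capacity.
    for name, kind, gpus in workloads:
        if kind == "reserved" and used + gpus <= reserved_capacity:
            admitted.append(name)
            used += gpus
    # 2. Elastic workloads fill the remaining surplus in the pool.
    for name, kind, gpus in workloads:
        if kind == "elastic" and used + gpus <= total_capacity:
            admitted.append(name)
            used += gpus
    return admitted
```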

Key Terms

| Term | What It Means |
| --- | --- |
| Bursting | When a team’s elastic workloads use GPUs beyond their reserved allocation. Your team “bursts” into idle capacity from the pool. |
| Burst limit | The maximum amount a team can burst beyond its reservation. For example, a 50% burst limit on a 32-GPU reservation means the team can use up to 48 GPUs total (32 reserved + 16 burst). Set to 0 to disable bursting entirely. |
| Bursting priority | A weight (1–10) that controls how surplus GPUs are divided when multiple teams are bursting at the same time. Higher priority means a larger share of the available surplus. |
| Preemption | When the scheduler stops an elastic workload to free GPUs for higher-priority work. Only elastic workloads can be preempted — reserved workloads are always protected. |
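The burst-limit arithmetic can be checked with a one-liner. The function name and the percentage convention are ours, inferred from the worked example above (a 50% limit on 32 reserved GPUs allows 16 extra, 48 total):

```python
def burst_ceiling(reserved_gpus: int, burst_limit_pct: float) -> int:
    """Maximum total GPUs a team may use: its reservation plus its burst allowance.

    A limit of 0 disables bursting, capping the team at its reservation.
    """
    return reserved_gpus + int(reserved_gpus * burst_limit_pct / 100)
```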

Hierarchical Bursting

Your organization’s teams are arranged in a tree. Chamber respects this hierarchy when distributing surplus capacity:
Acme Corp
├── Research (bursting priority: 3)
│   ├── NLP Team (bursting priority: 2)
│   └── Vision Team (bursting priority: 1)
└── Production (bursting priority: 2)
    ├── Inference (bursting priority: 1)
    └── Training (bursting priority: 1)
  • Each team has a bursting priority that controls its share of surplus GPUs when multiple teams are bursting.
  • Surplus capacity flows down the tree proportionally — Research gets 60% of excess (priority 3 out of 5), Production gets 40% (priority 2 out of 5).
  • Within Research, NLP gets twice the surplus share of Vision (priority 2 vs 1).
  • Teams don’t compete in a flat pool. The hierarchy ensures organizational priorities are respected even when capacity is tight.
  • Each team can also have a burst limit that caps how far beyond its reservation it can go — preventing any single team from consuming the entire pool.
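The proportional split described above can be sketched as follows. Applied level by level down the tree, each team’s share is divided again among its children. The function is an illustration of the math, not a Chamber API:

```python
def split_surplus(surplus_gpus: float, priorities: dict) -> dict:
    """Divide surplus GPUs among sibling teams in proportion to bursting priority."""
    total = sum(priorities.values())
    return {team: surplus_gpus * p / total for team, p in priorities.items()}
```

Using the tree above: with 10 surplus GPUs, Research (priority 3) gets 6 and Production (priority 2) gets 4; within Research, NLP (priority 2) then gets twice Vision’s share of those 6.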

Example: Hierarchical Bursting in Action

A pool has 100 GPUs. Research has 48 reserved, Production has 32 reserved. Research is only using 20.
| Team | Reservation | In Use | Idle | What Happens |
| --- | --- | --- | --- | --- |
| Research | 48 | 20 | 28 | 28 idle GPUs available for bursting |
| Production | 32 | 32 | 0 | Elastic workloads can burst into Research’s idle capacity |
| Pool total | 100 | 52 | 48 | Surplus GPUs distributed by bursting priority |
Production’s elastic workloads can temporarily burst into Research’s idle GPUs. But the moment Research submits reserved work that needs those GPUs back, Production’s elastic workloads are preempted.

Preemption

Preemption is how Chamber reclaims GPUs for higher-priority work. It’s designed to be predictable and minimize disruption.

Rules

  1. Reserved workloads are never preempted. The scheduler will not reclaim them under any circumstances.
  2. Elastic workloads are always reclaimable. The scheduler can reclaim their resources when reserved workloads need capacity.
  3. Higher-priority elastic workloads displace lower-priority ones. Among elastic workloads, priority determines who stays when the pool is full.
  4. Cheapest workloads are reclaimed first. The scheduler picks victims in this order:
    • Queued workloads (haven’t started — zero cost)
    • Starting workloads (pods being created — minimal cost)
    • Running workloads, lowest priority first (most expensive, last resort)
  5. Only enough workloads are reclaimed to free the capacity needed — no more.
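The victim-selection rules can be sketched as a sort followed by a greedy pass. The tuple shape is ours; this models the documented ordering (Queued, then Starting, then Running by lowest priority), not Chamber’s actual scheduler:

```python
# Victim states in order of increasing reclaim cost, per rule 4 above.
VICTIM_STATE_ORDER = {"Queued": 0, "Starting": 1, "Running": 2}

def pick_victims(elastic_workloads, gpus_needed):
    """Reclaim the cheapest elastic workloads first, and only as many as needed.

    `elastic_workloads` is a list of (name, state, priority, gpus) tuples.
    """
    ordered = sorted(
        elastic_workloads,
        key=lambda w: (VICTIM_STATE_ORDER[w[1]], w[2]),  # cheapest state, then lowest priority
    )
    victims, freed = [], 0
    for name, state, priority, gpus in ordered:
        if freed >= gpus_needed:  # rule 5: stop once enough capacity is freed
            break
        victims.append(name)
        freed += gpus
    return victims
```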

Preemption Examples

| Scenario | Outcome |
| --- | --- |
| NLP Team has a 32-GPU reservation. They’re running 20 GPUs of reserved inference and 12 GPUs of elastic experiments — fully utilizing their allocation. A researcher submits a 12-GPU reserved training workload. | The 12 GPUs of elastic experiments are preempted to make room. Reserved always wins over elastic within the same allocation. |
| The pool is fully utilized. Team A is running an 8-GPU elastic sweep at priority 50. Team B submits an 8-GPU elastic workload at priority 75. | Both are elastic, but priority determines who stays. The lower-priority sweep is reclaimed. |
| Vision Team has a 16-GPU reservation but is only using 4 GPUs. A researcher submits an 8-GPU elastic workload. | Plenty of idle capacity — the elastic workload starts immediately with no preemption. If reserved work later needs those GPUs, this elastic workload would be the first to go. |
| Research has 48 GPUs reserved but is only using 20. Production’s elastic workloads have expanded into 28 of Research’s idle GPUs. A Research team member submits a 24-GPU reserved workload. | Research’s reservation can support the new workload (48 − 20 = 28 free slots ≥ 24 needed). The scheduler preempts just enough of Production’s elastic workloads to free 24 GPUs. Production’s remaining 4 burst GPUs and all reserved workloads are untouched. |

Fractional GPUs

Not every workload needs a full GPU. You can request fractions in 0.25 increments:
| Request | Use Case |
| --- | --- |
| 0.25 GPU | Lightweight inference, notebooks, debugging |
| 0.5 GPU | Moderate inference, small model fine-tuning |
| 0.75 GPU | Heavier single-GPU tasks |
| 1.0+ GPU | Standard training and inference |
Multiple fractional workloads share a single physical GPU via memory-based isolation and time-slicing. You can specify either a GPU fraction (e.g., 0.5) or an explicit GPU memory limit in MiB.
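Assuming requests are validated as positive multiples of 0.25, as the table implies, a client-side check might look like this sketch (the function is ours, not part of any Chamber SDK):

```python
def valid_gpu_request(fraction: float) -> bool:
    """Check that a fractional GPU request is a positive multiple of 0.25."""
    # Work in quarter-GPU units to sidestep floating-point drift.
    quarters = round(fraction * 4)
    return fraction > 0 and abs(quarters - fraction * 4) < 1e-9
```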

Distributed Training

For multi-GPU workloads that span multiple pods, Chamber supports two scheduling modes.

Gang Scheduling

All pods must start together or none do. No partial starts, no stragglers. This is the default behavior.
Example: You submit an 8-node distributed training workload. The cluster only has 6 GPUs free. Chamber waits until all 8 are available, then starts them simultaneously. Your training framework sees all workers from the first step.
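The all-or-nothing rule in the example can be sketched in a few lines. This models the documented behavior only; it is not a Chamber interface:

```python
def gang_admit(pods_needed: int, gpus_free: int, gpus_per_pod: int = 1) -> int:
    """All-or-nothing admission: start every pod together, or none at all.

    Returns the number of pods started (0 or `pods_needed`).
    """
    if gpus_free >= pods_needed * gpus_per_pod:
        return pods_needed  # all workers start simultaneously
    return 0                # wait: no partial starts, no stragglers
```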

Topology-Aware Placement

For latency-sensitive distributed workloads, Chamber co-locates pods within network topology boundaries:
  • Required placement — All pods must land within a specific topology boundary (e.g., same zone). If placement isn’t possible, the workload waits.
  • Preferred placement — Pods are placed within a tighter boundary (e.g., same rack) when possible, but the constraint is relaxed if needed.
This is useful for distributed training workloads that benefit from NVLink or high-bandwidth interconnects between nodes.
Topology-aware placement requires that your cluster nodes are labeled with the appropriate topology keys (e.g., topology.kubernetes.io/zone, rack labels). A topology definition must also be configured on the cluster. Contact your cluster administrator to ensure nodes are properly tagged before using this feature.

Workload Status Reference

| Status | What It Means | What You Should Do |
| --- | --- | --- |
| Pending | Submitted, waiting for allocation assignment | Wait — Chamber is finding the best allocation |
| Queued | Dispatched to cluster, waiting for GPUs | Wait — the scheduler will admit when capacity is available |
| Starting | Admitted by the scheduler, pods being created | Wait — images are pulling, almost there |
| Running | All pods are running | Monitor your workload |
| Completed | Finished successfully | Collect results |
| Failed | Something went wrong | Check the failure reason |
| Preempted | Reclaimed for a higher-priority workload | Resubmit — or switch to Reserved if this keeps happening |
| Cancelled | You cancelled it | No action needed |
If your elastic workloads are frequently preempted, consider increasing their priority, switching critical work to Reserved, asking your admin to raise your team’s bursting priority, or scheduling during off-peak hours when more surplus capacity is available.