When to Use the Standalone Agent
| Environment | Agent |
|---|---|
| Kubernetes cluster with GPU nodes | Kubernetes Agent |
| Bare-metal GPU servers | Standalone Agent |
| Cloud GPU VMs (GCP, AWS, Azure) | Standalone Agent |
| On-prem GPU workstations | Standalone Agent |
What It Does
GPU Metrics
Collects utilization, memory, temperature, power draw, and profiling-level metrics from every GPU on the host via DCGM Exporter
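Under the hood, DCGM Exporter publishes these values in Prometheus text format (metric names like `DCGM_FI_DEV_GPU_UTIL` are NVIDIA's defaults). As an illustrative sketch of what a scrape looks like and how per-GPU values can be extracted from it — the sample payload below is made up, and this is not the agent's actual parser:

```python
# Sketch: parse a DCGM Exporter scrape (Prometheus text format) into
# {gpu_index: {metric_name: value}}. Metric names are DCGM Exporter
# defaults; the sample payload is illustrative, not real output.
import re

SAMPLE_SCRAPE = """\
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 87
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-aaaa"} 30412
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-aaaa"} 64
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-aaaa"} 295.3
"""

LINE_RE = re.compile(r'^(?P<name>DCGM_FI_\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')

def parse_dcgm(text):
    gpus = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip # HELP / # TYPE comments and non-DCGM lines
        labels = dict(kv.split("=", 1) for kv in m.group("labels").split(","))
        gpu = labels.get("gpu", '""').strip('"')
        gpus.setdefault(gpu, {})[m.group("name")] = float(m.group("value"))
    return gpus
```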
Host Metrics
Reports CPU usage, memory usage, and capacity so you can see the full picture alongside GPU data
Workload Discovery
Automatically detects GPU processes (PyTorch, DeepSpeed, vLLM, etc.), groups distributed training workers, and tracks their lifecycle
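To make the grouping idea concrete, here is a sketch of one plausible heuristic: workers launched by `torchrun` share `MASTER_ADDR`/`MASTER_PORT` in their environment, so processes with the same rendezvous endpoint can be grouped into one workload. This is an illustration only — the agent's actual discovery logic is not documented here, and the sample process list is invented:

```python
# Illustrative sketch: group GPU processes into workloads by their
# torch.distributed rendezvous endpoint (MASTER_ADDR:MASTER_PORT).
# This heuristic and the sample data are hypothetical.
from collections import defaultdict

def group_workers(procs):
    """procs: list of dicts with 'pid', 'cmd', and an 'env' dict."""
    groups = defaultdict(list)
    for p in procs:
        env = p.get("env", {})
        if "MASTER_ADDR" in env:   # worker in a distributed job
            key = (env["MASTER_ADDR"], env.get("MASTER_PORT", ""))
        else:                      # standalone process: its own group
            key = ("pid", p["pid"])
        groups[key].append(p["pid"])
    return dict(groups)

procs = [
    {"pid": 101, "cmd": "python train.py",
     "env": {"MASTER_ADDR": "10.0.0.5", "MASTER_PORT": "29500", "RANK": "0"}},
    {"pid": 102, "cmd": "python train.py",
     "env": {"MASTER_ADDR": "10.0.0.5", "MASTER_PORT": "29500", "RANK": "1"}},
    {"pid": 205, "cmd": "python serve.py", "env": {}},
]
```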
OTLP Receiver
Accepts OpenTelemetry metrics from your applications over gRPC, forwarding them to the Chamber dashboard alongside infrastructure metrics
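If your application already uses an OpenTelemetry SDK, you can usually point it at the agent with the standard OTLP environment variables defined by the OpenTelemetry specification. The endpoint below assumes the agent's receiver listens on the default OTLP gRPC port (4317) on the same host — check the Configuration page for the address your installation actually uses:

```shell
# Standard OpenTelemetry SDK environment variables (from the OTel spec,
# not Chamber-specific). Endpoint assumes the agent's OTLP receiver is
# on the default gRPC port 4317 on the local host.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_SERVICE_NAME="my-training-job"   # hypothetical service name
```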
Architecture
The agent connects outbound to Chamber over a secure WebSocket. No inbound ports or firewall rules are required for Chamber communication.
What You’ll See in the Dashboard
Once the agent is running, navigate to Dashboard > Services (app.usechamber.io/dashboard?tab=services) to see:
- GPU utilization per host and per GPU — identify underutilized or overloaded GPUs
- GPU memory consumption — track memory pressure across your fleet
- Temperature and power draw — spot thermal throttling or power-capped GPUs before they impact training
- Host CPU and memory — correlate host-level bottlenecks with GPU performance
- Active workloads — see which processes are running on each GPU, how long they’ve been active, and how much GPU memory they’re consuming
- Application metrics — if you instrument your app with OpenTelemetry, custom metrics appear alongside infrastructure data
Metrics are collected every 30 seconds and uploaded in batches every 60 seconds. Your host should appear in the dashboard within 60 seconds of the agent starting.
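The collect/upload cadence can be pictured as a simple buffer-and-flush loop: a sample is buffered every 30 seconds, and the buffer is flushed as one batch every 60 seconds. The sketch below simulates that timing in plain Python; it is a mental model, not the agent's actual implementation:

```python
# Sketch of the agent's cadence: buffer a sample every COLLECT_INTERVAL
# seconds, flush the buffer as one batched upload every UPLOAD_INTERVAL
# seconds. Timing is simulated; this is not the real agent code.
COLLECT_INTERVAL = 30
UPLOAD_INTERVAL = 60

class Batcher:
    def __init__(self):
        self.buffer = []
        self.uploads = []        # batches "sent" to the dashboard
        self.last_upload = 0.0

    def on_sample(self, now, sample):
        self.buffer.append(sample)
        if now - self.last_upload >= UPLOAD_INTERVAL:
            self.uploads.append(list(self.buffer))  # one batched upload
            self.buffer.clear()
            self.last_upload = now

# Simulate 3 minutes of collection ticks.
b = Batcher()
for t in range(COLLECT_INTERVAL, 181, COLLECT_INTERVAL):
    b.on_sample(t, {"t": t, "gpu_util": 80})
```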
Beta Limitations
- Metric collection only — the standalone agent does not support scheduling, capacity pool management, or workload orchestration
- No auto-remediation — unlike the Kubernetes agent, the standalone agent does not restart or reschedule failed workloads
- Features may change — configuration, metric names, and dashboard views are subject to change during the beta period
Next Steps
Installation
Install the agent on a GPU host
Metrics Reference
Understand the metrics collected and what they mean
OTLP Integration
Send application-level OpenTelemetry metrics to Chamber
Configuration
Environment variables, workload labels, upgrading, and uninstalling

