When to Use the Standalone Agent
| Environment | Agent |
|---|---|
| Kubernetes cluster with GPU nodes | Kubernetes Agent |
| Bare-metal GPU servers | Standalone Agent |
| Cloud GPU VMs (GCP, AWS, Azure) | Standalone Agent |
| On-prem GPU workstations | Standalone Agent |
What It Does
GPU Metrics
Collects utilization, memory, temperature, power draw, and profiling-level metrics from every GPU on the host via DCGM Exporter
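Under the hood, DCGM Exporter publishes these values in Prometheus text format (metric names like `DCGM_FI_DEV_GPU_UTIL` are NVIDIA's defaults). As an illustrative sketch of what a scrape looks like and how per-GPU values can be extracted from it — the sample payload below is made up, and this is not the agent's actual parser:

```python
# Sketch: parse a DCGM Exporter scrape (Prometheus text format) into
# {gpu_index: {metric_name: value}}. Metric names are DCGM Exporter
# defaults; the sample payload is illustrative, not real output.
import re

SAMPLE_SCRAPE = """\
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 87
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-aaaa"} 30412
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-aaaa"} 64
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-aaaa"} 295.3
"""

LINE_RE = re.compile(r'^(?P<name>DCGM_FI_\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')

def parse_dcgm(text):
    gpus = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip # HELP / # TYPE comments and non-DCGM lines
        labels = dict(kv.split("=", 1) for kv in m.group("labels").split(","))
        gpu = labels.get("gpu", '""').strip('"')
        gpus.setdefault(gpu, {})[m.group("name")] = float(m.group("value"))
    return gpus
```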
Host Metrics
Reports CPU usage, memory usage, and capacity so you can see the full picture alongside GPU data
Workload Discovery
Automatically detects GPU processes (PyTorch, DeepSpeed, vLLM, etc.), groups distributed training workers, and tracks their lifecycle
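To make the grouping idea concrete, here is a sketch of one plausible heuristic: workers launched by `torchrun` share `MASTER_ADDR`/`MASTER_PORT` in their environment, so processes with the same rendezvous endpoint can be grouped into one workload. This is an illustration only — the agent's actual discovery logic is not documented here, and the sample process list is invented:

```python
# Illustrative sketch: group GPU processes into workloads by their
# torch.distributed rendezvous endpoint (MASTER_ADDR:MASTER_PORT).
# This heuristic and the sample data are hypothetical.
from collections import defaultdict

def group_workers(procs):
    """procs: list of dicts with 'pid', 'cmd', and an 'env' dict."""
    groups = defaultdict(list)
    for p in procs:
        env = p.get("env", {})
        if "MASTER_ADDR" in env:   # worker in a distributed job
            key = (env["MASTER_ADDR"], env.get("MASTER_PORT", ""))
        else:                      # standalone process: its own group
            key = ("pid", p["pid"])
        groups[key].append(p["pid"])
    return dict(groups)

procs = [
    {"pid": 101, "cmd": "python train.py",
     "env": {"MASTER_ADDR": "10.0.0.5", "MASTER_PORT": "29500", "RANK": "0"}},
    {"pid": 102, "cmd": "python train.py",
     "env": {"MASTER_ADDR": "10.0.0.5", "MASTER_PORT": "29500", "RANK": "1"}},
    {"pid": 205, "cmd": "python serve.py", "env": {}},
]
```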
OTLP Receiver
Accepts OpenTelemetry metrics from your applications over gRPC, forwarding them to the Chamber dashboard alongside infrastructure metrics
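If your application already uses an OpenTelemetry SDK, you can usually point it at the agent with the standard OTLP environment variables defined by the OpenTelemetry specification. The endpoint below assumes the agent's receiver listens on the default OTLP gRPC port (4317) on the same host — check the Configuration page for the address your installation actually uses:

```shell
# Standard OpenTelemetry SDK environment variables (from the OTel spec,
# not Chamber-specific). Endpoint assumes the agent's OTLP receiver is
# on the default gRPC port 4317 on the local host.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_SERVICE_NAME="my-training-job"   # hypothetical service name
```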
Architecture
The agent connects outbound to Chamber over a secure WebSocket. No inbound ports or firewall rules are required for Chamber communication.
What You’ll See in the Dashboard
Once the agent is running, navigate to Dashboard > Services (app.usechamber.io/dashboard?tab=services) to see:
- GPU utilization per host and per GPU — identify underutilized or overloaded GPUs
- GPU memory consumption — track memory pressure across your fleet
- Temperature and power draw — spot thermal throttling or power-capped GPUs before they impact training
- Host CPU and memory — correlate host-level bottlenecks with GPU performance
- Active workloads — see which processes are running on each GPU, how long they’ve been active, and how much GPU memory they’re consuming
- Application metrics — if you instrument your app with OpenTelemetry, custom metrics appear alongside infrastructure data
Metrics are collected every 30 seconds and uploaded in batches every 60 seconds. Your host should appear in the dashboard within 60 seconds of the agent starting.
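The collect/upload cadence can be pictured as a simple buffer-and-flush loop: a sample is buffered every 30 seconds, and the buffer is flushed as one batch every 60 seconds. The sketch below simulates that timing in plain Python; it is a mental model, not the agent's actual implementation:

```python
# Sketch of the agent's cadence: buffer a sample every COLLECT_INTERVAL
# seconds, flush the buffer as one batched upload every UPLOAD_INTERVAL
# seconds. Timing is simulated; this is not the real agent code.
COLLECT_INTERVAL = 30
UPLOAD_INTERVAL = 60

class Batcher:
    def __init__(self):
        self.buffer = []
        self.uploads = []        # batches "sent" to the dashboard
        self.last_upload = 0.0

    def on_sample(self, now, sample):
        self.buffer.append(sample)
        if now - self.last_upload >= UPLOAD_INTERVAL:
            self.uploads.append(list(self.buffer))  # one batched upload
            self.buffer.clear()
            self.last_upload = now

# Simulate 3 minutes of collection ticks.
b = Batcher()
for t in range(COLLECT_INTERVAL, 181, COLLECT_INTERVAL):
    b.on_sample(t, {"t": t, "gpu_util": 80})
```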
Beta Limitations
- Metric collection only — the standalone agent does not support scheduling, capacity pool management, or workload orchestration
- No auto-remediation — unlike the Kubernetes agent, the standalone agent does not restart or reschedule failed workloads
- Features may change — configuration, metric names, and dashboard views are subject to change during the beta period
Next Steps
Installation
Install the agent on a GPU host
Metrics Reference
Understand the metrics collected and what they mean
OTLP Integration
Send application-level OpenTelemetry metrics to Chamber
Configuration
Environment variables, workload labels, upgrading, and uninstalling

