The Standalone Agent is currently in beta and supports metric collection only. Features and configuration may change.
The Chamber Standalone Agent runs on individual GPU hosts (bare metal, cloud VMs, or on-prem servers) and reports GPU, host, and application-level metrics to the Chamber dashboard. It is designed for environments where GPUs are not managed by Kubernetes.

When to Use the Standalone Agent

Environment                            Agent
Kubernetes cluster with GPU nodes      Kubernetes Agent
Bare-metal GPU servers                 Standalone Agent
Cloud GPU VMs (GCP, AWS, Azure)        Standalone Agent
On-prem GPU workstations               Standalone Agent

What It Does

GPU Metrics

Collects utilization, memory, temperature, power draw, and profiling-level metrics from every GPU on the host via DCGM Exporter
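If you want to verify on the host what DCGM Exporter is emitting before it reaches Chamber, you can scrape its Prometheus endpoint directly. A minimal stdlib sketch; note that the port (9400) and the metric names in the comment are DCGM Exporter conventions, not Chamber-specific values, so adjust them if your install differs:

```python
import urllib.request

DCGM_ENDPOINT = "http://localhost:9400/metrics"  # DCGM Exporter's default port (assumption)

def parse_prometheus_samples(text: str, prefix: str = "DCGM_") -> dict[str, float]:
    """Parse Prometheus exposition-format text into {'name{labels}': value},
    keeping only metric families whose name starts with `prefix`.
    Assumes samples carry no trailing timestamps (DCGM Exporter omits them)."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.startswith(prefix):
            samples[name_and_labels] = float(value)
    return samples

def scrape_gpu_metrics() -> dict[str, float]:
    """Fetch the exporter's current samples, e.g. DCGM_FI_DEV_GPU_UTIL,
    DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_GPU_TEMP, DCGM_FI_DEV_POWER_USAGE."""
    with urllib.request.urlopen(DCGM_ENDPOINT, timeout=5) as resp:
        return parse_prometheus_samples(resp.read().decode())
```

This only confirms the exporter side is healthy; the agent scrapes the same endpoint and handles upload to Chamber itself.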

Host Metrics

Reports CPU usage, memory usage, and capacity so you can see the full picture alongside GPU data

Workload Discovery

Automatically detects GPU processes (PyTorch, DeepSpeed, vLLM, etc.), groups distributed training workers, and tracks their lifecycle
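You can approximate the first step of this discovery yourself with `nvidia-smi`, which reports every compute process per GPU. A rough sketch using standard `nvidia-smi` query flags; this is an illustration of what the agent observes, not the agent's actual implementation, and the grouping of distributed workers is more involved than what is shown here:

```python
import csv
import subprocess
from io import StringIO

# Fields requested from nvidia-smi, in order.
_FIELDS = ["gpu_uuid", "pid", "process_name", "used_memory_mib"]

def parse_compute_apps(csv_text: str) -> list[dict[str, str]]:
    """Turn nvidia-smi's CSV output into one dict per GPU process."""
    return [dict(zip(_FIELDS, (cell.strip() for cell in row)))
            for row in csv.reader(StringIO(csv_text)) if row]

def list_gpu_processes() -> list[dict[str, str]]:
    """Query the local driver for compute processes on every GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=gpu_uuid,pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)
```

The agent tracks these processes over time (start, exit, memory growth) rather than taking a single snapshot.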

OTLP Receiver

Accepts OpenTelemetry metrics from your applications over gRPC, forwarding them to the Chamber dashboard alongside infrastructure metrics
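If your application already uses an OpenTelemetry SDK, pointing it at the agent is usually just an endpoint change via OpenTelemetry's standard environment variables. A sketch of that configuration; the endpoint assumes the agent's OTLP receiver listens on the conventional gRPC port 4317 on the same host, and the service name is a hypothetical placeholder (see the OTLP Integration page for the actual values):

```python
import os

# Standard OpenTelemetry environment variables, read by OTel SDKs at startup.
# The endpoint is an assumption (conventional local OTLP gRPC port), not a
# documented Chamber value.
os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
os.environ.setdefault("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc")
os.environ.setdefault("OTEL_METRICS_EXPORTER", "otlp")
os.environ.setdefault("OTEL_SERVICE_NAME", "my-training-job")  # hypothetical name
```

Set these before your process initializes its OpenTelemetry SDK (or export them in the shell that launches the job); any counters or histograms your instrumentation records are then pushed to the agent over gRPC and forwarded to the dashboard alongside the host's infrastructure metrics.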

Architecture

The agent connects outbound to Chamber over a secure WebSocket. No inbound ports or firewall rules are required for Chamber communication.

What You’ll See in the Dashboard

Once the agent is running, navigate to Dashboard > Services (app.usechamber.io/dashboard?tab=services) to see:
  • GPU utilization per host and per GPU — identify underutilized or overloaded GPUs
  • GPU memory consumption — track memory pressure across your fleet
  • Temperature and power draw — spot thermal throttling or power-capped GPUs before they impact training
  • Host CPU and memory — correlate host-level bottlenecks with GPU performance
  • Active workloads — see which processes are running on each GPU, how long they’ve been active, and how much GPU memory they’re consuming
  • Application metrics — if you instrument your app with OpenTelemetry, custom metrics appear alongside infrastructure data
Metrics are collected every 30 seconds and uploaded in batches every 60 seconds. Your host should appear in the dashboard within 60 seconds of the agent starting.

Beta Limitations

  • Metric collection only — the standalone agent does not support scheduling, capacity pool management, or workload orchestration
  • No auto-remediation — unlike the Kubernetes agent, the standalone agent does not restart or reschedule failed workloads
  • Features may change — configuration, metric names, and dashboard views are subject to change during the beta period

Next Steps

Installation

Install the agent on a GPU host

Metrics Reference

Understand the metrics collected and what they mean

OTLP Integration

Send application-level OpenTelemetry metrics to Chamber

Configuration

Environment variables, workload labels, upgrading, and uninstalling