The Standalone Agent is in beta. Configuration options may change.
The agent is configured via environment variables in /etc/chamber/agent.env. After making changes, restart the agent:
sudo vi /etc/chamber/agent.env
sudo systemctl restart chamber-agent-standalone
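If you prefer scripted changes over an interactive editor, the edit step can be sketched as a small helper. This is only an illustration: it assumes the file holds plain KEY=value lines, the function name set_env_var is hypothetical, and it does not handle values containing newlines.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing
# assignment of KEY and leaving all other lines untouched.
set_env_var() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  # Copy every line except an existing assignment of this key.
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original path so ownership and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (from a root shell, then restart the agent as shown above):
#   set_env_var /etc/chamber/agent.env CHAMBER_METRICS_COLLECTION_INTERVAL 15
```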
Required Settings
| Variable | Description |
| --- | --- |
| CHAMBER_CLUSTER_TOKEN | API token for authenticating with the Chamber control plane. Generated in Settings > API Tokens. |
The CHAMBER_SAAS_URL is set automatically by the installer and should not need to be changed.
Common Settings
| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_CLUSTER_ID | hostname | Identifier for this host in the Chamber dashboard. Set this to a human-readable name if the hostname is opaque (e.g., ip-10-0-1-42). |
| CHAMBER_HOSTNAME | hostname | Override the reported hostname. |
| CHAMBER_DCGM_URL | http://localhost:9400/metrics | DCGM Exporter endpoint. Change if running DCGM Exporter on a non-default port. |
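For illustration, a minimal /etc/chamber/agent.env that combines the required token with two common overrides might look like the fragment below. It assumes the file uses plain KEY=value lines and tolerates # comments. The token placeholder, the name gpu-node-01, and port 9401 are made-up values.

```shell
# /etc/chamber/agent.env (illustrative values only)
CHAMBER_CLUSTER_TOKEN=REPLACE_WITH_API_TOKEN
# Human-readable name shown in the dashboard instead of an opaque hostname
CHAMBER_CLUSTER_ID=gpu-node-01
# DCGM Exporter listening on a non-default port (9401 is hypothetical)
CHAMBER_DCGM_URL=http://localhost:9401/metrics
```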
Metrics Settings
| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_METRICS_ENABLED | true | Enable or disable all metric collection. |
| CHAMBER_METRICS_COLLECTION_INTERVAL | 30 | Seconds between metric collection cycles. |
| CHAMBER_METRICS_BATCH_INTERVAL | 60 | Seconds between batch uploads to Chamber. |
| CHAMBER_METRICS_BUFFER_SIZE | 100000 | Maximum metrics buffered before oldest are dropped. |
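As a rough sizing aid, the defaults above can be turned into an estimate of how long the buffer lasts when uploads are failing. The sketch assumes each buffered metric is one sample and that the host produces a fixed number of samples per collection cycle. The figure of 160 samples per cycle is hypothetical, not a Chamber number.

```shell
# Estimate how many seconds of data fit in the buffer before the oldest
# samples start to be dropped.
# Args: samples per collection cycle (hypothetical), collection interval in
#       seconds (default 30), buffer size (default 100000).
buffer_seconds() {
  samples_per_cycle=$1
  interval=${2:-30}
  buffer_size=${3:-100000}
  # Whole cycles that fit in the buffer, times seconds per cycle.
  echo $(( buffer_size / samples_per_cycle * interval ))
}

buffer_seconds 160   # 100000 / 160 = 625 cycles; 625 * 30 s = 18750 s
```

Under those assumptions the buffer covers a little over five hours. Halving the collection interval halves that window unless the buffer size is raised to match.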
GPU Usage Settings
| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_GPU_USAGE_ENABLED | true | Enable or disable the composite GPU Usage metric. |
| CHAMBER_GPU_USAGE_POWER_THRESHOLD | 0.1 | Power ratio (0.0–1.0) below which GPU is considered idle. At the default, GPUs drawing less than 10% of their power limit report GPU Usage as 0. |
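The idle rule described for the power threshold can be sketched as a simple ratio check. This illustrates only the idle gate, not how the composite GPU Usage value itself is computed. The function name and the wattages are hypothetical.

```shell
# Succeeds (exit 0) when power draw divided by power limit is strictly below
# the threshold, i.e. the GPU would be treated as idle.
# Args: power draw (W), power limit (W), threshold ratio (default 0.1).
is_gpu_idle() {
  awk -v draw="$1" -v limit="$2" -v threshold="${3:-0.1}" \
    'BEGIN { exit !((draw / limit) < threshold) }'
}

# A GPU drawing 30 W against a 400 W limit has a ratio of 0.075.
is_gpu_idle 30 400 && echo "idle: GPU Usage reported as 0"
```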
Discovery Settings
| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_DISCOVERY_ENABLED | true | Enable or disable automatic GPU workload discovery. |
| CHAMBER_DISCOVERY_SCAN_INTERVAL | 15 | Seconds between GPU process scans. |
| CHAMBER_DISCOVERY_STABILIZATION_DELAY | 5 | Seconds to wait before classifying a newly detected process. Prevents transient processes from appearing as workloads. |
| CHAMBER_DISCOVERY_EVENT_BUFFER_SIZE | 500 | Maximum discovery events buffered. |
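One way to picture the stabilization delay is as a minimum-age check on a detected process. The function below is only a sketch of that idea, assuming the agent compares a process's first-seen time with the current time. The function name and arguments are hypothetical, and the agent's actual logic may differ.

```shell
# Succeeds once a process has been visible for at least the stabilization delay.
# Args: first-seen time (epoch s), current time (epoch s), delay in seconds
#       (default 5).
ready_to_classify() {
  first_seen=$1
  now=$2
  delay=${3:-5}
  [ $(( now - first_seen )) -ge "$delay" ]
}

# A process first seen at t=100 is still pending at t=103 and eligible at t=105.
ready_to_classify 100 103 || echo "pending"
ready_to_classify 100 105 && echo "classify"
```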
OTLP Settings
| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_OTLP_GRPC_ENABLED | true | Enable or disable the OTLP gRPC receiver. |
| CHAMBER_OTLP_GRPC_PORT | 4317 | gRPC server port. |
| CHAMBER_OTLP_GRPC_HOST | 0.0.0.0 | gRPC server bind address. |
| CHAMBER_OTLP_METRICS_QUEUE_SIZE | 10000 | Maximum queued metric batches. |
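As an example of how these settings fit together, the fragment below restricts the receiver to local clients and points an instrumented job at it. The OTEL_EXPORTER_OTLP_* names are the standard OpenTelemetry SDK environment variables. Whether the receiver accepts plaintext gRPC, as the http:// scheme implies, is an assumption not confirmed here.

```shell
# In /etc/chamber/agent.env: accept OTLP only from processes on this host
CHAMBER_OTLP_GRPC_HOST=127.0.0.1
CHAMBER_OTLP_GRPC_PORT=4317

# In the training job's environment: standard OpenTelemetry exporter settings
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```

Keeping the default bind address of 0.0.0.0 instead exposes the port on every network interface, which is what you want if jobs on other hosts or in containers send telemetry to this agent.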
Connection Settings
| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_HEARTBEAT_INTERVAL | 30 | Seconds between WebSocket heartbeats. |
| CHAMBER_METADATA_INTERVAL | 60 | Seconds between host metadata reports. |
| CHAMBER_RECONNECT_BACKOFF | 5 | Initial reconnection delay (seconds) after a dropped WebSocket connection. |
| CHAMBER_MAX_RECONNECT_BACKOFF | 30 | Maximum reconnection delay (seconds). The agent uses exponential backoff with jitter between these bounds. |
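To give a feel for how the two backoff bounds interact, here is a minimal sketch of exponential backoff with jitter. It assumes the delay doubles on each failed attempt and that jitter picks a uniform value between the initial delay and the capped delay. The agent's actual growth factor and jitter strategy are not documented here, and both function names are hypothetical.

```shell
# Delay before jitter: initial * 2^attempt, capped at max.
# Args: attempt number (0-based), initial delay (default 5), max delay (default 30).
backoff_base() {
  attempt=$1
  initial=${2:-5}
  max=${3:-30}
  delay=$initial
  i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  [ "$delay" -gt "$max" ] && delay=$max
  echo "$delay"
}

# Jittered delay: a random whole number between the initial delay and the
# capped base delay, inclusive (assumed strategy).
backoff_with_jitter() {
  base=$(backoff_base "$1" "${2:-5}" "${3:-30}")
  awk -v lo="${2:-5}" -v hi="$base" \
    'BEGIN { srand(); print lo + int(rand() * (hi - lo + 1)) }'
}

# Base delays for the first five attempts with the defaults:
for n in 0 1 2 3 4; do backoff_base "$n"; done   # prints 5, 10, 20, 30, 30
```

With the defaults, the cap is reached on the fourth attempt, so a long outage settles into retries at most 30 seconds apart.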
Workload Labels
Add environment variables to your training jobs to enrich workload metadata in the dashboard:
CHAMBER_JOB_NAME="llama2-finetune-v3" \
CHAMBER_TEAM_NAME="ml-platform" \
torchrun --nproc_per_node=4 train.py
| Variable | Description |
| --- | --- |
| CHAMBER_JOB_NAME | Override the auto-detected workload name. Useful when the process command line is generic. |
| CHAMBER_TEAM_NAME | Set a team label for workload attribution. Groups workloads by team in the dashboard. |
These environment variables are read from the process environment at discovery time. They are not part of /etc/chamber/agent.env — set them on your training process directly.
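To confirm that a running job actually carries the labels, you can inspect its environment on Linux. This sketch assumes a /proc filesystem and permission to read the target process's environ file. The function name is hypothetical, and this is not necessarily how the agent itself reads the values.

```shell
# Print the CHAMBER_* variables present in a running process's environment.
# Args: PID of the training process.
show_chamber_labels() {
  # /proc/<pid>/environ is NUL-separated; convert to one variable per line.
  tr '\0' '\n' < "/proc/$1/environ" | grep '^CHAMBER_'
}

# Example: show_chamber_labels 12345   (12345 is a placeholder PID)
```

/proc/<pid>/environ reflects the environment the process was started with, so the variables need to be present when the job is launched.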
Upgrading
Re-run the installer to upgrade to the latest version. Your configuration in /etc/chamber/agent.env is preserved:
curl -fsSL https://chamber-agent-releases.s3.amazonaws.com/install.sh | sudo bash
To pin to a specific version (e.g., for reproducibility):
curl -fsSL https://chamber-agent-releases.s3.amazonaws.com/install.sh \
| sudo bash -s -- --version <VERSION>
Uninstalling
sudo systemctl stop chamber-agent-standalone
sudo systemctl disable chamber-agent-standalone
sudo rm /etc/systemd/system/chamber-agent-standalone.service
sudo systemctl daemon-reload
sudo rm -rf /opt/chamber-standalone
sudo rm -rf /etc/chamber
sudo userdel chamber-agent
This removes the agent, its configuration, and the system user. It does not remove Docker, Python, or DCGM Exporter.
Next Steps
Troubleshooting: Common issues and solutions
Metrics Reference: Full list of metrics collected