The Standalone Agent is in beta. Configuration options may change.
The agent is configured via environment variables in /etc/chamber/agent.env. After making changes, restart the agent:
sudo vi /etc/chamber/agent.env
sudo systemctl restart chamber-agent-standalone

Required Settings

| Variable | Description |
| --- | --- |
| CHAMBER_CLUSTER_TOKEN | API token for authenticating with the Chamber control plane. Generated in Settings > API Tokens. |
CHAMBER_SAAS_URL is set automatically by the installer and should not need to be changed.
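Because the token is the only required setting, it can be worth checking that it is present before restarting the agent. The following is a minimal sketch: the helper name has_cluster_token is hypothetical, and it assumes the file uses plain KEY=value lines, which this page does not spell out.

```shell
# Hypothetical helper: succeeds if the given env file defines a non-empty
# CHAMBER_CLUSTER_TOKEN. Assumes plain KEY=value lines (an assumption).
# It does not catch an empty quoted value such as CHAMBER_CLUSTER_TOKEN="".
has_cluster_token() {
  grep -Eq '^CHAMBER_CLUSTER_TOKEN=.+' "$1"
}

# Usage (path and service name taken from this page):
#   has_cluster_token /etc/chamber/agent.env \
#     && sudo systemctl restart chamber-agent-standalone
```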

Common Settings

| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_CLUSTER_ID | hostname | Identifier for this host in the Chamber dashboard. Set this to a human-readable name if the hostname is opaque (e.g., ip-10-0-1-42). |
| CHAMBER_HOSTNAME | hostname | Override the reported hostname. |
| CHAMBER_DCGM_URL | http://localhost:9400/metrics | DCGM Exporter endpoint. Change if running DCGM Exporter on a non-default port. |
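As an illustration, a fragment of /etc/chamber/agent.env that gives a host a readable name and points the agent at DCGM Exporter on a non-default port might look like the following. The name gpu-node-01 and the port 9500 are made-up values, and the KEY=value syntax is assumed rather than documented here.

```shell
# /etc/chamber/agent.env (fragment; both values are illustrative)
CHAMBER_CLUSTER_ID=gpu-node-01
CHAMBER_DCGM_URL=http://localhost:9500/metrics
```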

Metrics Settings

| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_METRICS_ENABLED | true | Enable or disable all metric collection. |
| CHAMBER_METRICS_COLLECTION_INTERVAL | 30 | Seconds between metric collection cycles. |
| CHAMBER_METRICS_BATCH_INTERVAL | 60 | Seconds between batch uploads to Chamber. |
| CHAMBER_METRICS_BUFFER_SIZE | 100000 | Maximum metrics buffered before oldest are dropped. |
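To get a feel for how much history the buffer holds while uploads are stalled, divide the buffer size by the number of metrics produced per collection cycle and multiply by the collection interval. The sketch below does that arithmetic. The figure of 200 metrics per cycle is a made-up example, and the helper buffer_coverage_seconds is hypothetical, not part of the agent.

```shell
# Hypothetical helper: rough number of seconds of data the buffer can hold.
#   $1 = CHAMBER_METRICS_BUFFER_SIZE
#   $2 = metrics produced per collection cycle (assumed, depends on the host)
#   $3 = CHAMBER_METRICS_COLLECTION_INTERVAL in seconds
buffer_coverage_seconds() {
  echo $(( $1 / $2 * $3 ))
}

# Defaults from the table, with an assumed 200 metrics per cycle:
buffer_coverage_seconds 100000 200 30   # prints 15000 (about 4 hours 10 minutes)
```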

GPU Usage Settings

| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_GPU_USAGE_ENABLED | true | Enable or disable the composite GPU Usage metric. |
| CHAMBER_GPU_USAGE_POWER_THRESHOLD | 0.1 | Power ratio (0.0–1.0) below which a GPU is considered idle. At the default, GPUs drawing less than 10% of their power limit report GPU Usage as 0. |
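The threshold rule can be sketched as a comparison of power draw against power limit. The helper gpu_is_idle below is hypothetical and only mirrors the rule as described in the table; how the agent actually computes the ratio is not shown on this page.

```shell
# Hypothetical helper: exit status 0 (idle) when power_draw / power_limit
# is strictly below the threshold.
#   $1 = power draw (W), $2 = power limit (W), $3 = threshold (default 0.1)
gpu_is_idle() {
  awk -v draw="$1" -v limit="$2" -v thr="${3:-0.1}" \
    'BEGIN { exit !((draw / limit) < thr) }'
}

gpu_is_idle 35 400 && echo "idle"      # 35/400 = 0.0875, below 0.1
gpu_is_idle 60 400 || echo "active"    # 60/400 = 0.15, above 0.1
```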

Discovery Settings

| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_DISCOVERY_ENABLED | true | Enable or disable automatic GPU workload discovery. |
| CHAMBER_DISCOVERY_SCAN_INTERVAL | 15 | Seconds between GPU process scans. |
| CHAMBER_DISCOVERY_STABILIZATION_DELAY | 5 | Seconds to wait before classifying a newly detected process. Prevents transient processes from appearing as workloads. |
| CHAMBER_DISCOVERY_EVENT_BUFFER_SIZE | 500 | Maximum discovery events buffered. |

OTLP Settings

| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_OTLP_GRPC_ENABLED | true | Enable or disable the OTLP gRPC receiver. |
| CHAMBER_OTLP_GRPC_PORT | 4317 | gRPC server port. |
| CHAMBER_OTLP_GRPC_HOST | 0.0.0.0 | gRPC server bind address. |
| CHAMBER_OTLP_METRICS_QUEUE_SIZE | 10000 | Maximum queued metric batches. |
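With the receiver enabled on its default port, a job running on the same host could be pointed at it through the standard OpenTelemetry SDK environment variables. This is a sketch under two assumptions that this page does not state: that the job uses an OpenTelemetry SDK that honors these variables, and that the receiver accepts plaintext gRPC from localhost.

```shell
# Set on the training job, not in /etc/chamber/agent.env.
# Standard OpenTelemetry SDK variables; the endpoint assumes the default
# CHAMBER_OTLP_GRPC_PORT of 4317 and a job on the same host as the agent.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```

The default bind address 0.0.0.0 listens on all network interfaces. If only local jobs send data, setting CHAMBER_OTLP_GRPC_HOST to 127.0.0.1 should limit the receiver to connections from the same host.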

Connection Settings

| Variable | Default | Description |
| --- | --- | --- |
| CHAMBER_HEARTBEAT_INTERVAL | 30 | Seconds between WebSocket heartbeats. |
| CHAMBER_METADATA_INTERVAL | 60 | Seconds between host metadata reports. |
| CHAMBER_RECONNECT_BACKOFF | 5 | Initial reconnection delay (seconds) after a dropped WebSocket connection. |
| CHAMBER_MAX_RECONNECT_BACKOFF | 30 | Maximum reconnection delay (seconds). The agent uses exponential backoff with jitter between these bounds. |
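To illustrate how the two backoff bounds interact, here is a sketch of capped exponential backoff with jitter using the default values. The doubling factor and the jitter range are assumptions, since the page only gives the two bounds, and the function names are hypothetical.

```shell
INITIAL=5    # CHAMBER_RECONNECT_BACKOFF
MAXIMUM=30   # CHAMBER_MAX_RECONNECT_BACKOFF

# Upper bound for a given attempt (0-based): INITIAL * 2^attempt, capped at
# MAXIMUM. The doubling factor is an assumption.
backoff_ceiling() {
  delay=$INITIAL
  i=0
  while [ "$i" -lt "$1" ] && [ "$delay" -lt "$MAXIMUM" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$MAXIMUM" ]; then
    delay=$MAXIMUM
  fi
  echo "$delay"
}

# Random delay between INITIAL and the ceiling (relies on bash's $RANDOM).
# The jitter range is an assumption.
jittered_delay() {
  ceiling=$(backoff_ceiling "$1")
  echo $(( INITIAL + RANDOM % (ceiling - INITIAL + 1) ))
}

for attempt in 0 1 2 3 4; do
  echo "attempt $attempt: up to $(backoff_ceiling "$attempt")s"
done
```

Under these assumptions the ceiling grows 5, 10, 20, 30, 30 seconds, so the maximum is reached on the fourth consecutive failure.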

Workload Labels

Add environment variables to your training jobs to enrich workload metadata in the dashboard:
CHAMBER_JOB_NAME="llama2-finetune-v3" \
CHAMBER_TEAM_NAME="ml-platform" \
  torchrun --nproc_per_node=4 train.py
| Variable | Description |
| --- | --- |
| CHAMBER_JOB_NAME | Override the auto-detected workload name. Useful when the process command line is generic. |
| CHAMBER_TEAM_NAME | Set a team label for workload attribution. Groups workloads by team in the dashboard. |
These environment variables are read from the process environment at discovery time. They are not part of /etc/chamber/agent.env — set them on your training process directly.
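If a label does not show up, one way to confirm that the variables actually reached the training process is to inspect that process's environment. On Linux, /proc/<pid>/environ holds it as NUL-separated entries. The helper chamber_labels below is hypothetical, and whether the agent itself reads /proc in this way is an assumption; the page only says the values come from the process environment.

```shell
# Hypothetical helper: print the CHAMBER_* variables found in a NUL-separated
# environ file, such as /proc/<pid>/environ on Linux.
chamber_labels() {
  tr '\0' '\n' < "$1" | grep '^CHAMBER_'
}

# Usage, for a running job with process ID $pid (reading another user's
# process may require sudo):
#   chamber_labels "/proc/$pid/environ"
```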

Upgrading

Re-run the installer to upgrade to the latest version. Your configuration in /etc/chamber/agent.env is preserved:
curl -fsSL https://chamber-agent-releases.s3.amazonaws.com/install.sh | sudo bash
To pin to a specific version (e.g., for reproducibility):
curl -fsSL https://chamber-agent-releases.s3.amazonaws.com/install.sh \
  | sudo bash -s -- --version <VERSION>

Uninstalling

sudo systemctl stop chamber-agent-standalone
sudo systemctl disable chamber-agent-standalone
sudo rm /etc/systemd/system/chamber-agent-standalone.service
sudo systemctl daemon-reload
sudo rm -rf /opt/chamber-standalone
sudo rm -rf /etc/chamber
sudo userdel chamber-agent
This removes the agent, its configuration, and the system user. It does not remove Docker, Python, or DCGM Exporter.

Next Steps

Troubleshooting: Common issues and solutions
Metrics Reference: Full list of metrics collected