Agent Fails to Start
Check the agent logs for error details (for example, with `sudo journalctl -u chamber-agent-standalone --no-pager -n 200`), then match the message against the table below:

| Symptom | Cause | Fix |
|---|---|---|
| Authentication failed or 403 | Invalid or expired cluster token | Verify `CHAMBER_CLUSTER_TOKEN` in `/etc/chamber/agent.env`. Generate a new token in Settings > API Tokens if needed. |
| Connection refused or timeout | Firewall blocking outbound WebSocket on port 443 | Ensure outbound connections to `controlplane-api.usechamber.io:443` are allowed. |
| Organization deactivated | Your Chamber organization has been deactivated | Contact support@usechamber.io. |
| Superseded by new instance | Another agent instance connected with the same token | Only one agent instance should run per host. Check for duplicate services. |
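The lookup in the table above can be sketched as a small shell helper that reads log text and prints the most likely cause. The matched phrases mirror the Symptom column; the agent's exact log wording is an assumption, and `classify_agent_error` is a hypothetical name, not part of the agent.

```shell
# Sketch: map agent log text (read from stdin) to a cause from the table.
# Assumption: the log messages contain the phrases listed in the Symptom column.
classify_agent_error() {
  logs="$(cat)"
  if printf '%s\n' "$logs" | grep -qiE 'authentication failed|(^|[^0-9])403([^0-9]|$)'; then
    echo "token: verify CHAMBER_CLUSTER_TOKEN in /etc/chamber/agent.env"
  elif printf '%s\n' "$logs" | grep -qiE 'connection refused|timeout|timed out'; then
    echo "network: allow outbound controlplane-api.usechamber.io:443"
  elif printf '%s\n' "$logs" | grep -qi 'organization deactivated'; then
    echo "organization: contact support@usechamber.io"
  elif printf '%s\n' "$logs" | grep -qi 'superseded by new instance'; then
    echo "duplicate: check for a second agent instance using the same token"
  else
    echo "unknown: no known symptom found"
  fi
}
```

One way to use it: `sudo journalctl -u chamber-agent-standalone --no-pager -n 200 | classify_agent_error`.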
Host Not Appearing in Dashboard
- Verify the agent is running.
- Check for a successful WebSocket connection in the logs.
- Verify the token is correct: Ensure it matches a valid token from Settings > API Tokens in the Chamber dashboard.
- Check network connectivity: A `200` response confirms network access.
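As a rough aid for the token step above, the sketch below confirms that the env file defines a non-empty `CHAMBER_CLUSTER_TOKEN` without printing the secret. The `KEY=value` layout of `/etc/chamber/agent.env` is an assumption, and `token_is_set` is a hypothetical helper name.

```shell
# Sketch: succeed only if the env file sets CHAMBER_CLUSTER_TOKEN to a non-empty value.
# Assumption: the file uses KEY=value lines, optionally prefixed with "export".
token_is_set() {
  env_file="${1:-/etc/chamber/agent.env}"
  # Optional "export ", the key, "=", an optional quote, then one real character.
  grep -Eq '^[[:space:]]*(export[[:space:]]+)?CHAMBER_CLUSTER_TOKEN=["'\'']?[^"'\''[:space:]]' "$env_file"
}
```

For example, `sudo sh -c` is not needed if the file is readable: `token_is_set && echo "token present"`. The running state can be checked separately with `systemctl is-active chamber-agent-standalone`. This only shows that a token exists; whether it is valid still has to be compared against the dashboard.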
No GPU Metrics
NVIDIA Driver Not Detected
DCGM Exporter Not Running
DCGM Exporter on a Non-Default Port
If DCGM Exporter is running on a port other than 9400, update the agent configuration to use that port, then restart the agent with `sudo systemctl restart chamber-agent-standalone`.
GPU Usage Metric Not Available
The GPU Usage metric requires DCGM profiling metrics. If it’s missing:
- Check that profiling metrics are exported: You should see `DCGM_FI_PROF_SM_ACTIVE`, `DCGM_FI_PROF_PIPE_TENSOR_ACTIVE`, and `DCGM_FI_PROF_DRAM_ACTIVE`.
  - If absent, add profiling fields to your DCGM Exporter counters file. See Ensuring DCGM Profiling Metrics Are Collected.
- Check GPU compatibility: Profiling requires Volta-generation GPUs or newer (V100, T4, A100, H100, B200, etc.).
- Check for conflicting profiling sessions: Another tool (e.g., Nsight Systems, Nsight Compute) holding a profiling session prevents DCGM from collecting profiling metrics. Close the conflicting tool and restart DCGM Exporter.
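The first check in the list above can be sketched as a filter over the exporter's metrics output that reports which of the three profiling fields are missing. It assumes the Prometheus text format, where each sample line starts with the metric name; `missing_prof_metrics` is a hypothetical helper name.

```shell
# Sketch: read Prometheus-format metrics on stdin and print each required
# DCGM profiling field that has no sample line. Prints nothing if all are present.
missing_prof_metrics() {
  metrics="$(cat)"
  for field in DCGM_FI_PROF_SM_ACTIVE DCGM_FI_PROF_PIPE_TENSOR_ACTIVE DCGM_FI_PROF_DRAM_ACTIVE; do
    # A sample line begins with the metric name followed by "{" or a space;
    # "# HELP" and "# TYPE" comment lines do not count.
    if ! printf '%s\n' "$metrics" | grep -Eq "^${field}[{ ]"; then
      echo "$field"
    fi
  done
}
```

Assuming DCGM Exporter serves metrics at `/metrics` on its default port 9400, a possible invocation is `curl -s http://localhost:9400/metrics | missing_prof_metrics`. Adjust the port if the exporter runs elsewhere.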
OTLP Receiver Issues
Application Can’t Connect to Port 4317
- Verify the OTLP receiver is enabled: The setting should be `true` (or absent, since `true` is the default).
- Check that the agent is listening on port 4317.
- If another process is using port 4317, change the agent’s OTLP port. Then restart the agent and update your application’s exporter endpoint.
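For the listening check above, one possible approach is to filter the output of `ss -ltn`. The sketch below assumes the usual `ss` column layout (state first, local address and port fourth); `port_is_listening` is a hypothetical helper name.

```shell
# Sketch: read `ss -ltn` output on stdin and succeed if some socket is
# listening on the given TCP port.
port_is_listening() {
  port="$1"
  # Columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
  awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 } END { exit !found }'
}
```

For example: `ss -ltn | port_is_listening 4317 && echo "something is listening on 4317"`. To see which process owns the port, `sudo ss -ltnp` adds the process name, which helps tell the agent apart from a conflicting service.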
Metrics Sent but Not Appearing in Dashboard
- Metrics are batched and uploaded every 60 seconds. Wait at least 2 minutes after sending.
- Check the agent logs for OTLP-related errors.
- Verify your application sets `service.name` in its resource attributes. Metrics without a service name may be harder to find in the dashboard.
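If the application uses an OpenTelemetry SDK that honors the standard environment variables, the service name and endpoint can be set without code changes. This is a sketch under that assumption; the service name is a placeholder, and it presumes the agent's receiver is reachable on this host at the default port 4317.

```shell
# Hypothetical service name; pick one that identifies your workload.
export OTEL_SERVICE_NAME="my-training-job"
# Assumes the agent's OTLP receiver listens on localhost at the default port 4317.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
```

Start the application from the same shell so it inherits these variables, then allow for the upload interval before checking the dashboard.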
Agent Keeps Restarting
Check the logs for repeated crashes. Common causes:
- Python version too old: The agent requires Python 3.11+. Check with `/opt/chamber-standalone/venv/bin/python --version`.
- Corrupted installation: Re-run the installer to repair: `curl -fsSL https://chamber-agent-releases.s3.amazonaws.com/install.sh | sudo bash`
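The Python version requirement above can be checked mechanically. The sketch below parses a string in the form printed by `python --version` (for example `Python 3.11.4`); `python_is_311_plus` is a hypothetical helper name.

```shell
# Sketch: succeed if a "Python X.Y.Z" version string is 3.11 or newer.
python_is_311_plus() {
  version="${1#Python }"     # drop the leading "Python "
  major="${version%%.*}"     # text before the first dot
  rest="${version#*.}"       # text after the first dot
  minor="${rest%%.*}"        # text before the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}
```

For example: `python_is_311_plus "$(/opt/chamber-standalone/venv/bin/python --version)" || echo "Python is older than 3.11"`.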
Health and Liveness Probes
The agent provides CLI commands for health and liveness checks, useful for container orchestration or custom monitoring.
Support
If these steps don’t resolve your issue, contact support@usechamber.io with:
- Agent logs: `sudo journalctl -u chamber-agent-standalone --no-pager -n 200`
- Agent version: `/opt/chamber-standalone/venv/bin/chamber-standalone version`
- Host info: `/opt/chamber-standalone/venv/bin/chamber-standalone host-info`
- GPU info: `nvidia-smi`
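The four items above can be gathered into a single report. The following is a sketch, not an official tool: `run_section` and `collect_support_info` are hypothetical helper names, and a command that fails or is missing is noted rather than aborting the report.

```shell
# Print a titled section with a command's output, or a note if the command
# fails, so one missing tool does not stop the rest of the report.
run_section() {
  title="$1"; shift
  printf '=== %s ===\n' "$title"
  "$@" 2>&1 || printf '(command failed or not found: %s)\n' "$1"
}

# Run the four commands listed in the Support section.
collect_support_info() {
  run_section "Agent logs" sudo journalctl -u chamber-agent-standalone --no-pager -n 200
  run_section "Agent version" /opt/chamber-standalone/venv/bin/chamber-standalone version
  run_section "Host info" /opt/chamber-standalone/venv/bin/chamber-standalone host-info
  run_section "GPU info" nvidia-smi
}
```

A possible invocation is `collect_support_info > chamber-support.txt`. Review the file for anything sensitive before attaching it to an email.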

