The Standalone Agent is in beta. If you encounter issues not covered here, contact support@usechamber.io.

Agent Fails to Start

Check the logs for error details:
sudo journalctl -u chamber-agent-standalone -n 50 --no-pager
Common causes:
Symptom | Cause | Fix
--- | --- | ---
Authentication failed or 403 | Invalid or expired cluster token | Verify CHAMBER_CLUSTER_TOKEN in /etc/chamber/agent.env. Generate a new token in Settings > API Tokens if needed.
Connection refused or timeout | Firewall blocking outbound WebSocket on port 443 | Ensure outbound connections to controlplane-api.usechamber.io:443 are allowed.
Organization deactivated | Your Chamber organization has been deactivated | Contact support@usechamber.io.
Superseded by new instance | Another agent instance connected with the same token | Only one agent instance should run per host. Check for duplicate services.
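
The symptoms listed above can be matched against the log output mechanically. The following is a minimal sketch, assuming the agent's log lines contain the symptom phrases more or less verbatim; the exact log wording and the function name triage_agent_log are assumptions, not documented behaviour.

```shell
# Hypothetical helper: read log lines on stdin and print a hint for each
# known symptom. Matching is case-insensitive and assumes the log text
# contains the symptom phrases (an assumption about the log format).
triage_agent_log() {
  while IFS= read -r line; do
    lower=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      *"authentication failed"*|*" 403"*)
        echo "token: verify CHAMBER_CLUSTER_TOKEN in /etc/chamber/agent.env" ;;
      *"connection refused"*|*timeout*|*"timed out"*)
        echo "network: allow outbound controlplane-api.usechamber.io:443" ;;
      *"organization deactivated"*)
        echo "org: contact support@usechamber.io" ;;
      *"superseded by new instance"*)
        echo "duplicate: check for a second agent using the same token" ;;
    esac
  done
}
```

Usage: sudo journalctl -u chamber-agent-standalone -n 50 --no-pager | triage_agent_log | sort -u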

Host Not Appearing in Dashboard

  1. Verify the agent is running:
    sudo systemctl status chamber-agent-standalone
    
  2. Check for a successful WebSocket connection in the logs:
    sudo journalctl -u chamber-agent-standalone --no-pager -n 20 | grep -i websocket
    
  3. Verify the token is correct:
    grep CHAMBER_CLUSTER_TOKEN /etc/chamber/agent.env
    
    Ensure it matches a valid token from Settings > API Tokens in the Chamber dashboard.
  4. Check network connectivity:
    curl -s -o /dev/null -w "%{http_code}" https://controlplane-api.usechamber.io/health
    
    A 200 response confirms network access.
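
The grep in step 3 prints the full token to the terminal, which may be undesirable on a shared screen or in a recorded session. The sketch below reports only whether a token is configured. It assumes the env file uses plain KEY=value lines (optionally quoted); env_value and token_status are hypothetical helper names.

```shell
# Hypothetical helper: print the value of KEY from an env file.
# Assumes simple KEY=value lines; the last assignment wins.
env_value() {
  key=$1
  file=${2:-/etc/chamber/agent.env}
  sed -n "s/^[[:space:]]*${key}=//p" "$file" |
    tail -n 1 |
    sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"   # strip surrounding quotes
}

# Report whether a token is configured, showing only its length.
token_status() {
  token=$(env_value CHAMBER_CLUSTER_TOKEN "${1:-/etc/chamber/agent.env}")
  if [ -n "$token" ]; then
    echo "token set (${#token} characters)"
  else
    echo "token missing"
  fi
}
```

Usage: sudo sh -c '. ./helpers.sh; token_status' (helpers.sh is a placeholder for wherever you save the functions).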

No GPU Metrics

NVIDIA Driver Not Detected

Check that the driver responds:
nvidia-smi
If this fails, the NVIDIA driver is not installed or not loaded. Install the driver appropriate for your GPU and OS.

DCGM Exporter Not Running

Query the DCGM Exporter metrics endpoint:
curl -s http://localhost:9400/metrics | head -20
If this returns nothing or errors, DCGM Exporter is not running. Check its status:
# If running as a Docker container
docker ps | grep dcgm

# If running as a systemd service
sudo systemctl status nvidia-dcgm-exporter
Restart if needed, or re-run the installer which will set up DCGM Exporter automatically.

DCGM Exporter on a Non-Default Port

If DCGM Exporter is running on a port other than 9400, update the agent configuration:
# In /etc/chamber/agent.env
CHAMBER_DCGM_URL=http://localhost:<your-port>/metrics
Then restart the agent: sudo systemctl restart chamber-agent-standalone
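
To confirm which endpoint the agent is configured to scrape, a small helper can read the setting back. This is a sketch only: dcgm_url is a hypothetical name, and the fallback to http://localhost:9400/metrics when CHAMBER_DCGM_URL is unset is inferred from the default port mentioned above rather than from documented agent behaviour.

```shell
# Hypothetical helper: print the DCGM metrics URL from the agent env file,
# falling back to port 9400 (assumed default) when the variable is not set.
dcgm_url() {
  file=${1:-/etc/chamber/agent.env}
  url=$(sed -n 's/^CHAMBER_DCGM_URL=//p' "$file" 2>/dev/null | tail -n 1)
  echo "${url:-http://localhost:9400/metrics}"
}
```

Usage: curl -sf "$(dcgm_url)" | head -5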

GPU Usage Metric Not Available

The GPU Usage metric requires DCGM profiling metrics. If it’s missing:
  1. Check profiling metrics are exported:
    curl -s http://localhost:9400/metrics | grep DCGM_FI_PROF
    
    You should see DCGM_FI_PROF_SM_ACTIVE, DCGM_FI_PROF_PIPE_TENSOR_ACTIVE, and DCGM_FI_PROF_DRAM_ACTIVE.
  2. If absent, add profiling fields to your DCGM Exporter counters file. See Ensuring DCGM Profiling Metrics Are Collected.
  3. Check GPU compatibility: Profiling requires Volta-generation GPUs or newer (V100, T4, A100, H100, B200, etc.).
  4. Check for conflicting profiling sessions: Another tool (e.g., Nsight Systems, Nsight Compute) holding a profiling session prevents DCGM from collecting profiling metrics. Close the conflicting tool and restart DCGM Exporter.
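
Step 1 can be made more precise by reporting exactly which of the three profiling metrics are absent. The function below is a minimal sketch; missing_prof_metrics is a hypothetical name, and it assumes the exporter emits standard Prometheus text format, where sample lines begin with the metric name.

```shell
# Hypothetical helper: read Prometheus-format metrics on stdin and print
# each expected DCGM profiling metric that has no sample line.
missing_prof_metrics() {
  metrics=$(cat)
  for name in DCGM_FI_PROF_SM_ACTIVE DCGM_FI_PROF_PIPE_TENSOR_ACTIVE DCGM_FI_PROF_DRAM_ACTIVE; do
    # Match sample lines only (name followed by "{" or a space),
    # so "# HELP" and "# TYPE" comment lines do not count.
    if ! printf '%s\n' "$metrics" | grep -Eq "^${name}[{ ]"; then
      echo "$name"
    fi
  done
}
```

Usage: curl -s http://localhost:9400/metrics | missing_prof_metrics
No output means all three metrics are present.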

OTLP Receiver Issues

Application Can’t Connect to Port 4317

  1. Verify the OTLP receiver is enabled:
    grep CHAMBER_OTLP_GRPC_ENABLED /etc/chamber/agent.env
    
    It should be true (or absent, since true is the default).
  2. Check the agent is listening:
    ss -tlnp | grep 4317
    
  3. If another process is using port 4317, change the agent’s OTLP port:
    # In /etc/chamber/agent.env
    CHAMBER_OTLP_GRPC_PORT=4318
    
    Then restart the agent and update your application’s exporter endpoint.
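
A plain grep for 4317 in step 2 also matches unrelated ports such as 43170. The sketch below checks the local port column exactly. port_is_listening is a hypothetical helper, and it assumes the usual ss -tln column layout, in which the fourth column is Local Address:Port.

```shell
# Hypothetical helper: read `ss -tln` output on stdin and succeed if a
# socket is listening on exactly the given port.
port_is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

Usage: ss -tln | port_is_listening 4317 && echo "something is listening on 4317"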

Metrics Sent but Not Appearing in Dashboard

  • Metrics are batched and uploaded every 60 seconds. Wait at least 2 minutes after sending.
  • Check agent logs for OTLP-related errors:
    sudo journalctl -u chamber-agent-standalone -f | grep -i otlp
    
  • Verify your application sets service.name in its resource attributes. Metrics without a service name may be harder to find in the dashboard.
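
For applications instrumented with an OpenTelemetry SDK, the service name and exporter endpoint can often be set through the standard OpenTelemetry environment variables. The example below is a sketch under assumptions: my-training-job is a placeholder name, the endpoint assumes the agent's receiver accepts plaintext gRPC on localhost port 4317, and support for these variables varies by language SDK.

```shell
# Standard OpenTelemetry SDK environment variables; values are illustrative.
export OTEL_SERVICE_NAME="my-training-job"                  # placeholder service name
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"  # assumes plaintext gRPC to the local agent
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```

Set these in the environment that launches your application, then restart it.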

Agent Keeps Restarting

Check for repeated crash logs:
sudo journalctl -u chamber-agent-standalone --no-pager -n 100
Common causes:
  • Python version too old: The agent requires Python 3.11+. Check with /opt/chamber-standalone/venv/bin/python --version.
  • Corrupted installation: Re-run the installer to repair: curl -fsSL https://chamber-agent-releases.s3.amazonaws.com/install.sh | sudo bash
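
The Python version check can be scripted so it gives a clear pass or fail. This is a minimal sketch; python_version_ok is a hypothetical helper that parses the "Python X.Y.Z" string printed by python --version.

```shell
# Hypothetical helper: succeed if a "Python X.Y.Z" version string is 3.11+.
python_version_ok() {
  ver=${1#Python }     # drop the leading "Python " if present
  major=${ver%%.*}     # text before the first dot
  rest=${ver#*.}       # text after the first dot
  minor=${rest%%.*}    # text before the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}
```

Usage: python_version_ok "$(/opt/chamber-standalone/venv/bin/python --version)" || echo "Python is older than 3.11"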

Health and Liveness Probes

The agent provides CLI commands for health and liveness checks, useful for container orchestration or custom monitoring:
# Readiness: succeeds when the agent is connected and healthy
chamber-standalone health

# Liveness: succeeds when the agent process is alive and responsive
chamber-standalone liveness
Both commands check for a recent heartbeat file (written within the last 120 seconds) and exit with code 0 (healthy) or 1 (unhealthy). The systemd service uses the liveness probe as a watchdog automatically.
If you’re running the agent in a container under Kubernetes, configure probes like:
livenessProbe:
  exec:
    command: ["chamber-standalone", "liveness"]
  periodSeconds: 30
readinessProbe:
  exec:
    command: ["chamber-standalone", "health"]
  periodSeconds: 30
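
For custom monitoring outside an orchestrator, the exit codes described above can be wrapped in a small function. This is a sketch; probe_status is a hypothetical name, and it relies only on the documented convention that exit code 0 means healthy.

```shell
# Hypothetical wrapper: run a probe command and print a one-word status.
# Exit code 0 is healthy; any other exit code is treated as unhealthy.
probe_status() {
  if "$@" >/dev/null 2>&1; then
    echo "healthy"
  else
    echo "unhealthy"
    return 1
  fi
}
```

Usage: probe_status chamber-standalone health || echo "agent not ready"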

Support

If these steps don’t resolve your issue, contact support@usechamber.io with:
  • Agent logs: sudo journalctl -u chamber-agent-standalone --no-pager -n 200
  • Agent version: /opt/chamber-standalone/venv/bin/chamber-standalone version
  • Host info: /opt/chamber-standalone/venv/bin/chamber-standalone host-info
  • GPU info: nvidia-smi
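
The four items above can be gathered in one pass. The function below is a minimal sketch; collect_chamber_diagnostics and the output file names are hypothetical. It records the output of each command, including any error message, so a missing tool does not stop the collection.

```shell
# Hypothetical helper: write each diagnostic to its own file in a directory.
# A failing command leaves its error text in the file instead of aborting.
collect_chamber_diagnostics() {
  out=${1:-./chamber-diagnostics}
  bin=/opt/chamber-standalone/venv/bin/chamber-standalone
  mkdir -p "$out" || return 1
  journalctl -u chamber-agent-standalone --no-pager -n 200 > "$out/agent.log" 2>&1 || true
  "$bin" version   > "$out/version.txt"    2>&1 || true
  "$bin" host-info > "$out/host-info.txt"  2>&1 || true
  nvidia-smi       > "$out/nvidia-smi.txt" 2>&1 || true
  echo "$out"   # print the directory so callers can archive it
}
```

Run it as root so journalctl can read the service logs, and review the files before attaching them, since logs can contain host details.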