
Auto-Containerize with chamber run

One command. That’s all it takes. Point chamber run at your training code and it handles everything: containerization, registry authentication, image building, and workload submission.
chamber run ./my-training-project --gpus 4 --team my-team
No Docker knowledge required. Chamber auto-detects your project structure, generates optimized Dockerfiles, and guides you through any missing prerequisites.

What Chamber Does For You

1. Detects your project: scans for PyTorch, TensorFlow, or JAX; finds train.py, main.py, or your entrypoint; reads requirements.txt or pyproject.toml.
2. Generates an optimized Dockerfile: creates a GPU-optimized container with the right CUDA version, cuDNN, and your dependencies pre-installed.
3. Handles authentication: auto-authenticates with Google Artifact Registry or AWS ECR using your existing cloud credentials.
4. Builds and pushes: uses docker buildx build --push to build and push in a single efficient step. Automatically pulls the :latest tag to seed the layer cache so teammates get fast rebuilds, and uses content-addressed image tags, so if the image already exists, build and push are skipped entirely.
5. Submits the workload: creates and submits a Kubernetes Job to Chamber with your GPU and resource requirements.

Quick Start

# Preview what will be generated (recommended first step)
chamber run ./my-training-project --gpus 4 --team <team-id> --dry-run

# Build, push, and submit the workload
chamber run ./my-training-project --gpus 4 --team <team-id>
Always use --dry-run first to preview the generated Dockerfile and Kubernetes manifest before building.

Interactive Setup (First-Time Users)

Chamber guides you through setup. Missing a prerequisite? It will walk you through fixing it:

No Registry Configured?

$ chamber run ./my-project --gpus 4 --team abc123

No container registry configured

Chamber needs a container registry to store your Docker images.
You can use Google Artifact Registry, AWS ECR, or any Docker-compatible registry.

Select your registry type:

  [1] Google Artifact Registry (recommended for GCP users)
      Example: us-central1-docker.pkg.dev/my-project/ml-images

  [2] AWS ECR (recommended for AWS users)
      Example: 123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-images

  [3] Other Docker registry

Select an option [1]: 1

Google Artifact Registry Setup

Enter your GAR registry URL: us-central1-docker.pkg.dev/my-project/ml-images

Save as default registry? (y/n) [y]: y
Default registry saved to ~/.chamber/config.json

Docker Not Installed?

⚠ Docker is not installed
  Docker is required to build and push container images.

Installation options:

  [1] Quick install (recommended)
      brew install --cask docker
  [2] Open installation guide in browser
  [3] Show manual installation instructions
  [4] Skip (continue anyway)

Select an option [1]:

AWS CLI Not Configured?

⚠ AWS CLI is not authenticated

You need AWS credentials to push images to ECR.

Options:
  [1] Run 'aws configure' interactively (recommended)
  [2] Set environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  [3] Show me how to get credentials
  [4] Skip (I'll handle authentication myself)

Select an option [1]:
One-time setup. After your first successful run, these settings are saved. Future runs work instantly without prompts.
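
After setup, the saved defaults live in ~/.chamber/config.json. Based on the prompts above, the file might look roughly like this (illustrative; the actual schema may include additional fields):

```json
{
  "default_registry": "us-central1-docker.pkg.dev/my-project/ml-images"
}
```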

Supported Container Registries

Chamber automatically handles authentication for major cloud registries:
| Registry | URL Pattern | Auto-Auth |
| --- | --- | --- |
| Google Artifact Registry | {region}-docker.pkg.dev/{project}/{repo} | gcloud CLI |
| AWS ECR | {account}.dkr.ecr.{region}.amazonaws.com | AWS CLI |
| Other registries | Any Docker-compatible registry | Via docker login |
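
The auto-auth decision amounts to pattern matching on the registry URL. A minimal sketch of that classification (an illustration, not Chamber's actual implementation):

```python
import re

def detect_registry_type(registry_url: str) -> str:
    """Classify a registry URL so the right auth helper can be invoked."""
    # Google Artifact Registry: {region}-docker.pkg.dev/{project}/{repo}
    if re.match(r"^[a-z0-9-]+-docker\.pkg\.dev/", registry_url):
        return "gar"    # authenticate via the gcloud CLI
    # AWS ECR: {account}.dkr.ecr.{region}.amazonaws.com
    if re.match(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com", registry_url):
        return "ecr"    # authenticate via the AWS CLI
    return "other"      # fall back to `docker login`

print(detect_registry_type("us-central1-docker.pkg.dev/my-project/ml-images"))  # gar
print(detect_registry_type("123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-images"))  # ecr
```

Anything that matches neither cloud pattern is treated as a generic Docker registry and relies on your existing docker login session.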

Project Detection

Chamber automatically detects your project configuration:
| What | How |
| --- | --- |
| Framework | Scans requirements.txt for PyTorch, TensorFlow, or JAX |
| Entrypoint | Looks for train.py, main.py, run.py, or app.py |
| Python version | Checks .python-version or pyproject.toml |
| Distributed training | Detects Accelerate, DeepSpeed, Ray, or Horovod |
| Requirements | Uses requirements.txt, pyproject.toml, or setup.py |

Project Configuration

Create a .chamber.yaml file for persistent settings (optional):
# .chamber.yaml
name: "llm-finetune"
entrypoint: "train.py"
entrypoint_args: "--config config.yaml --epochs 100"

# Resources
gpu_type: "H100"
gpus: 4
job_class: "ELASTIC"

# Environment variables injected into the container
env:
  WANDB_PROJECT: "my-project"
  NCCL_DEBUG: "INFO"

# Forward these local env vars to container (if set)
forward_env:
  - WANDB_API_KEY
  - HF_TOKEN

# Files to exclude from Docker build context
ignore:
  - "output/"
  - "checkpoints/"
  - "*.bin"
  - "*.safetensors"
CLI flags always override .chamber.yaml values, so you can set defaults while still customizing per-run.
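
The precedence rule (CLI flag over .chamber.yaml over built-in default) can be sketched as a simple layered merge; this is an illustration, not the actual implementation:

```python
# Built-in defaults taken from the flag reference below
BUILTIN_DEFAULTS = {"gpus": 1, "gpu_type": "H100", "job_class": "ELASTIC"}

def resolve_settings(yaml_config: dict, cli_flags: dict) -> dict:
    """Merge settings so explicitly passed CLI flags win over .chamber.yaml,
    which in turn wins over built-in defaults. None means 'not provided'."""
    resolved = dict(BUILTIN_DEFAULTS)
    resolved.update({k: v for k, v in yaml_config.items() if v is not None})
    resolved.update({k: v for k, v in cli_flags.items() if v is not None})
    return resolved
```

For example, with gpus: 4 in .chamber.yaml and --gpus 8 on the command line, the run uses 8 GPUs while still picking up the file's other settings.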

Command Reference

chamber run <directory> [flags]

Required Flags

| Flag | Description |
| --- | --- |
| --team, -t | Team ID for the workload |

Resource Flags

| Flag | Default | Description |
| --- | --- | --- |
| --gpus, -g | From config, or 1 | Number of GPUs |
| --gpu-type | From config, or H100 | GPU type (H100, A100, L40S, etc.) |
| --class, -c | ELASTIC | Workload class: RESERVED or ELASTIC |

Container Flags

| Flag | Description |
| --- | --- |
| --registry | Container registry URL (or use the default_registry config) |
| --base-image | Override the auto-detected base image |
| --dockerfile | Use an existing Dockerfile instead of generating one |
| --no-cache | Force a Docker rebuild without cache |

Entrypoint Flags

| Flag | Description |
| --- | --- |
| --name, -n | Workload name (default: from config or the directory name) |
| --entrypoint | Override the detected Python entrypoint |
| --entrypoint-args | Arguments to pass to the entrypoint |
| --env | Environment variable as KEY=VALUE (repeatable) |

Workflow Flags

| Flag | Description |
| --- | --- |
| --dry-run | Preview generated artifacts without executing |
| --save-dockerfile | Write the generated Dockerfile to the project directory |
| --save-manifest | Write the generated K8s manifest to the project directory |

Examples

Basic Training Workload

chamber run ./my-project --gpus 4 --gpu-type H100 --team abc123

Custom Entrypoint with Arguments

chamber run ./my-project \
  --gpus 8 \
  --gpu-type H100 \
  --team abc123 \
  --entrypoint train.py \
  --entrypoint-args "--config config.yaml --lr 1e-4 --batch-size 32"

With Environment Variables

chamber run ./my-project \
  --gpus 4 \
  --gpu-type H100 \
  --team abc123 \
  --env WANDB_API_KEY=$WANDB_API_KEY \
  --env HF_TOKEN=$HF_TOKEN

Reserved (Non-Preemptible) Capacity

chamber run ./my-project \
  --gpus 4 \
  --gpu-type H100 \
  --team abc123 \
  --class RESERVED

Using Your Own Dockerfile

chamber run ./my-project \
  --gpus 4 \
  --gpu-type H100 \
  --team abc123 \
  --dockerfile Dockerfile.custom

Generated Artifacts

Example Dockerfile

For a PyTorch project with Accelerate, Chamber generates:
# syntax=docker/dockerfile:1
# Auto-generated by Chamber CLI
FROM nvcr.io/nvidia/pytorch:24.04-py3

WORKDIR /workspace

COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt

COPY . .

CMD ["accelerate", "launch", "train.py", "--config", "config.yaml"]
Chamber uses BuildKit with pip cache mounts, so repeated builds reuse cached Python packages instead of re-downloading them.

Example Kubernetes Manifest

apiVersion: batch/v1
kind: Job
metadata:
  name: llm-finetune-a1b2c3d4
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: training
          image: us-central1-docker.pkg.dev/my-project/ml-images:abc123
          resources:
            requests:
              nvidia.com/gpu: "4"
            limits:
              nvidia.com/gpu: "4"
          env:
            - name: "WANDB_PROJECT"
              value: "my-project"

Troubleshooting

Docker Not Installed

Chamber will prompt you with installation options:
⚠ Docker is not installed

Installation options:
  [1] Quick install (recommended)
  [2] Open installation guide in browser
  [3] Show manual installation instructions

Select an option and follow the prompts.

Registry Authentication Fails

If auto-authentication fails, authenticate manually.

For Google Artifact Registry:
gcloud auth login
gcloud auth configure-docker {region}-docker.pkg.dev

For AWS ECR:
aws configure
# Or set environment variables:
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret

Build Fails or Dependencies Are Missing

Check that your requirements.txt is complete, and use --dry-run to preview the generated Dockerfile:
chamber run ./my-project --dry-run
If needed, use --save-dockerfile to inspect and modify the Dockerfile before building:
chamber run ./my-project --dry-run --save-dockerfile
# Edit Dockerfile.chamber
chamber run ./my-project --dockerfile Dockerfile.chamber

Docker Build Optimizations

Chamber automatically applies several optimizations to make builds fast:
| Optimization | What it does |
| --- | --- |
| Single build+push | Uses docker buildx build --push to build and push in one step. This is significantly faster than a separate build and push because BuildKit pushes layers as they complete, avoiding Docker daemon/containerd manifest duplication issues. |
| BuildKit | Enabled by default (DOCKER_BUILDKIT=1) for parallel build stages and advanced caching. |
| Pip cache mounts | Uses --mount=type=cache,target=/root/.cache/pip so pip packages are cached across builds instead of re-downloaded. |
| Remote layer caching | Pulls the :latest tag from your registry before building to seed the local layer cache. After a successful build, tags the image as :latest using docker buildx imagetools create (fast manifest aliasing, no layer re-upload) so future builds benefit. |
| Content-addressed tags | Tags images with a content hash of your project. If the image already exists in the registry, build and push are skipped entirely. |
| Platform targeting | Explicitly builds for linux/amd64 to ensure consistent images regardless of your local architecture. |
| Reduced metadata | Uses --provenance=false --sbom=false to skip metadata generation that slows down builds. |
| Context size warnings | Reports the build context size and warns if it exceeds 500 MB, helping you identify large files that belong in .dockerignore. |
| Streaming output | Streams build output to your terminal in real time so you can follow progress. |
Why single build+push matters: with Docker Desktop’s containerd image store, a separate docker push can push all manifests from multi-platform base images (e.g., six manifests for NVIDIA images), so each layer is checked six times. The buildx build --push approach pushes only what it built, making pushes dramatically faster, especially for large ML images.
Use --no-cache to force a full Docker rebuild (this does not force re-push if the image already exists):
chamber run ./my-project --gpus 4 --team abc123 --no-cache

Best Practices

Start with --dry-run

Always preview generated artifacts before building to catch issues early.

Use .chamber.yaml

Store project-specific settings to avoid repeating flags on every run.

Set default_registry

Configure your registry once: chamber config set default_registry <url>

Use forward_env

Securely inject API keys without hardcoding them in your config.
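
The difference between env and forward_env: env values are written into the workload spec verbatim, while forward_env reads values from your local shell at submit time and only injects variables that are actually set. A rough sketch of that behavior (hypothetical helper, not Chamber's code):

```python
import os

def build_container_env(env: dict, forward_env: list) -> dict:
    """Combine literal env vars with locally forwarded ones.
    Forwarded variables are skipped when unset locally, so a missing
    WANDB_API_KEY simply isn't injected rather than becoming empty."""
    resolved = dict(env)
    for name in forward_env:
        value = os.environ.get(name)
        if value is not None:
            resolved[name] = value
    return resolved
```

This keeps secrets out of .chamber.yaml: the file names only the variable, and the value never leaves your shell until submission.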