# terraform-aws-chamber-eks

The `terraform-aws-chamber-eks` module deploys a production-ready Amazon EKS cluster with GPU autoscaling, NVIDIA drivers, and the Chamber Agent, all in a single `terraform apply`.
## Prerequisites
### Terraform >= 1.3.0
Install from developer.hashicorp.com/terraform/install. Verify with `terraform version`.

### AWS CLI configured
The AWS provider uses your local credentials. Verify with `aws sts get-caller-identity`.

### Chamber Console account
You need a cluster token and cluster ID from the Chamber Console. See Getting a Cluster Token for instructions.
## Quick Start
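A minimal configuration needs only the three required variables. This is a sketch: the module `source` below is a placeholder, so use the actual registry path from the module README.

```hcl
module "chamber_eks" {
  # Placeholder source; replace with the path from the module README.
  source = "example-org/chamber-eks/aws"

  cluster_name = "chamber-demo"
  aws_region   = "us-west-2"

  # Values from the Chamber Console (see Prerequisites)
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id
}
```

Run `terraform init` followed by `terraform apply` to deploy.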
### Verify
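After the apply completes, the `configure_kubectl` and `verification_commands` outputs print the exact commands for your cluster. A typical sequence (region and cluster name are examples) looks like:

```shell
# Point kubectl at the new cluster
aws eks update-kubeconfig --region us-west-2 --name chamber-demo

# System nodes should be Ready; GPU nodes appear only once a GPU workload is scheduled
kubectl get nodes
```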
## Using an Existing VPC
To deploy into an existing VPC instead of creating a new one, set `create_vpc = false` and supply your `vpc_id` and `private_subnet_ids`.

## Key Variables

The table below covers the most commonly configured variables. For the complete list, see the module README on GitHub.

### Required
| Variable | Description | Type |
|---|---|---|
| `cluster_name` | Name of the EKS cluster | `string` |
| `chamber_cluster_token` | Cluster token from the Chamber Console | `string` |
| `chamber_cluster_id` | Cluster ID from the Chamber Console | `string` |
### AWS
| Variable | Description | Default |
|---|---|---|
| `aws_region` | AWS region for the EKS cluster | `"us-west-2"` |
### VPC
| Variable | Description | Default |
|---|---|---|
| `create_vpc` | Create a new VPC or use an existing one | `true` |
| `vpc_id` | Existing VPC ID (required when `create_vpc = false`) | `null` |
| `private_subnet_ids` | Existing private subnet IDs (required when `create_vpc = false`) | `[]` |
| `vpc_cidr` | CIDR block for the new VPC | `"10.0.0.0/16"` |
| `single_nat_gateway` | Use a single NAT gateway (cost savings for dev/staging) | `false` |
### EKS
| Variable | Description | Default |
|---|---|---|
| `cluster_version` | Kubernetes version | `"1.32"` |
| `system_node_instance_types` | Instance types for the system node group | `["m5.large", "m5a.large", "m6i.large"]` |
### GPU
| Variable | Description | Default |
|---|---|---|
| `create_default_gpu_nodepool` | Create a Terraform-managed GPU NodePool | `false` |
| `gpu_instance_families` | GPU instance families for the NodePool | `["g5", "g6", "p4d", "p5"]` |
| `capacity_types` | Capacity types (on-demand, spot) | `["on-demand", "spot"]` |
| `gpu_limits` | Maximum GPUs for the NodePool | `100` |
### Chamber
| Variable | Description | Default |
|---|---|---|
| `chamber_agent_version` | Chamber Agent version | `"latest"` |
| `enable_kai_scheduler` | Enable the KAI fractional GPU scheduler | `true` |
## Key Outputs
| Output | Description |
|---|---|
| `cluster_name` | Name of the EKS cluster |
| `cluster_endpoint` | EKS API server endpoint |
| `vpc_id` | VPC ID |
| `configure_kubectl` | Command to configure kubectl |
| `verification_commands` | Commands to verify the deployment |
| `karpenter_node_role_arn` | Karpenter node IAM role ARN |
## GPU Pool Management
After deployment, Karpenter needs at least one GPU pool to know which GPU nodes to provision. There are two approaches:

- Console-managed (default)
- Terraform-managed
### Console-managed (default)

Manage GPU pools through the Chamber Console:
- Go to Capacity Pools > Create Dynamic Pool
- Select your cluster and configure GPU type, limits, and capacity types
- The pool syncs to your cluster automatically — Karpenter provisions GPU nodes on demand
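### Terraform-managed

Alternatively, let Terraform manage a default GPU NodePool by setting `create_default_gpu_nodepool = true`. A sketch using the GPU variables from the table above (the values shown are the module defaults, and the omitted required variables are the same as in the Quick Start):

```hcl
module "chamber_eks" {
  # ...required variables as in the Quick Start...

  create_default_gpu_nodepool = true

  # Optional overrides, shown here with their defaults
  gpu_instance_families = ["g5", "g6", "p4d", "p5"]
  capacity_types        = ["on-demand", "spot"]
  gpu_limits            = 100
}
```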
## Troubleshooting
### Cluster not appearing in Chamber Console
Check the Chamber Agent logs, then verify that your `chamber_cluster_token` and `chamber_cluster_id` are correct.
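Something like the following pulls the agent logs; the namespace and deployment name here are assumptions, so adjust them to match your install:

```shell
# Namespace and deployment name are assumed; find yours with:
#   kubectl get deployments -A | grep -i chamber
kubectl logs -n chamber-system deployment/chamber-agent --tail=100
```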
### GPU nodes not provisioning
- Verify a GPU pool exists. If none exists, create one via the Chamber Console or set `create_default_gpu_nodepool = true`.
- Check the Karpenter logs.
- Check for capacity errors.
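The checks above can be run with `kubectl`; the Karpenter namespace varies by install (often `karpenter` or `kube-system`), so treat these commands as a sketch:

```shell
# 1. Verify a GPU pool (Karpenter NodePool) exists
kubectl get nodepools.karpenter.sh

# 2. Check Karpenter logs (namespace is an assumption; adjust as needed)
kubectl logs -n karpenter deployment/karpenter --tail=100

# 3. Capacity errors (e.g. MaxSpotInstanceCountExceeded) surface on NodeClaims
kubectl describe nodeclaims.karpenter.sh
```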
### MaxSpotInstanceCountExceeded
Your AWS account lacks GPU Spot quota. Either request a quota increase via the AWS Service Quotas console, or set `capacity_types = ["on-demand"]`.

### GPU Operator pods stuck in Pending
This is expected when no GPU nodes exist yet. GPU Operator DaemonSets start automatically when Karpenter provisions GPU nodes in response to a workload.

