The terraform-aws-chamber-eks module deploys a production-ready Amazon EKS cluster with GPU autoscaling, NVIDIA drivers, and the Chamber Agent — all in a single terraform apply.

Prerequisites

- Terraform: install from developer.hashicorp.com/terraform/install. Verify with terraform version.
- AWS credentials: the AWS provider uses your local credentials. Verify with aws sts get-caller-identity.
- Chamber credentials: you need a cluster token and cluster ID from the Chamber Console. See Getting a Cluster Token for instructions.

Quick Start

Step 1: Create main.tf

Create a new directory for your Terraform configuration and add a main.tf file:
provider "aws" {
  region = var.aws_region
}

module "chamber_eks" {
  source = "github.com/ChamberOrg/terraform-aws-chamber-eks"

  cluster_name          = "my-gpu-cluster"
  aws_region            = var.aws_region
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id
}

variable "aws_region" {
  type = string
}

variable "chamber_cluster_token" {
  type      = string
  sensitive = true
}

variable "chamber_cluster_id" {
  type = string
}

output "configure_kubectl" {
  value = module.chamber_eks.configure_kubectl
}
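
Terraform git sources accept a ref query parameter, so you can pin the module to a release tag instead of tracking the default branch. A sketch (the tag below is a placeholder; check the module's releases for real versions):

```hcl
module "chamber_eks" {
  # "v1.2.3" is a hypothetical tag -- substitute a real release tag
  source = "github.com/ChamberOrg/terraform-aws-chamber-eks?ref=v1.2.3"

  # ...same variables as in the Quick Start above...
}
```

Pinning keeps terraform init reproducible across machines and CI runs.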
Step 2: Create terraform.tfvars

aws_region            = "us-west-2"
chamber_cluster_token = "your-token-here"
chamber_cluster_id    = "your-cluster-id"
Do not commit terraform.tfvars to version control. Add it to .gitignore. For CI/CD pipelines, use environment variables: TF_VAR_chamber_cluster_token.
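
In CI, the tfvars file can be replaced entirely by environment variables, since Terraform reads any variable from a TF_VAR_-prefixed environment variable. A minimal sketch, assuming your CI secret store injects CHAMBER_CLUSTER_TOKEN and CHAMBER_CLUSTER_ID (hypothetical names):

```shell
# Terraform maps TF_VAR_<name> to variable "<name>" automatically.
export TF_VAR_aws_region="us-west-2"
export TF_VAR_chamber_cluster_token="${CHAMBER_CLUSTER_TOKEN}"  # from CI secrets
export TF_VAR_chamber_cluster_id="${CHAMBER_CLUSTER_ID}"

# terraform plan / terraform apply now need no terraform.tfvars file.
```

The token never touches disk, so there is nothing to accidentally commit.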
Step 3: Deploy

terraform init
terraform plan
terraform apply
Deployment takes approximately 15-20 minutes.
Step 4: Configure kubectl

Run the command emitted by the configure_kubectl output:
$(terraform output -raw configure_kubectl)
Step 5: Verify

# Verify system nodes are ready
kubectl get nodes

# Verify Karpenter is running
kubectl get pods -n karpenter

# Verify Chamber Agent is connected
kubectl get pods -n chamber-system
Your cluster should appear in the Chamber Console under Capacity Pools.

Using an Existing VPC

To deploy into an existing VPC instead of creating a new one:
module "chamber_eks" {
  source = "github.com/ChamberOrg/terraform-aws-chamber-eks"

  cluster_name          = "my-gpu-cluster"
  aws_region            = var.aws_region
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id

  create_vpc         = false
  vpc_id             = "vpc-xxxxxxxx"
  private_subnet_ids = ["subnet-aaa", "subnet-bbb", "subnet-ccc"]
}
Private subnets must have outbound internet access (via NAT Gateway) for nodes to pull container images and connect to the Chamber control plane.
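
If you would rather not hard-code subnet IDs, the AWS provider's standard data sources can discover them. A sketch, assuming your private subnets carry a Tier = "private" tag (the tag key is an assumption about your VPC, not something the module requires):

```hcl
# Hypothetical tag-based lookup; adjust the tag to match your VPC.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = ["vpc-xxxxxxxx"]
  }

  tags = {
    Tier = "private"
  }
}
```

You can then pass data.aws_subnets.private.ids as private_subnet_ids.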

Key Variables

The table below covers the most commonly configured variables. For the complete list, see the module README on GitHub.

Required

| Variable | Description | Type |
|---|---|---|
| cluster_name | Name of the EKS cluster | string |
| chamber_cluster_token | Cluster token from Chamber Console | string |
| chamber_cluster_id | Cluster ID from Chamber Console | string |

AWS

| Variable | Description | Default |
|---|---|---|
| aws_region | AWS region for the EKS cluster | "us-west-2" |

VPC

| Variable | Description | Default |
|---|---|---|
| create_vpc | Create a new VPC or use existing | true |
| vpc_id | Existing VPC ID (required when create_vpc = false) | null |
| private_subnet_ids | Existing private subnet IDs (required when create_vpc = false) | [] |
| vpc_cidr | CIDR block for new VPC | "10.0.0.0/16" |
| single_nat_gateway | Use a single NAT gateway (cost savings for dev/staging) | false |
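
For a dev or staging cluster, the VPC variables above can be combined to reduce NAT costs. A sketch (the CIDR and cluster name are arbitrary examples):

```hcl
module "chamber_eks" {
  source = "github.com/ChamberOrg/terraform-aws-chamber-eks"

  cluster_name          = "dev-gpu-cluster"
  aws_region            = var.aws_region
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id

  vpc_cidr           = "10.42.0.0/16"
  single_nat_gateway = true  # one NAT gateway instead of one per AZ
}
```

A single NAT gateway trades availability-zone redundancy for a lower monthly bill, which is usually acceptable outside production.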

EKS

| Variable | Description | Default |
|---|---|---|
| cluster_version | Kubernetes version | "1.32" |
| system_node_instance_types | Instance types for system node group | ["m5.large", "m5a.large", "m6i.large"] |

GPU

| Variable | Description | Default |
|---|---|---|
| create_default_gpu_nodepool | Create Terraform-managed GPU NodePool | false |
| gpu_instance_families | GPU instance families for NodePool | ["g5", "g6", "p4d", "p5"] |
| capacity_types | Capacity types (on-demand, spot) | ["on-demand", "spot"] |
| gpu_limits | Maximum GPUs for NodePool | 100 |

Chamber

| Variable | Description | Default |
|---|---|---|
| chamber_agent_version | Chamber Agent version | "latest" |
| enable_kai_scheduler | Enable KAI fractional GPU scheduler | true |

Key Outputs

| Output | Description |
|---|---|
| cluster_name | Name of the EKS cluster |
| cluster_endpoint | EKS API server endpoint |
| vpc_id | VPC ID |
| configure_kubectl | Command to configure kubectl |
| verification_commands | Commands to verify the deployment |
| karpenter_node_role_arn | Karpenter node IAM role ARN |
For all outputs, see the module README on GitHub.

GPU Pool Management

After deployment, Karpenter needs a GPU pool to know which GPU nodes to provision. There are two approaches: dynamic pools managed in the Chamber Console, or a Terraform-managed NodePool (create_default_gpu_nodepool = true).
To manage GPU pools through the Chamber Console:
  1. Go to Capacity Pools > Create Dynamic Pool
  2. Select your cluster and configure GPU type, limits, and capacity types
  3. The pool syncs to your cluster automatically, and Karpenter provisions GPU nodes on demand
This is the recommended approach for most teams: it allows per-GPU-type management with real-time limit adjustments.
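
The Terraform-managed alternative is configured entirely through the module's GPU variables. A sketch using values from the tables above:

```hcl
module "chamber_eks" {
  source = "github.com/ChamberOrg/terraform-aws-chamber-eks"

  cluster_name          = "my-gpu-cluster"
  aws_region            = var.aws_region
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id

  create_default_gpu_nodepool = true
  gpu_instance_families       = ["g5", "g6"]   # restrict to specific families
  capacity_types              = ["on-demand"]  # skip Spot if quota is tight
  gpu_limits                  = 32             # cap total GPUs the pool can provision
}
```

With this approach, pool changes go through your normal Terraform review flow rather than the Console.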

Troubleshooting

If the cluster does not appear in the Chamber Console, check the Chamber Agent logs:
kubectl logs -n chamber-system -l app=chamber-agent --tail=100
Verify that your chamber_cluster_token and chamber_cluster_id are correct.
If GPU nodes are not being provisioned:
  1. Verify a GPU pool exists:
    kubectl get nodepool
    
    If none exists, create one via the Chamber Console or set create_default_gpu_nodepool = true.
  2. Check Karpenter logs:
    kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100
    
  3. Check for capacity errors:
    kubectl get events --field-selector reason=FailedProvisioning
    
If provisioning fails with capacity errors, your AWS account may lack GPU Spot quota. Either request a quota increase via the AWS Service Quotas console, or set capacity_types = ["on-demand"].
GPU Operator pods stuck in Pending are expected when no GPU nodes exist yet. GPU Operator DaemonSets start automatically once Karpenter provisions GPU nodes in response to a workload.

Cleanup

terraform destroy
Ensure all GPU workloads are terminated before destroying to avoid orphaned resources.

Next Steps