# terraform-aws-chamber-eks

The `terraform-aws-chamber-eks` module deploys a production-ready Amazon EKS cluster with GPU autoscaling, NVIDIA drivers, and the Chamber Agent, all in a single `terraform apply`.
## Prerequisites
### Terraform >= 1.3.0
Install from developer.hashicorp.com/terraform/install. Verify with `terraform version`.

### AWS CLI configured
The AWS provider uses your local credentials. Verify with `aws sts get-caller-identity`.

### Chamber Console account
You need a cluster token and cluster ID from the Chamber Console. See Getting a Cluster Token for instructions.
## Quick Start
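A minimal configuration needs only the three required variables. This is a sketch: the module `source` below is a placeholder, so use the actual registry path from the module README.

```hcl
module "chamber_eks" {
  # Placeholder source; replace with the path from the module README.
  source = "example-org/chamber-eks/aws"

  cluster_name = "chamber-demo"
  aws_region   = "us-west-2"

  # Values from the Chamber Console (see Prerequisites)
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id
}
```

Run `terraform init` followed by `terraform apply` to deploy.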
### Verify
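After the apply completes, the `configure_kubectl` and `verification_commands` outputs print the exact commands for your cluster. A typical sequence (region and cluster name are examples) looks like:

```shell
# Point kubectl at the new cluster
aws eks update-kubeconfig --region us-west-2 --name chamber-demo

# System nodes should be Ready; GPU nodes appear only once a GPU workload is scheduled
kubectl get nodes
```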
## Using an Existing VPC
To deploy into an existing VPC instead of creating a new one, set `create_vpc = false` and supply your `vpc_id` and `private_subnet_ids`.

## Key Variables

The table below covers the most commonly configured variables. For the complete list, see the module README on GitHub.

### Required
| Variable | Description | Type |
|---|---|---|
| `cluster_name` | Name of the EKS cluster | `string` |
| `chamber_cluster_token` | Cluster token from the Chamber Console | `string` |
| `chamber_cluster_id` | Cluster ID from the Chamber Console | `string` |
### AWS
| Variable | Description | Default |
|---|---|---|
| `aws_region` | AWS region for the EKS cluster | `"us-west-2"` |
### VPC
| Variable | Description | Default |
|---|---|---|
| `create_vpc` | Create a new VPC or use an existing one | `true` |
| `vpc_id` | Existing VPC ID (required when `create_vpc = false`) | `null` |
| `private_subnet_ids` | Existing private subnet IDs (required when `create_vpc = false`) | `[]` |
| `vpc_cidr` | CIDR block for the new VPC | `"10.0.0.0/16"` |
| `single_nat_gateway` | Use a single NAT gateway (cost savings for dev/staging) | `false` |
### EKS
| Variable | Description | Default |
|---|---|---|
| `cluster_version` | Kubernetes version | `"1.32"` |
| `system_node_instance_types` | Instance types for the system node group | `["m5.large", "m5a.large", "m6i.large"]` |
### GPU
| Variable | Description | Default |
|---|---|---|
| `create_default_gpu_nodepool` | Create a Terraform-managed GPU NodePool | `false` |
| `gpu_instance_families` | GPU instance families for the NodePool | `["g5", "g6", "p4d", "p5"]` |
| `capacity_types` | Capacity types (on-demand, spot) | `["on-demand", "spot"]` |
| `gpu_limits` | Maximum GPUs for the NodePool | `100` |
### Chamber
| Variable | Description | Default |
|---|---|---|
| `chamber_agent_version` | Chamber Agent version | `"latest"` |
| `enable_kai_scheduler` | Enable the KAI fractional GPU scheduler | `true` |
## Key Outputs
| Output | Description |
|---|---|
| `cluster_name` | Name of the EKS cluster |
| `cluster_endpoint` | EKS API server endpoint |
| `vpc_id` | VPC ID |
| `configure_kubectl` | Command to configure kubectl |
| `verification_commands` | Commands to verify the deployment |
| `karpenter_node_role_arn` | Karpenter node IAM role ARN |
## GPU Pool Management
After deployment, Karpenter needs at least one GPU pool to know which GPU nodes to provision. There are two approaches:

- Console-managed (default)
- Terraform-managed
### Console-managed (default)

Manage GPU pools through the Chamber Console:
- Go to Capacity Pools > Create Dynamic Pool
- Select your cluster and configure GPU type, limits, and capacity types
- The pool syncs to your cluster automatically — Karpenter provisions GPU nodes on demand
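### Terraform-managed

Alternatively, let Terraform manage a default GPU NodePool by setting `create_default_gpu_nodepool = true`. A sketch using the GPU variables from the table above (the values shown are the module defaults, and the omitted required variables are the same as in the Quick Start):

```hcl
module "chamber_eks" {
  # ...required variables as in the Quick Start...

  create_default_gpu_nodepool = true

  # Optional overrides, shown here with their defaults
  gpu_instance_families = ["g5", "g6", "p4d", "p5"]
  capacity_types        = ["on-demand", "spot"]
  gpu_limits            = 100
}
```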
## Troubleshooting
### Cluster not appearing in Chamber Console
Check the Chamber Agent logs, then verify that your `chamber_cluster_token` and `chamber_cluster_id` are correct.
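Something like the following pulls the agent logs; the namespace and deployment name here are assumptions, so adjust them to match your install:

```shell
# Namespace and deployment name are assumed; find yours with:
#   kubectl get deployments -A | grep -i chamber
kubectl logs -n chamber-system deployment/chamber-agent --tail=100
```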
### GPU nodes not provisioning
- Verify a GPU pool exists. If none exists, create one via the Chamber Console or set `create_default_gpu_nodepool = true`.
- Check the Karpenter logs.
- Check for capacity errors.
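The checks above can be run with `kubectl`; the Karpenter namespace varies by install (often `karpenter` or `kube-system`), so treat these commands as a sketch:

```shell
# 1. Verify a GPU pool (Karpenter NodePool) exists
kubectl get nodepools.karpenter.sh

# 2. Check Karpenter logs (namespace is an assumption; adjust as needed)
kubectl logs -n karpenter deployment/karpenter --tail=100

# 3. Capacity errors (e.g. MaxSpotInstanceCountExceeded) surface on NodeClaims
kubectl describe nodeclaims.karpenter.sh
```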
### MaxSpotInstanceCountExceeded
Your AWS account lacks GPU Spot quota. Either request a quota increase via the AWS Service Quotas console, or set `capacity_types = ["on-demand"]`.

### GPU Operator pods stuck in Pending
This is expected when no GPU nodes exist yet. GPU Operator DaemonSets start automatically when Karpenter provisions GPU nodes in response to a workload.

