# terraform-google-chamber-gke

This module deploys a production-ready Google GKE cluster with GPU autoscaling, NVIDIA drivers, and the Chamber Agent, all in a single `terraform apply`.
## Prerequisites
### Terraform >= 1.3.0

Install from [developer.hashicorp.com/terraform/install](https://developer.hashicorp.com/terraform/install). Verify with `terraform version`.

### gcloud CLI authenticated
Install the gcloud CLI and authenticate:
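For example (the project ID is a placeholder):

```shell
# Authenticate your user account and application-default credentials
# (Terraform's google provider uses the application-default credentials)
gcloud auth login
gcloud auth application-default login

# Point gcloud at the target project
gcloud config set project YOUR_PROJECT_ID
```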
### Required GCP APIs enabled
Enable the required APIs in your project:
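At minimum, GKE needs the Container and Compute APIs; the module may need others (the IAM API below is an assumption):

```shell
gcloud services enable \
  container.googleapis.com \
  compute.googleapis.com \
  iam.googleapis.com
```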
### Chamber Console account
You need a cluster token and cluster ID from the Chamber Console. See Getting a Cluster Token for instructions.
## Quick Start
The GKE module requires explicit provider configuration because the `kubernetes`, `helm`, and `kubectl` providers need the cluster endpoint and credentials from the module outputs. The full configuration is shown below.

### Verify
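Once `terraform apply` completes, the module's `configure_kubectl` and `verification_commands` outputs print the exact commands to run; a typical sequence looks like this (cluster name, region, and namespace are assumptions):

```shell
# Fetch cluster credentials (the configure_kubectl output prints this command)
gcloud container clusters get-credentials my-cluster --region us-central1

# Nodes should register as Ready
kubectl get nodes

# Chamber Agent pods should be Running (namespace is an assumption)
kubectl get pods -n chamber-system
```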
## Using an Existing VPC

To deploy into an existing VPC instead of creating a new one:
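Set the VPC variables documented below; a minimal sketch (network and subnet names are illustrative):

```hcl
# terraform.tfvars — reuse an existing VPC instead of creating one
create_vpc      = false
network_name    = "my-shared-vpc"
subnetwork_name = "my-gke-subnet"
```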
## Key Variables

The table below covers the most commonly configured variables. For the complete list, see the module README on GitHub.

### Required
| Variable | Description | Type |
|---|---|---|
| `gcp_project_id` | GCP project ID | string |
| `cluster_name` | Name of the GKE cluster | string |
| `chamber_cluster_token` | Cluster token from Chamber Console | string |
| `chamber_cluster_id` | Cluster ID from Chamber Console | string |
### GCP

| Variable | Description | Default |
|---|---|---|
| `gcp_region` | GCP region for the GKE cluster | `"us-central1"` |
| `gcp_zones` | Zones within the region (defaults to the first three) | `[]` |
### VPC

| Variable | Description | Default |
|---|---|---|
| `create_vpc` | Create a new VPC (`false` to use an existing one) | `true` |
| `network_name` | Existing VPC name (required when `create_vpc = false`) | `null` |
| `subnetwork_name` | Existing subnet name (required when `create_vpc = false`) | `null` |
| `vpc_cidr` | CIDR block for the new VPC's primary subnet | `"10.0.0.0/16"` |
### GKE

| Variable | Description | Default |
|---|---|---|
| `cluster_version` | Kubernetes version | `"1.32"` |
| `system_node_machine_type` | Machine type for the system node pool | `"e2-standard-4"` |
| `enable_private_endpoint` | Private endpoint only (no public API access) | `false` |
### GPU

| Variable | Description | Default |
|---|---|---|
| `create_default_gpu_nodepool` | Create a Terraform-managed GPU NodePool | `false` |
| `gpu_machine_families` | GPU machine families for the NodePool | `["a2", "g2", "a3"]` |
| `gpu_accelerator_types` | GPU accelerator types | `["nvidia-l4", "nvidia-a100-80gb", "nvidia-h100-80gb"]` |
| `capacity_types` | Capacity types (on-demand, spot) | `["on-demand", "spot"]` |
| `gpu_limits` | Maximum number of GPUs for the NodePool | `100` |
### Chamber

| Variable | Description | Default |
|---|---|---|
| `chamber_agent_version` | Chamber Agent version | `"latest"` |
| `enable_kai_scheduler` | Enable the KAI fractional GPU scheduler | `true` |
## Key Outputs

| Output | Description |
|---|---|
| `cluster_name` | GKE cluster name |
| `cluster_endpoint` | GKE API server endpoint |
| `network_name` | VPC network name |
| `configure_kubectl` | gcloud command to configure kubectl |
| `verification_commands` | Commands to verify the deployment |
| `karpenter_service_account_email` | Karpenter controller service account email |
## GPU Pool Management

After deployment, Karpenter needs GPU pools to know which GPU nodes to provision. There are two approaches:

- Console-managed (default)
- Terraform-managed
With the console-managed approach, manage GPU pools through the Chamber Console:
- Go to Capacity Pools > Create Dynamic Pool
- Select your cluster and configure GPU type, limits, and capacity types
- The pool syncs to your cluster automatically — Karpenter provisions GPU nodes on demand
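For the Terraform-managed approach, the GPU variables documented above drive the pool; a minimal sketch (all values illustrative):

```hcl
# terraform.tfvars — let Terraform manage the GPU NodePool
create_default_gpu_nodepool = true
gpu_machine_families        = ["g2"]
gpu_accelerator_types       = ["nvidia-l4"]
capacity_types              = ["spot", "on-demand"]
gpu_limits                  = 16
```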
## Troubleshooting
### Chamber Agent not connecting
Check the Chamber Agent logs:
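For example (the namespace and label are assumptions about how the agent is installed):

```shell
# Namespace and selector are assumptions; adjust to your install
kubectl logs -n chamber-system -l app=chamber-agent --tail=100
```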
Verify that your `chamber_cluster_token` and `chamber_cluster_id` are correct.

### GPU nodes not provisioning
- Verify a GPU pool exists. If none exists, create one via the Chamber Console or set `create_default_gpu_nodepool = true`.
- Check the Karpenter logs.
- Verify the GCENodeClass.
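The three checks might look like this; the Karpenter namespace, labels, and CRD plural names are assumptions about this module's install:

```shell
# 1. A GPU pool (Karpenter NodePool) should exist
kubectl get nodepools

# 2. Karpenter logs often explain why no node was launched
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=100

# 3. The GCENodeClass referenced by the pool should exist
kubectl get gcenodeclasses
```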
### GPU pods stuck in Pending

Check the KAI Scheduler logs and pod events:
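For example (namespace and label are assumptions):

```shell
# KAI Scheduler logs
kubectl logs -n kai-scheduler -l app=kai-scheduler --tail=100

# Pod events usually state the scheduling constraint that failed
kubectl describe pod <pending-pod>
```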

