The terraform-google-chamber-gke module deploys a production-ready Google GKE cluster with GPU autoscaling, NVIDIA drivers, and the Chamber Agent — all in a single terraform apply.

Prerequisites

Install Terraform from developer.hashicorp.com/terraform/install, then verify the installation with `terraform version`.
Install the gcloud CLI and authenticate:
```shell
gcloud auth login
gcloud auth application-default login
```
Enable the required APIs in your project:
```shell
gcloud services enable \
  container.googleapis.com \
  compute.googleapis.com \
  iam.googleapis.com \
  --project=YOUR_PROJECT_ID
```
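If you want to confirm the APIs are active before running Terraform, one quick check is to list enabled services and filter for the three required ones (the `grep` pattern here is just an illustration):

```shell
gcloud services list --enabled --project=YOUR_PROJECT_ID \
  | grep -E 'container|compute|iam'
```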
You need a cluster token and cluster ID from the Chamber Console. See Getting a Cluster Token for instructions.

Quick Start

The GKE module requires explicit provider configuration because the kubernetes, helm, and kubectl providers need the cluster endpoint and credentials from the module outputs. The full configuration is shown below.
Step 1: Create main.tf

Create a new directory for your Terraform configuration and add a main.tf file:
```hcl
provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

provider "google-beta" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${module.chamber_gke.cluster_endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.chamber_gke.cluster_ca_certificate)
}

provider "helm" {
  kubernetes {
    host                   = "https://${module.chamber_gke.cluster_endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(module.chamber_gke.cluster_ca_certificate)
  }
}

provider "kubectl" {
  host                   = "https://${module.chamber_gke.cluster_endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.chamber_gke.cluster_ca_certificate)
  load_config_file       = false
}

module "chamber_gke" {
  source = "github.com/ChamberOrg/terraform-google-chamber-gke"

  gcp_project_id        = var.gcp_project_id
  gcp_region            = var.gcp_region
  cluster_name          = "my-gpu-cluster"
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id
}

variable "gcp_project_id" {
  type = string
}

variable "gcp_region" {
  type = string
}

variable "chamber_cluster_token" {
  type      = string
  sensitive = true
}

variable "chamber_cluster_id" {
  type = string
}

output "configure_kubectl" {
  value = module.chamber_gke.configure_kubectl
}
```
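Note that unlike the `kubernetes` and `helm` providers, the `kubectl` provider is community-maintained, so `terraform init` needs an explicit source for it. A minimal sketch, assuming the widely used gavinbunney/kubectl provider (check the module README for the exact requirement):

```hcl
terraform {
  required_providers {
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }
}
```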
Step 2: Create terraform.tfvars

```hcl
gcp_project_id        = "my-gcp-project"
gcp_region            = "us-central1"
chamber_cluster_token = "your-token-here"
chamber_cluster_id    = "your-cluster-id"
```
Do not commit terraform.tfvars to version control; add it to .gitignore. For CI/CD pipelines, pass secrets as environment variables such as TF_VAR_chamber_cluster_token instead.
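For example, in a CI/CD pipeline the sensitive inputs can be supplied entirely through the environment (placeholder values shown, matching the tfvars example above):

```shell
# Terraform automatically reads TF_VAR_* environment variables
# for input variables of the same name, so no tfvars file is needed.
export TF_VAR_chamber_cluster_token="your-token-here"
export TF_VAR_chamber_cluster_id="your-cluster-id"
```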
Step 3: Deploy

```shell
terraform init
terraform plan
terraform apply
```
Deployment takes approximately 15-20 minutes.
Step 4: Configure kubectl

Run the gcloud command returned by the configure_kubectl output:

```shell
$(terraform output -raw configure_kubectl)
```
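If you prefer to run the command explicitly rather than through command substitution, it typically has this shape (cluster name, region, and project taken from the values used earlier in this guide):

```shell
gcloud container clusters get-credentials my-gpu-cluster \
  --region us-central1 \
  --project my-gcp-project
```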
Step 5: Verify

```shell
# Verify system nodes are ready
kubectl get nodes -l purpose=system

# Verify Karpenter is running
kubectl get pods -n karpenter

# Verify the Chamber Agent is connected
kubectl get pods -n chamber-system
```
Your cluster should appear in the Chamber Console under Capacity Pools.

Using an Existing VPC

To deploy into an existing VPC instead of creating a new one:
```hcl
module "chamber_gke" {
  source = "github.com/ChamberOrg/terraform-google-chamber-gke"

  gcp_project_id        = var.gcp_project_id
  gcp_region            = var.gcp_region
  cluster_name          = "my-gpu-cluster"
  chamber_cluster_token = var.chamber_cluster_token
  chamber_cluster_id    = var.chamber_cluster_id

  create_vpc        = false
  network_name      = "my-existing-vpc"
  subnetwork_name   = "my-existing-subnet"
  ip_range_pods     = "pods"
  ip_range_services = "services"
}
```
The existing subnet must have secondary IP ranges for pods and services, matching the names given in ip_range_pods and ip_range_services. Cloud NAT must be configured for private node egress.
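If the subnet is missing the secondary ranges, they can be added with gcloud; a sketch in which the range names match the module inputs above and the CIDRs are placeholders to adapt:

```shell
gcloud compute networks subnets update my-existing-subnet \
  --region=us-central1 \
  --add-secondary-ranges=pods=10.1.0.0/16,services=10.2.0.0/20
```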

Key Variables

The table below covers the most commonly configured variables. For the complete list, see the module README on GitHub.

Required

| Variable | Description | Type |
|---|---|---|
| gcp_project_id | GCP project ID | string |
| cluster_name | Name of the GKE cluster | string |
| chamber_cluster_token | Cluster token from Chamber Console | string |
| chamber_cluster_id | Cluster ID from Chamber Console | string |

GCP

| Variable | Description | Default |
|---|---|---|
| gcp_region | GCP region for the GKE cluster | "us-central1" |
| gcp_zones | Zones within the region (defaults to first 3) | [] |

VPC

| Variable | Description | Default |
|---|---|---|
| create_vpc | Create a new VPC or use existing | true |
| network_name | Existing VPC name (required when create_vpc = false) | null |
| subnetwork_name | Existing subnet name (required when create_vpc = false) | null |
| vpc_cidr | CIDR block for new VPC primary subnet | "10.0.0.0/16" |

GKE

| Variable | Description | Default |
|---|---|---|
| cluster_version | Kubernetes version | "1.32" |
| system_node_machine_type | Machine type for system node pool | "e2-standard-4" |
| enable_private_endpoint | Private endpoint only (no public API access) | false |

GPU

| Variable | Description | Default |
|---|---|---|
| create_default_gpu_nodepool | Create Terraform-managed GPU NodePool | false |
| gpu_machine_families | GPU machine families for NodePool | ["a2", "g2", "a3"] |
| gpu_accelerator_types | GPU accelerator types | ["nvidia-l4", "nvidia-a100-80gb", "nvidia-h100-80gb"] |
| capacity_types | Capacity types (on-demand, spot) | ["on-demand", "spot"] |
| gpu_limits | Maximum GPUs for NodePool | 100 |

Chamber

| Variable | Description | Default |
|---|---|---|
| chamber_agent_version | Chamber Agent version | "latest" |
| enable_kai_scheduler | Enable KAI fractional GPU scheduler | true |

Key Outputs

| Output | Description |
|---|---|
| cluster_name | GKE cluster name |
| cluster_endpoint | GKE API server endpoint |
| network_name | VPC network name |
| configure_kubectl | gcloud command to configure kubectl |
| verification_commands | Commands to verify the deployment |
| karpenter_service_account_email | Karpenter controller service account email |
For all outputs, see the module README on GitHub.

GPU Pool Management

After deployment, Karpenter needs at least one GPU pool to know which GPU nodes to provision. There are two approaches: manage pools dynamically through the Chamber Console, or have Terraform create a default GPU NodePool by setting create_default_gpu_nodepool = true.

To manage GPU pools through the Chamber Console:

1. Go to Capacity Pools > Create Dynamic Pool
2. Select your cluster and configure the GPU type, limits, and capacity types
3. The pool syncs to your cluster automatically, and Karpenter provisions GPU nodes on demand

The Console approach is recommended for most teams because it allows per-GPU-type management with real-time limit adjustments.
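The Terraform-managed alternative uses the GPU variables from the tables above; a minimal sketch, with illustrative accelerator, capacity, and limit choices:

```hcl
module "chamber_gke" {
  source = "github.com/ChamberOrg/terraform-google-chamber-gke"

  # ... required variables as in the Quick Start ...

  create_default_gpu_nodepool = true
  gpu_accelerator_types       = ["nvidia-l4"]
  capacity_types              = ["spot"]
  gpu_limits                  = 100
}
```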

Troubleshooting

Agent not connecting? Check the Chamber Agent logs:

```shell
kubectl logs -n chamber-system -l app.kubernetes.io/name=chamber-agent --tail=50
```

Verify that your chamber_cluster_token and chamber_cluster_id are correct.

GPU nodes not provisioning?

1. Verify a GPU pool exists:

   ```shell
   kubectl get nodepools.karpenter.sh
   ```

   If none exists, create one via the Chamber Console or set create_default_gpu_nodepool = true.

2. Check the Karpenter logs:

   ```shell
   kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100
   ```

3. Verify the GCENodeClass exists:

   ```shell
   kubectl get gcenodeclasses
   ```

Fractional GPU pods not scheduling? Check the KAI Scheduler logs and the pod events:

```shell
kubectl logs -n chamber-system -l app=kai-scheduler --tail=50
kubectl describe pod <pod-name>
```

Cleanup

Before destroying, ensure all GPU workloads are terminated to avoid orphaned resources, then run:

```shell
terraform destroy
```
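One cautious teardown order is to remove the GPU pools first so Karpenter deprovisions its nodes before the cluster itself is destroyed; a sketch, assuming nothing else depends on the cluster:

```shell
kubectl delete nodepools.karpenter.sh --all   # Karpenter scales GPU nodes down
terraform destroy
```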

Next Steps