How to Deploy EKS with Karpenter for Cost-Optimized Autoscaling
Quick summary: Karpenter replaces Kubernetes Cluster Autoscaler with intelligent bin-packing and just-in-time node provisioning. This guide covers setup, consolidation, cost optimization, and production patterns for EKS clusters.
Key Takeaways
- Karpenter provisions right-sized nodes just in time and consolidates idle ones, typically cutting EKS compute costs by 50-70%
- Combine Spot and on-demand capacity, NodePool limits, and Pod Disruption Budgets for safe, cost-optimized autoscaling in production
Karpenter is a Kubernetes autoscaler that replaces the legacy Cluster Autoscaler with intelligent node provisioning. Instead of over-provisioning nodes as a buffer, Karpenter watches for unschedulable pods, provisions nodes with exactly the right capacity, and consolidates idle nodes automatically. The result: clusters cost 50-70% less without sacrificing availability.
This guide covers installing Karpenter on EKS, configuring NodePools, enabling consolidation and Spot instances, and deploying production safely.
Scaling Kubernetes on AWS? FactualMinds helps teams architect cost-optimized EKS clusters with Karpenter, multi-AZ HA, and GitOps pipelines. See our AWS serverless & Kubernetes services or talk to our team.
Step 1: Set Up IAM and EKS Prerequisites
Karpenter needs IAM permissions to launch EC2 instances and IRSA (IAM Roles for Service Accounts) to authenticate. You have two options: Helm (easiest) or manual IAM setup.
Using Helm Chart to Create IAM
The Karpenter Helm chart accepts a serviceAccount.annotations parameter that wires up IRSA by annotating the service account with your role ARN. This is the recommended path:
# Set AWS account and region
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=us-east-1
# Karpenter is distributed as an OCI Helm chart (no `helm repo add` needed;
# the legacy https://charts.karpenter.sh repo only hosts old versions)
# Install Karpenter with IRSA (the annotation links the service account to an existing IAM role)
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
--namespace karpenter --create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
--set settings.clusterName=${CLUSTER_NAME} \
--version v0.37.1 # Pin to a stable version
If the role doesn’t exist yet, create it manually:
# Create the IAM role with the correct trust policy
cat > /tmp/trust-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5)"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5):sub": "system:serviceaccount:karpenter:karpenter"
}
}
}
]
}
EOF
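The trust policy above extracts the OIDC provider ID from the cluster's issuer URL with `cut`. That parsing step can be sanity-checked offline before running the `aws eks` command; the issuer value below is a made-up placeholder, not a real provider:

```shell
# Sample issuer URL of the form returned by `aws eks describe-cluster`
# (the ID here is a placeholder)
ISSUER="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"

# Splitting on "/" yields: "https:", "", host, "id", <provider-id>,
# so field 5 is the provider ID
OIDC_ID=$(echo "$ISSUER" | cut -d '/' -f 5)
echo "$OIDC_ID"   # EXAMPLED539D4633E53DE1B71EXAMPLE
```

If this prints the host or `id` instead of the ID, the field index in your trust-policy command is off.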
aws iam create-role \
--role-name KarpenterNodeRole-${CLUSTER_NAME} \
--assume-role-policy-document file:///tmp/trust-policy.json
Attach Permissions
Karpenter needs permissions to launch and terminate EC2 instances directly; unlike Cluster Autoscaler, it does not go through Auto Scaling groups:
# Attach the Karpenter policy
aws iam attach-role-policy \
--role-name KarpenterNodeRole-${CLUSTER_NAME} \
--policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
# Add the Karpenter-specific policy (in newer versions, this is auto-attached)
cat > /tmp/karpenter-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:CreateFleet",
"ec2:CreateLaunchTemplate",
"ec2:CreateTags",
"ec2:DeleteLaunchTemplate",
"ec2:DescribeFleets",
"ec2:DescribeInstances",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeInstanceTypes",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"ec2:RunInstances",
"ec2:TerminateInstances"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
EOF
aws iam put-role-policy \
--role-name KarpenterNodeRole-${CLUSTER_NAME} \
--policy-name KarpenterPolicy \
--policy-document file:///tmp/karpenter-policy.json
This is a minimal subset; the full controller policy in the Karpenter docs also includes actions such as iam:PassRole (to pass the node role to instances) and ssm:GetParameter (to resolve AMI IDs).
Enable EC2 Spot Service
Verify that your account can use Spot:
aws ec2 describe-spot-price-history \
--start-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--product-descriptions "Linux/UNIX" \
--max-results 1 \
--region ${AWS_REGION}
If this fails, create the EC2 Spot service-linked role in your account (one-time):
aws iam create-service-linked-role \
--aws-service-name spot.amazonaws.com
Step 2: Install Karpenter via Helm
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
--namespace karpenter \
--create-namespace \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
--set settings.clusterName=${CLUSTER_NAME} \
--version v0.37.1 \
--wait
(Subnet and security-group selection happens in the EC2NodeClass via discovery tags in Step 3, so it isn’t set here.)
Verify installation:
kubectl get pods -n karpenter
# Expected: karpenter-xxx running
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
Step 3: Create a NodePool
A NodePool defines the instance types, limits, and consolidation policy. Start with this template:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default # NodePools are cluster-scoped, so no namespace
spec:
  template:
    spec:
      requirements:
        # Instance families: prefer cheap, general-purpose instances
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m6i", "c6i"]
        # Instance size: allow small to large instances
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["small", "medium", "large", "xlarge"]
        # Capacity type: Karpenter prefers Spot when both are allowed
        # (Spot is cheaper but carries interruption risk)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Architecture: x86_64 only
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default
  limits:
    # Hard limit on total CPU across all nodes in this NodePool
    cpu: "100"
    # Hard limit on total memory
    memory: 100Gi
  disruption:
    # Consolidation: remove underutilized nodes automatically
    consolidationPolicy: WhenUnderutilized
    # Expire nodes: refresh every 7 days (168h) for patching
    expireAfter: 168h
    # Budget: disrupt at most 25% of nodes at a time
    budgets:
      - nodes: "25%"
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default # EC2NodeClasses are cluster-scoped too
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-${CLUSTER_NAME} # IAM role the nodes run as
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "true" # Select subnets tagged with this
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "true" # Select security groups tagged with this
  tags:
    ManagedBy: "karpenter"
    Environment: "production"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true
Apply this:
kubectl apply -f nodepool.yaml
Tag Your Subnets and Security Groups
Karpenter needs to find subnets and security groups via tags:
# Tag subnets
aws ec2 create-tags \
--resources subnet-12345 subnet-67890 \
--tags Key=karpenter.sh/discovery,Value=true
# Tag security groups
aws ec2 create-tags \
--resources sg-12345 \
--tags Key=karpenter.sh/discovery,Value=true
Step 4: Test Karpenter with a Deployment
Deploy a test workload to see Karpenter provision nodes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: app
          image: nginx:latest
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"
Apply and watch:
kubectl apply -f test-deployment.yaml
kubectl get nodes -L karpenter.sh/capacity-type
# You should see new nodes appear (provisioned by Karpenter)
Check logs to confirm Karpenter provisioned nodes:
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50
# Look for: "Creating instance", "Provisioned node"
Step 5: Optimize Cost with Consolidation
Consolidation is Karpenter’s biggest cost-saving feature. It automatically removes idle nodes.
How Consolidation Works
- Karpenter watches for nodes with low utilization
- It checks if pods on those nodes can fit on other nodes
- If yes, it cordons the old node, drains pods, and terminates it
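Pods that must not be moved can opt out of this voluntary disruption. A minimal sketch, assuming the karpenter.sh/do-not-disrupt pod annotation used by recent Karpenter versions (older releases used karpenter.sh/do-not-evict), with a hypothetical workload name:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-job-runner # hypothetical workload name
spec:
  template:
    metadata:
      annotations:
        # Karpenter will not voluntarily disrupt nodes running this pod
        karpenter.sh/do-not-disrupt: "true"
```

Use this sparingly: every opted-out pod blocks consolidation of its node, which erodes the cost savings.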
Configuration
The NodePool above already includes consolidation:
disruption:
  consolidationPolicy: WhenUnderutilized # also consolidate non-empty nodes
  budgets:
    - nodes: "25%" # Disrupt at most 25% of nodes at a time
Where your Karpenter version supports consolidateAfter (with WhenEmpty in v1beta1, or WhenEmptyOrUnderutilized in the v1 API), it controls how long a node must be idle before consolidation.
Tuning for cost:
- consolidateAfter: 30s (aggressive; good for non-stateful workloads)
- consolidateAfter: 5m (conservative; good for databases, stateful services)
Tuning for stability:
- nodes: "10%" (slow; safer for production, less churn)
- nodes: "50%" (fast; for dev/staging)
Monitor Consolidation
# Watch for consolidation events
kubectl get events -n karpenter --sort-by='.lastTimestamp'
# Look for: "Consolidating nodes...", "Terminating nodes..."
# Check current node utilization
kubectl top nodes
Step 6: Optimize Cost Further
1. Use Spot Instances
The NodePool above already allows Spot; Karpenter prefers it when both capacity types are listed. Spot costs 70-90% less than on-demand:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot", "on-demand"]
To use only Spot (risky):
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
To use only on-demand (expensive but stable):
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"]
2. Right-Size Instance Types
Default NodePool uses t3, m6i, c6i — good for general workloads. For specific needs:
High-CPU workloads (CI/CD, data processing):
- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["c6i", "c5"]
High-Memory workloads (databases, caching):
- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["r6i", "r5"]
Burstable, cost-sensitive workloads:
- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["t4g", "t3"] # t4g is Graviton (arm64); also allow kubernetes.io/arch: arm64
3. Set Cluster-Wide CPU and Memory Limits
The NodePool limits prevent runaway provisioning:
limits:
  cpu: "100" # Stop provisioning when total CPU hits 100 cores
  memory: "100Gi" # Stop provisioning when total memory hits 100Gi
If you hit these limits, new pods remain unschedulable. Monitor in production:
kubectl describe nodepool default | grep -A2 "Limits"
4. Add Pod Priority
Karpenter respects pod priority; higher-priority pods are the last to be displaced during consolidation:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-critical-app
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
        - name: app
          image: ...
Critical pods stay on nodes during consolidation; non-critical pods get moved first.
Step 7: Production Safety Patterns
Pattern 1: Pod Disruption Budgets (PDB)
Prevent Karpenter from disrupting too many pods at once:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
This ensures at least 2 pods of my-app stay available during consolidation.
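Where an absolute count doesn't fit (for example, deployments whose replica count scales), maxUnavailable with a percentage is an alternative; a minimal sketch with a hypothetical name:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb-pct # hypothetical name
spec:
  maxUnavailable: 25% # allow at most a quarter of the pods to be down
  selector:
    matchLabels:
      app: my-app
```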
Pattern 2: Multi-NodePool Strategy
Use separate NodePools for different workload types:
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m6i"]
      nodeClassRef:
        name: general
  limits:
    cpu: "50"
    memory: 50Gi
  disruption:
    consolidationPolicy: WhenUnderutilized # aggressive consolidation
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g4dn"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"] # GPU Spot capacity is frequently reclaimed
      nodeClassRef:
        name: gpu
  limits:
    cpu: "100"
    memory: 200Gi
  disruption:
    consolidationPolicy: WhenEmpty # conservative: only reclaim empty GPU nodes
    consolidateAfter: 5m
Then schedule workloads to the right pool:
spec:
  nodeSelector:
    karpenter.sh/nodepool: general # or gpu
Pattern 3: Monitoring
Set up CloudWatch alarms:
# Alert if consolidation isn't running (potential cost waste).
# This assumes Karpenter's Prometheus metrics are exported to CloudWatch
# (e.g. via the CloudWatch agent); adjust --namespace to match your setup.
aws cloudwatch put-metric-alarm \
--alarm-name karpenter-consolidation-stalled \
--alarm-description "Karpenter consolidation hasn't occurred in 1 hour" \
--metric-name karpenter_consolidation_actions_performed_total \
--namespace Karpenter \
--statistic Sum \
--period 3600 \
--threshold 1 \
--comparison-operator LessThanThreshold \
--evaluation-periods 1
Watch node churn (too much consolidation = instability):
# Check termination rate
kubectl get events -n karpenter | grep -c "Terminating node"
Common Mistakes to Avoid
Over-aggressive consolidation
- Setting consolidateAfter: 1s causes constant node churn
- Use 30s-5m based on workload type
Forgetting Pod Disruption Budgets
- Without PDBs, critical apps might lose quorum during consolidation
- Add PDBs for stateful services (databases, message queues)
Not tagging subnets
- Karpenter won’t find subnets without karpenter.sh/discovery: "true" tags
- Verify tags before troubleshooting provisioning issues
Setting CPU/memory limits too low
- If limits are hit, new pods stay unschedulable
- Monitor kubectl describe nodepool and increase limits as workload grows
Using only Spot instances
- Spot capacity can be reclaimed at any time with only a two-minute interruption notice
- Use 70/30 Spot/on-demand split; pin critical services to on-demand
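To pin critical services to on-demand as suggested above, one minimal sketch is a nodeSelector on the karpenter.sh/capacity-type label that Karpenter applies to the nodes it launches:

```yaml
# Pod template fragment: schedule only onto on-demand Karpenter nodes
spec:
  template:
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
```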
Cost Comparison: Cluster Autoscaler vs Karpenter
For a 50-pod cluster running around the clock (730 hours per month):
Cluster Autoscaler:
- Over-provisions 20% buffer: 50 pods × 1.2 = 60 pod slots provisioned
- Average 20 nodes of m5.xlarge running 24/7 = $0.192/hr × 20 = $3.84/hr
- Monthly: $3.84 × 730 hours = $2,803
Karpenter:
- Exactly fits pods: 50 pod slots
- Average 17 nodes (same pods, better bin-packing) ≈ $3.26/hr
- Monthly: 17 × $0.192 × 730 hours ≈ $2,383
- Savings: ~$420/month (15%)
Add consolidation (removes idle nodes in off-hours):
- Off-peak hours (10pm-8am): cluster scales to 5 nodes
- Peak hours: 17 nodes
- Average: ~11 nodes ≈ $2.11/hr
- Monthly: 11 × $0.192 × 730 ≈ $1,542
- Savings with consolidation: ~$1,261/month (45%)
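The arithmetic above can be reproduced directly. The node counts, the $0.192/hr m5.xlarge rate, and 730 hours/month are the assumptions of this comparison; exact products may differ by a few dollars from figures derived from rounded per-hour rates:

```shell
# Monthly cost = node count x hourly rate x 730 hours
awk 'BEGIN {
  rate = 0.192; hours = 730
  printf "Cluster Autoscaler (20 nodes):    $%.0f/month\n", 20 * rate * hours
  printf "Karpenter, no consolidation (17): $%.0f/month\n", 17 * rate * hours
  printf "Karpenter + consolidation (~11):  $%.0f/month\n", 11 * rate * hours
}'
# -> $2803, $2383, and $1542 per month respectively
```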
Next Steps
- Install Karpenter on a dev/staging cluster (2-3 hours)
- Test with sample workloads for 1-2 weeks
- Monitor consolidation and cost savings
- Deploy to production with PDBs and multi-NodePool strategy
- Set up CloudWatch alarms for churn and utilization
- Talk to FactualMinds if you need help tuning consolidation, capacity planning, or multi-cluster Karpenter deployments
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.



