How to Deploy EKS with Karpenter for Cost-Optimized Autoscaling

Cloud Architecture · Palaniappan P · 8 min read

Quick summary: Karpenter replaces Kubernetes Cluster Autoscaler with intelligent bin-packing and just-in-time node provisioning. This guide covers setup, consolidation, cost optimization, and production patterns for EKS clusters.

Key Takeaways

  • Karpenter replaces Cluster Autoscaler with just-in-time node provisioning and bin-packing, typically cutting cluster costs 50-70%
  • Consolidation automatically removes idle and underutilized nodes
  • Spot instances, right-sized instance families, and NodePool limits drive further savings
  • PDBs, disruption budgets, and a multi-NodePool strategy keep production stable

Karpenter is a Kubernetes autoscaler that replaces the legacy Cluster Autoscaler with intelligent node provisioning. Instead of over-provisioning nodes as a buffer, Karpenter watches for unschedulable pods, provisions nodes with exactly the right capacity, and consolidates idle nodes automatically. The result: clusters cost 50-70% less without sacrificing availability.

This guide covers installing Karpenter on EKS, configuring NodePools, enabling consolidation and Spot instances, and deploying production safely.

Scaling Kubernetes on AWS? FactualMinds helps teams architect cost-optimized EKS clusters with Karpenter, multi-AZ HA, and GitOps pipelines. See our AWS serverless & Kubernetes services or talk to our team.

Step 1: Set Up IAM and EKS Prerequisites

Karpenter needs IAM permissions to launch EC2 instances, and its controller authenticates through IRSA (IAM Roles for Service Accounts). You can create the required role with eksctl or CloudFormation, or manually with the AWS CLI as shown below.

Linking the Service Account to an IAM Role (IRSA)

The Karpenter Helm chart exposes a serviceAccount.annotations parameter that binds the karpenter service account to an IAM role through IRSA. Note that the chart only sets the annotation; it does not create any AWS IAM resources, so the role and its trust policy must exist before the install succeeds:

# Set AWS account, region, and cluster name
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=us-east-1
export CLUSTER_NAME=my-cluster  # replace with your EKS cluster name

# Install Karpenter with IRSA (the chart is published to a public ECR OCI
# registry; the legacy charts.karpenter.sh repo is deprecated)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version "0.37.1" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --set settings.clusterName=${CLUSTER_NAME}

If the role does not exist yet, create it with a trust policy scoped to your cluster's OIDC provider:

# Create the IAM role with a trust policy that allows both the Karpenter
# service account (via IRSA) and EC2 instances to assume it.
# (Production setups usually split these into separate controller and node roles.)
cat > /tmp/trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5)"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5):sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --assume-role-policy-document file:///tmp/trust-policy.json

Attach Permissions

Karpenter needs permission to launch and terminate EC2 instances (it calls the EC2 Fleet APIs directly; it does not manage Auto Scaling groups). The instances it launches also need the standard EKS worker policies:

# Attach the worker-node policies
for policy in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly AmazonSSMManagedInstanceCore; do
  aws iam attach-role-policy \
    --role-name KarpenterNodeRole-${CLUSTER_NAME} \
    --policy-arn arn:aws:iam::aws:policy/${policy}
done

# Attach the controller policy (abridged; the full policy in the Karpenter docs
# also scopes resources and covers instance profiles and SQS interruption queues)
cat > /tmp/karpenter-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateTags",
        "ec2:DeleteLaunchTemplate",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstances",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSubnets",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "iam:PassRole",
        "pricing:GetProducts",
        "ssm:GetParameter"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-name KarpenterPolicy \
  --policy-document file:///tmp/karpenter-policy.json

Enable EC2 Spot Service

Verify Spot access with a read-only call:

aws ec2 describe-spot-price-history \
  --start-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --product-descriptions "Linux/UNIX" \
  --max-results 1 \
  --region ${AWS_REGION}

If this fails with an authorization error, create the Spot service-linked role (a one-time step per account):

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com

Step 2: Install Karpenter via Helm

With the IAM role in place, install (or upgrade) the chart from the public ECR OCI registry:

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version "0.37.1" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --set settings.clusterName=${CLUSTER_NAME} \
  --wait

Note that subnets and security groups are not Helm values; Karpenter discovers them through the EC2NodeClass selectors defined in Step 3.

Verify installation:

kubectl get pods -n karpenter
# Expected: karpenter-xxx running
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

Step 3: Create a NodePool

A NodePool defines the instance types, limits, and consolidation policy. Start with this template:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default  # NodePool is cluster-scoped, so no namespace
spec:
  template:
    spec:
      requirements:
        # Instance families: prefer cheap, general-purpose instances
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m6i", "c6i"]
        # Instance size: allow small to large instances
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["small", "medium", "large", "xlarge"]
        # Capacity type: allow both; Karpenter favors Spot when capacity
        # is available (Spot carries interruption risk)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Architecture: x86_64 only
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default
  limits:
    # Hard limit on total CPU across all nodes in this NodePool
    cpu: "100"
    # Hard limit on total memory
    memory: 100Gi
  disruption:
    # Consolidation: remove idle and underutilized nodes automatically
    consolidationPolicy: WhenUnderutilized
    # Expire nodes after 7 days so they get refreshed with patched AMIs
    expireAfter: 168h
    # Budget: disrupt at most 25% of nodes at a time
    budgets:
      - nodes: "25%"
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default  # EC2NodeClass is also cluster-scoped
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-my-cluster"  # role from Step 1; substitute your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "true"  # select subnets tagged with this
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "true"  # select security groups tagged with this
  tags:
    ManagedBy: "karpenter"
    Environment: "production"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true

Apply this:

kubectl apply -f nodepool.yaml

Tag Your Subnets and Security Groups

Karpenter needs to find subnets and security groups via tags:

# Tag subnets
aws ec2 create-tags \
  --resources subnet-12345 subnet-67890 \
  --tags Key=karpenter.sh/discovery,Value=true

# Tag security groups
aws ec2 create-tags \
  --resources sg-12345 \
  --tags Key=karpenter.sh/discovery,Value=true

Step 4: Test Karpenter with a Deployment

Deploy a test workload to see Karpenter provision nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: app
        image: nginx:latest
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"

Apply and watch:

kubectl apply -f test-deployment.yaml
kubectl get nodes -L karpenter.sh/capacity-type
# You should see new nodes appear (provisioned by Karpenter)

Check logs to confirm Karpenter provisioned nodes:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50
# Look for "created nodeclaim" and "launched nodeclaim" entries

Step 5: Optimize Cost with Consolidation

Consolidation is Karpenter’s biggest cost-saving feature. It automatically removes idle nodes.

How Consolidation Works

  1. Karpenter watches for nodes with low utilization
  2. It checks if pods on those nodes can fit on other nodes
  3. If yes, it cordons the old node, drains pods, and terminates it
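Pods that must not be moved, such as in-flight batch jobs or singleton controllers, can opt out of voluntary consolidation with Karpenter's karpenter.sh/do-not-disrupt pod annotation. A minimal sketch (the workload name is hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker  # hypothetical workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
      annotations:
        # Karpenter will not voluntarily disrupt nodes running this pod
        karpenter.sh/do-not-disrupt: "true"
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sleep", "3600"]
```

The annotation only blocks voluntary disruption (consolidation, expiration); Spot interruptions and node failures are still handled normally.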

Configuration

The NodePool above already enables consolidation:

disruption:
  consolidationPolicy: WhenUnderutilized  # continuously consolidate underutilized nodes
  budgets:
    - nodes: "25%"  # disrupt at most 25% of nodes at a time

Tuning for cost:

  • consolidationPolicy: WhenUnderutilized is aggressive: pods are repacked whenever a cheaper layout exists (good for stateless workloads)
  • consolidationPolicy: WhenEmpty with consolidateAfter: 5m is conservative: only completely empty nodes are removed (good for databases and stateful services; in v1beta1, consolidateAfter is only valid with WhenEmpty)

Tuning for stability:

  • nodes: "10%" is slow (safer for production, less churn)
  • nodes: "50%" is fast (fine for dev/staging)
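Disruption budgets can also be time-boxed with a cron schedule. The sketch below (schedule in UTC, values assumed for illustration) blocks all voluntary disruption during weekday business hours and allows moderate consolidation the rest of the time:

```yaml
disruption:
  consolidationPolicy: WhenUnderutilized
  budgets:
    # Block all voluntary disruption 09:00-17:00 UTC, Monday-Friday
    - nodes: "0"
      schedule: "0 9 * * mon-fri"
      duration: 8h
    # Otherwise, disrupt at most 25% of nodes at a time
    - nodes: "25%"
```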

Monitor Consolidation

# Watch for disruption events (Karpenter records them on cluster-scoped
# objects, so search across all namespaces)
kubectl get events -A --sort-by='.lastTimestamp' | grep -iE 'consolidat|disrupt'

# Check current node utilization
kubectl top nodes

Step 6: Optimize Cost Further

1. Use Spot Instances

The NodePool above already allows Spot. When both capacity types are listed, Karpenter favors Spot wherever capacity is available (via EC2 Fleet's price-capacity-optimized allocation). Spot typically costs 60-90% less than on-demand:

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]

To use only Spot (risky):

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]

To use only on-demand (expensive but stable):

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]

2. Right-Size Instance Types

Default NodePool uses t3, m6i, c6i — good for general workloads. For specific needs:

High-CPU workloads (CI/CD, data processing):

- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["c6i", "c5"]

High-Memory workloads (databases, caching):

- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["r6i", "r5"]

Burstable, cost-sensitive workloads (note: t4g is Graviton, so the NodePool must also allow kubernetes.io/arch: arm64):

- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["t4g", "t3"]

3. Set Cluster-Wide CPU and Memory Limits

The NodePool limits prevent runaway provisioning:

limits:
  cpu: "100"        # Stop provisioning when total CPU hits 100
  memory: "100Gi"   # Stop provisioning when total memory hits 100Gi

If you hit these limits, pods remain unschedulable. Monitor in production:

kubectl describe nodepool default | grep -A2 "Limits"

4. Add Pod Priority

Karpenter drains nodes through the Kubernetes eviction API, and pod priority influences eviction ordering, so critical pods tend to be moved last:

apiVersion: v1
kind: PriorityClass
metadata:
  name: critical
value: 1000
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-critical-app
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
      - name: app
        image: ...

Higher-priority pods are evicted last during a drain, and non-critical pods get moved first. Pods that must never be moved voluntarily can also carry the karpenter.sh/do-not-disrupt: "true" annotation.

Step 7: Production Safety Patterns

Pattern 1: Pod Disruption Budgets (PDB)

Prevent Karpenter from disrupting too many pods at once:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

This ensures 2 pods of my-app stay available during consolidation.
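For larger deployments, a percentage-based budget scales with the replica count instead of pinning an absolute number. An alternative sketch:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # At most a quarter of matching pods may be down at once,
  # whether the deployment runs 4 replicas or 40
  maxUnavailable: "25%"
  selector:
    matchLabels:
      app: my-app
```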

Pattern 2: Multi-NodePool Strategy

Use separate NodePools for different workload types:

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general  # cluster-scoped, no namespace
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m6i"]
      nodeClassRef:
        name: general
  limits:
    cpu: "50"
    memory: 50Gi
  disruption:
    consolidationPolicy: WhenUnderutilized  # aggressive repacking for stateless work
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g4dn"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]  # GPU Spot capacity is frequently interrupted
      nodeClassRef:
        name: gpu
  limits:
    cpu: "100"
    memory: 200Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m  # conservative: only reclaim fully idle GPU nodes

Then schedule workloads to the right pool:

spec:
  nodeSelector:
    karpenter.sh/nodepool: general  # or gpu

Pattern 3: Monitoring

Karpenter exposes Prometheus metrics; it does not publish to CloudWatch on its own, so scrape them with the CloudWatch agent or the ADOT collector first. With metrics flowing, alert when consolidation stalls:

# Alert if no disruption actions occurred in the last hour (potential cost waste)
# Note: metric names vary by Karpenter version; verify against your /metrics endpoint
aws cloudwatch put-metric-alarm \
  --alarm-name karpenter-consolidation-stalled \
  --alarm-description "No Karpenter disruption actions in 1 hour" \
  --metric-name karpenter_disruption_actions_performed_total \
  --namespace karpenter \
  --statistic Sum \
  --period 3600 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 1

Watch node churn (too much consolidation = instability):

# Count recent node termination events across the cluster
kubectl get events -A | grep -ci "terminat"

Common Mistakes to Avoid

  1. Over-aggressive consolidation

    • An unconstrained consolidation policy with no disruption budget causes constant node churn
    • Add budgets, and for WhenEmpty use a consolidateAfter of 30s-5m based on workload type
  2. Forgetting Pod Disruption Budgets

    • Without PDBs, critical apps might lose quorum during consolidation
    • Add PDBs for stateful services (databases, message queues)
  3. Not tagging subnets

    • Karpenter won’t find subnets without karpenter.sh/discovery: "true" tags
    • Verify tags before troubleshooting provisioning issues
  4. Setting CPU/memory limits too low

    • If limits are hit, new pods stay unschedulable
    • Monitor kubectl describe nodepool and increase limits as workload grows
  5. Using only Spot instances

    • Spot capacity can be reclaimed at any time with only a two-minute warning, and interruption rates vary widely by instance pool
    • Use 70/30 Spot/on-demand split; pin critical services to on-demand
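Pinning a critical service to on-demand does not require a separate NodePool: Karpenter labels every node it launches with its capacity type, so a nodeSelector on the workload is enough. A sketch:

```yaml
spec:
  template:
    spec:
      # karpenter.sh/capacity-type is set by Karpenter on each node it provisions
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
```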

Cost Comparison: Cluster Autoscaler vs Karpenter

For a 50-pod cluster running continuously (730 hours/month) on m5.large on-demand nodes:

Cluster Autoscaler:

  • Over-provisions 20% buffer: 50 pods × 1.2 = 60 pod slots provisioned
  • Average 20 nodes of m5.large running 24/7 = $0.192/hr × 20 = $3.84/hr
  • Monthly: $3.84 × 730 hours = $2,803

Karpenter:

  • Exactly fits pods: 50 pod slots
  • Average 17 nodes (same pods, better bin-packing) = $3.26/hr
  • Monthly: $3.26 × 730 hours ≈ $2,383
  • Savings: ~$420/month (15%)

Add consolidation (removes idle nodes in off-hours):

  • Off-peak hours (10pm-8am): cluster scales to 5 nodes
  • Peak hours: 17 nodes
  • Average: ~11 nodes = $2.11/hr
  • Monthly: $2.11 × 730 ≈ $1,542
  • Savings with consolidation: ~$1,261/month (45%)
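The arithmetic above is easy to sanity-check with a short script (the node counts and the $0.192/hr m5.large on-demand rate are the assumptions used in this example):

```python
# Sanity-check the cost comparison (m5.large on-demand, 730 hours/month assumed)
HOURLY_RATE = 0.192      # $/hr per node
HOURS_PER_MONTH = 730

def monthly_cost(avg_nodes: float) -> float:
    """Monthly cost for an average node count running the whole month."""
    return avg_nodes * HOURLY_RATE * HOURS_PER_MONTH

baseline = monthly_cost(20)  # Cluster Autoscaler with a 20% buffer

scenarios = {
    "Cluster Autoscaler (20 nodes)": monthly_cost(20),
    "Karpenter bin-packing (17 nodes)": monthly_cost(17),
    "Karpenter + consolidation (avg 11 nodes)": monthly_cost(11),
}

for label, cost in scenarios.items():
    savings = baseline - cost
    pct = 100 * savings / baseline
    print(f"{label}: ${cost:,.0f}/mo, saves ${savings:,.0f} ({pct:.0f}%)")
```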

Next Steps

  1. Install Karpenter on a dev/staging cluster (2-3 hours)
  2. Test with sample workloads for 1-2 weeks
  3. Monitor consolidation and cost savings
  4. Deploy to production with PDBs and multi-NodePool strategy
  5. Set up CloudWatch alarms for churn and utilization
  6. Talk to FactualMinds if you need help tuning consolidation, capacity planning, or multi-cluster Karpenter deployments
Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

Expertise: AWS Architecture, Cloud Migration, GenAI on AWS, Cost Optimization, DevOps
