How to Deploy EKS with Karpenter for Cost-Optimized Autoscaling

Cloud Architecture · Palaniappan P · 8 min read

Quick summary: Karpenter replaces Kubernetes Cluster Autoscaler with intelligent bin-packing and just-in-time node provisioning. This guide covers setup, consolidation, cost optimization, and production patterns for EKS clusters.

Key Takeaways

  • Karpenter replaces Cluster Autoscaler with just-in-time node provisioning and bin-packing, typically cutting cluster costs 50-70%
  • Consolidation automatically removes idle and underutilized nodes
  • Spot instances, right-sized instance families, and NodePool limits drive further savings
  • PDBs, disruption budgets, and a multi-NodePool strategy keep production stable

Karpenter is a Kubernetes autoscaler that replaces the legacy Cluster Autoscaler with intelligent node provisioning. Instead of over-provisioning nodes as a buffer, Karpenter watches for unschedulable pods, provisions nodes with exactly the right capacity, and consolidates idle nodes automatically. The result: clusters cost 50-70% less without sacrificing availability.

This guide covers installing Karpenter on EKS, configuring NodePools, enabling consolidation and Spot instances, and deploying production safely.

Scaling Kubernetes on AWS? FactualMinds helps teams architect cost-optimized EKS clusters with Karpenter, multi-AZ HA, and GitOps pipelines. See our AWS serverless & Kubernetes services or talk to our team.

Step 1: Set Up IAM and EKS Prerequisites

Karpenter needs IAM permissions to launch EC2 instances, and its controller authenticates through IRSA (IAM Roles for Service Accounts). You can create the required role with eksctl or CloudFormation, or manually with the AWS CLI as shown below.

Linking the Service Account to an IAM Role (IRSA)

The Karpenter Helm chart exposes a serviceAccount.annotations parameter that binds the karpenter service account to an IAM role through IRSA. Note that the chart only sets the annotation; it does not create any AWS IAM resources, so the role and its trust policy must exist before the install succeeds:

# Set AWS account, region, and cluster name
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=us-east-1
export CLUSTER_NAME=my-cluster  # replace with your EKS cluster name

# Install Karpenter with IRSA (the chart is published to a public ECR OCI
# registry; the legacy charts.karpenter.sh repo is deprecated)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version "0.37.1" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --set settings.clusterName=${CLUSTER_NAME}

If the role does not exist yet, create it with a trust policy scoped to your cluster's OIDC provider:

# Create the IAM role with a trust policy that allows both the Karpenter
# service account (via IRSA) and EC2 instances to assume it.
# (Production setups usually split these into separate controller and node roles.)
cat > /tmp/trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5)"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5):sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --assume-role-policy-document file:///tmp/trust-policy.json

Attach Permissions

Karpenter needs permission to launch and terminate EC2 instances (it calls the EC2 Fleet APIs directly; it does not manage Auto Scaling groups). The instances it launches also need the standard EKS worker policies:

# Attach the worker-node policies
for policy in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly AmazonSSMManagedInstanceCore; do
  aws iam attach-role-policy \
    --role-name KarpenterNodeRole-${CLUSTER_NAME} \
    --policy-arn arn:aws:iam::aws:policy/${policy}
done

# Attach the controller policy (abridged; the full policy in the Karpenter docs
# also scopes resources and covers instance profiles and SQS interruption queues)
cat > /tmp/karpenter-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateTags",
        "ec2:DeleteLaunchTemplate",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstances",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSubnets",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "iam:PassRole",
        "pricing:GetProducts",
        "ssm:GetParameter"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-name KarpenterPolicy \
  --policy-document file:///tmp/karpenter-policy.json

Enable EC2 Spot Service

Verify Spot access with a read-only call:

aws ec2 describe-spot-price-history \
  --start-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --product-descriptions "Linux/UNIX" \
  --max-results 1 \
  --region ${AWS_REGION}

If this fails with an authorization error, create the Spot service-linked role (a one-time step per account):

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com

Step 2: Install Karpenter via Helm

With the IAM role in place, install (or upgrade) the chart from the public ECR OCI registry:

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version "0.37.1" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --set settings.clusterName=${CLUSTER_NAME} \
  --wait

Note that subnets and security groups are not Helm values; Karpenter discovers them through the EC2NodeClass selectors defined in Step 3.

Verify installation:

kubectl get pods -n karpenter
# Expected: karpenter-xxx running
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

Step 3: Create a NodePool

A NodePool defines the instance types, limits, and consolidation policy. Start with this template:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default  # NodePool is cluster-scoped, so no namespace
spec:
  template:
    spec:
      requirements:
        # Instance families: prefer cheap, general-purpose instances
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m6i", "c6i"]
        # Instance size: allow small to large instances
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["small", "medium", "large", "xlarge"]
        # Capacity type: allow both; Karpenter favors Spot when capacity
        # is available (Spot carries interruption risk)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Architecture: x86_64 only
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default
  limits:
    # Hard limit on total CPU across all nodes in this NodePool
    cpu: "100"
    # Hard limit on total memory
    memory: 100Gi
  disruption:
    # Consolidation: remove idle and underutilized nodes automatically
    consolidationPolicy: WhenUnderutilized
    # Expire nodes after 7 days so they get refreshed with patched AMIs
    expireAfter: 168h
    # Budget: disrupt at most 25% of nodes at a time
    budgets:
      - nodes: "25%"
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default  # EC2NodeClass is also cluster-scoped
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-my-cluster"  # role from Step 1; substitute your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "true"  # select subnets tagged with this
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "true"  # select security groups tagged with this
  tags:
    ManagedBy: "karpenter"
    Environment: "production"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true

Apply this:

kubectl apply -f nodepool.yaml

Tag Your Subnets and Security Groups

Karpenter needs to find subnets and security groups via tags:

# Tag subnets
aws ec2 create-tags \
  --resources subnet-12345 subnet-67890 \
  --tags Key=karpenter.sh/discovery,Value=true

# Tag security groups
aws ec2 create-tags \
  --resources sg-12345 \
  --tags Key=karpenter.sh/discovery,Value=true

Step 4: Test Karpenter with a Deployment

Deploy a test workload to see Karpenter provision nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: app
        image: nginx:latest
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"

Apply and watch:

kubectl apply -f test-deployment.yaml
kubectl get nodes -L karpenter.sh/capacity-type
# You should see new nodes appear (provisioned by Karpenter)

Check logs to confirm Karpenter provisioned nodes:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50
# Look for "created nodeclaim" and "launched nodeclaim" entries

Step 5: Optimize Cost with Consolidation

Consolidation is Karpenter’s biggest cost-saving feature. It automatically removes idle nodes.

How Consolidation Works

  1. Karpenter watches for nodes with low utilization
  2. It checks if pods on those nodes can fit on other nodes
  3. If yes, it cordons the old node, drains pods, and terminates it
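Pods that must not be moved, such as in-flight batch jobs or singleton controllers, can opt out of voluntary consolidation with Karpenter's karpenter.sh/do-not-disrupt pod annotation. A minimal sketch (the workload name is hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker  # hypothetical workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
      annotations:
        # Karpenter will not voluntarily disrupt nodes running this pod
        karpenter.sh/do-not-disrupt: "true"
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sleep", "3600"]
```

The annotation only blocks voluntary disruption (consolidation, expiration); Spot interruptions and node failures are still handled normally.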

Configuration

The NodePool above already enables consolidation:

disruption:
  consolidationPolicy: WhenUnderutilized  # continuously consolidate underutilized nodes
  budgets:
    - nodes: "25%"  # disrupt at most 25% of nodes at a time

Tuning for cost:

  • consolidationPolicy: WhenUnderutilized is aggressive: pods are repacked whenever a cheaper layout exists (good for stateless workloads)
  • consolidationPolicy: WhenEmpty with consolidateAfter: 5m is conservative: only completely empty nodes are removed (good for databases and stateful services; in v1beta1, consolidateAfter is only valid with WhenEmpty)

Tuning for stability:

  • nodes: "10%" is slow (safer for production, less churn)
  • nodes: "50%" is fast (fine for dev/staging)
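Disruption budgets can also be time-boxed with a cron schedule. The sketch below (schedule in UTC, values assumed for illustration) blocks all voluntary disruption during weekday business hours and allows moderate consolidation the rest of the time:

```yaml
disruption:
  consolidationPolicy: WhenUnderutilized
  budgets:
    # Block all voluntary disruption 09:00-17:00 UTC, Monday-Friday
    - nodes: "0"
      schedule: "0 9 * * mon-fri"
      duration: 8h
    # Otherwise, disrupt at most 25% of nodes at a time
    - nodes: "25%"
```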

Monitor Consolidation

# Watch for disruption events (Karpenter records them on cluster-scoped
# objects, so search across all namespaces)
kubectl get events -A --sort-by='.lastTimestamp' | grep -iE 'consolidat|disrupt'

# Check current node utilization
kubectl top nodes

Step 6: Optimize Cost Further

1. Use Spot Instances

The NodePool above already allows Spot. When both capacity types are listed, Karpenter favors Spot wherever capacity is available (via EC2 Fleet's price-capacity-optimized allocation). Spot typically costs 60-90% less than on-demand:

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]

To use only Spot (risky):

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]

To use only on-demand (expensive but stable):

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]

2. Right-Size Instance Types

Default NodePool uses t3, m6i, c6i — good for general workloads. For specific needs:

High-CPU workloads (CI/CD, data processing):

- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["c6i", "c5"]

High-Memory workloads (databases, caching):

- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["r6i", "r5"]

Burstable, cost-sensitive workloads (note: t4g is Graviton, so the NodePool must also allow kubernetes.io/arch: arm64):

- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["t4g", "t3"]

3. Set Cluster-Wide CPU and Memory Limits

The NodePool limits prevent runaway provisioning:

limits:
  cpu: "100"        # Stop provisioning when total CPU hits 100
  memory: "100Gi"   # Stop provisioning when total memory hits 100Gi

If you hit these limits, pods remain unschedulable. Monitor in production:

kubectl describe nodepool default | grep -A2 "Limits"

4. Add Pod Priority

Karpenter drains nodes through the Kubernetes eviction API, and pod priority influences eviction ordering, so critical pods tend to be moved last:

apiVersion: v1
kind: PriorityClass
metadata:
  name: critical
value: 1000
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-critical-app
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
      - name: app
        image: ...

Higher-priority pods are evicted last during a drain, and non-critical pods get moved first. Pods that must never be moved voluntarily can also carry the karpenter.sh/do-not-disrupt: "true" annotation.

Step 7: Production Safety Patterns

Pattern 1: Pod Disruption Budgets (PDB)

Prevent Karpenter from disrupting too many pods at once:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

This ensures 2 pods of my-app stay available during consolidation.
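For larger deployments, a percentage-based budget scales with the replica count instead of pinning an absolute number. An alternative sketch:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # At most a quarter of matching pods may be down at once,
  # whether the deployment runs 4 replicas or 40
  maxUnavailable: "25%"
  selector:
    matchLabels:
      app: my-app
```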

Pattern 2: Multi-NodePool Strategy

Use separate NodePools for different workload types:

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general  # cluster-scoped, no namespace
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3", "m6i"]
      nodeClassRef:
        name: general
  limits:
    cpu: "50"
    memory: 50Gi
  disruption:
    consolidationPolicy: WhenUnderutilized  # aggressive repacking for stateless work
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g4dn"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]  # GPU Spot capacity is frequently interrupted
      nodeClassRef:
        name: gpu
  limits:
    cpu: "100"
    memory: 200Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m  # conservative: only reclaim fully idle GPU nodes

Then schedule workloads to the right pool:

spec:
  nodeSelector:
    karpenter.sh/nodepool: general  # or gpu

Pattern 3: Monitoring

Karpenter exposes Prometheus metrics; it does not publish to CloudWatch on its own, so scrape them with the CloudWatch agent or the ADOT collector first. With metrics flowing, alert when consolidation stalls:

# Alert if no disruption actions occurred in the last hour (potential cost waste)
# Note: metric names vary by Karpenter version; verify against your /metrics endpoint
aws cloudwatch put-metric-alarm \
  --alarm-name karpenter-consolidation-stalled \
  --alarm-description "No Karpenter disruption actions in 1 hour" \
  --metric-name karpenter_disruption_actions_performed_total \
  --namespace karpenter \
  --statistic Sum \
  --period 3600 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 1

Watch node churn (too much consolidation = instability):

# Count recent node termination events across the cluster
kubectl get events -A | grep -ci "terminat"

Common Mistakes to Avoid

  1. Over-aggressive consolidation

    • An unconstrained consolidation policy with no disruption budget causes constant node churn
    • Add budgets, and for WhenEmpty use a consolidateAfter of 30s-5m based on workload type
  2. Forgetting Pod Disruption Budgets

    • Without PDBs, critical apps might lose quorum during consolidation
    • Add PDBs for stateful services (databases, message queues)
  3. Not tagging subnets

    • Karpenter won’t find subnets without karpenter.sh/discovery: "true" tags
    • Verify tags before troubleshooting provisioning issues
  4. Setting CPU/memory limits too low

    • If limits are hit, new pods stay unschedulable
    • Monitor kubectl describe nodepool and increase limits as workload grows
  5. Using only Spot instances

    • Spot capacity can be reclaimed at any time with only a two-minute warning, and interruption rates vary widely by instance pool
    • Use 70/30 Spot/on-demand split; pin critical services to on-demand
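Pinning a critical service to on-demand does not require a separate NodePool: Karpenter labels every node it launches with its capacity type, so a nodeSelector on the workload is enough. A sketch:

```yaml
spec:
  template:
    spec:
      # karpenter.sh/capacity-type is set by Karpenter on each node it provisions
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
```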

Cost Comparison: Cluster Autoscaler vs Karpenter

For a 50-pod cluster running continuously (730 hours/month) on m5.large on-demand nodes:

Cluster Autoscaler:

  • Over-provisions 20% buffer: 50 pods × 1.2 = 60 pod slots provisioned
  • Average 20 nodes of m5.large running 24/7 = $0.192/hr × 20 = $3.84/hr
  • Monthly: $3.84 × 730 hours = $2,803

Karpenter:

  • Exactly fits pods: 50 pod slots
  • Average 17 nodes (same pods, better bin-packing) = $3.26/hr
  • Monthly: $3.26 × 730 hours ≈ $2,383
  • Savings: ~$420/month (15%)

Add consolidation (removes idle nodes in off-hours):

  • Off-peak hours (10pm-8am): cluster scales to 5 nodes
  • Peak hours: 17 nodes
  • Average: ~11 nodes = $2.11/hr
  • Monthly: $2.11 × 730 ≈ $1,542
  • Savings with consolidation: ~$1,261/month (45%)
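The arithmetic above is easy to sanity-check with a short script (the node counts and the $0.192/hr m5.large on-demand rate are the assumptions used in this example):

```python
# Sanity-check the cost comparison (m5.large on-demand, 730 hours/month assumed)
HOURLY_RATE = 0.192      # $/hr per node
HOURS_PER_MONTH = 730

def monthly_cost(avg_nodes: float) -> float:
    """Monthly cost for an average node count running the whole month."""
    return avg_nodes * HOURLY_RATE * HOURS_PER_MONTH

baseline = monthly_cost(20)  # Cluster Autoscaler with a 20% buffer

scenarios = {
    "Cluster Autoscaler (20 nodes)": monthly_cost(20),
    "Karpenter bin-packing (17 nodes)": monthly_cost(17),
    "Karpenter + consolidation (avg 11 nodes)": monthly_cost(11),
}

for label, cost in scenarios.items():
    savings = baseline - cost
    pct = 100 * savings / baseline
    print(f"{label}: ${cost:,.0f}/mo, saves ${savings:,.0f} ({pct:.0f}%)")
```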

Next Steps

  1. Install Karpenter on a dev/staging cluster (2-3 hours)
  2. Test with sample workloads for 1-2 weeks
  3. Monitor consolidation and cost savings
  4. Deploy to production with PDBs and multi-NodePool strategy
  5. Set up CloudWatch alarms for churn and utilization
  6. Talk to FactualMinds if you need help tuning consolidation, capacity planning, or multi-cluster Karpenter deployments
Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

Expertise: AWS Architecture, Cloud Migration, GenAI on AWS, Cost Optimization, DevOps
