---
title: How to Deploy EKS with Karpenter for Cost-Optimized Autoscaling
description: Karpenter replaces Kubernetes Cluster Autoscaler with intelligent bin-packing and just-in-time node provisioning. This guide covers setup, consolidation, cost optimization, and production patterns for EKS clusters.
url: https://www.factualminds.com/blog/how-to-deploy-eks-karpenter-cost-optimized-autoscaling/
datePublished: 2026-04-03T00:00:00.000Z
dateModified: 2026-04-16T00:00:00.000Z
author: Palaniappan P
category: Cloud Architecture
tags: how-to-guide, karpenter, kubernetes, eks, cost-optimization, autoscaling, aws
---

# How to Deploy EKS with Karpenter for Cost-Optimized Autoscaling

> Karpenter replaces Kubernetes Cluster Autoscaler with intelligent bin-packing and just-in-time node provisioning. This guide covers setup, consolidation, cost optimization, and production patterns for EKS clusters.

Karpenter is a Kubernetes autoscaler that replaces the legacy Cluster Autoscaler with intelligent node provisioning. Instead of over-provisioning nodes as a buffer, Karpenter watches for unschedulable pods, provisions nodes sized to fit them, and consolidates underused nodes automatically. The result is a cheaper cluster without sacrificing availability. How much cheaper depends on the workload; the worked example later in this guide shows double-digit percentage savings.

This guide covers installing Karpenter on EKS, configuring NodePools, enabling consolidation and Spot instances, and deploying to production safely.

> **Scaling Kubernetes on AWS?** FactualMinds helps teams architect cost-optimized EKS clusters with Karpenter, multi-AZ HA, and GitOps pipelines. [See our AWS serverless & Kubernetes services](/services/aws-serverless/) or [talk to our team](/contact-us/).

## Step 1: Set Up IAM and EKS Prerequisites

The Karpenter controller needs IAM permissions to launch EC2 instances, and it authenticates through IRSA (IAM Roles for Service Accounts). The Helm chart does not create IAM resources. It only annotates the controller's service account with the ARN of a role that you create yourself.

### Linking the IAM Role Through the Helm Chart

The chart's `serviceAccount.annotations` value adds the `eks.amazonaws.com/role-arn` annotation to the Karpenter service account, which is what ties the controller pod to the IAM role:

```bash
# Set cluster name, AWS account, and region
export CLUSTER_NAME=my-cluster  # placeholder: replace with your EKS cluster name
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=us-east-1

# Install Karpenter straight from the OCI registry
# (OCI charts are referenced directly; `helm repo add` does not work with oci:// URLs)
# The annotation links the service account to an existing IAM role; it does not create the role
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --set settings.clusterName=${CLUSTER_NAME} \
  --version 1.4.0  # Pin to a stable v1.x release; check github.com/aws/karpenter-provider-aws for the latest
```

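The role referenced in the annotation must already exist. If it doesn't, create it with an IRSA trust policy (this assumes an IAM OIDC provider is already associated with the cluster):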

```bash
# Create the IAM role with the correct trust policy
cat > /tmp/trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5)"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} --query 'cluster.identity.oidc.issuer' --output text | cut -d '/' -f 5):sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
EOF

aws iam create-role \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --assume-role-policy-document file:///tmp/trust-policy.json
```
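
The trust policy above runs `cut -d '/' -f 5` twice to pull the OIDC provider ID out of the issuer URL. As a small sketch, the same extraction can be done once with Bash parameter expansion and reused. The function name `oidc_id_from_issuer` and the sample issuer URL are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return the last path segment of an EKS OIDC issuer URL.
oidc_id_from_issuer() {
  local issuer="$1"
  # "${issuer##*/}" strips everything up to and including the final slash.
  printf '%s\n' "${issuer##*/}"
}

# Sample issuer URL with a made-up ID. In practice the value would come from:
#   aws eks describe-cluster --name "$CLUSTER_NAME" \
#     --query 'cluster.identity.oidc.issuer' --output text
SAMPLE_ISSUER="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789ABCDEF"
OIDC_ID="$(oidc_id_from_issuer "$SAMPLE_ISSUER")"
echo "OIDC provider ID: ${OIDC_ID}"
```

Storing the ID in a variable keeps the heredoc readable and avoids calling the EKS API once per occurrence.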

### Attach Permissions

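The controller needs permissions to launch and terminate EC2 instances directly (Karpenter does not manage Auto Scaling groups). The inline policy below is a starting point and is not exhaustive; the official getting-started template grants more actions and scopes resources more tightly: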

```bash
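# Attach the EKS worker node managed policy
# (in the official setup this policy lives on a separate node role trusted by ec2.amazonaws.com)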
aws iam attach-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

# Add the Karpenter controller policy as an inline policy (nothing attaches it automatically)
cat > /tmp/karpenter-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateTags",
        "ec2:DeleteLaunchTemplate",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeFleets",
        "ec2:DescribeImages",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "iam:PassRole",
        "pricing:GetProducts",
        "ssm:GetParameter"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name KarpenterNodeRole-${CLUSTER_NAME} \
  --policy-name KarpenterPolicy \
  --policy-document file:///tmp/karpenter-policy.json
```

### Enable EC2 Spot Service

```bash
aws ec2 describe-spot-price-history \
  --start-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --product-descriptions "Linux/UNIX" \
  --max-results 1 \
  --region ${AWS_REGION}
```

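That call only confirms that the Spot APIs are reachable. To launch Spot capacity, the account also needs the EC2 Spot service-linked role. Create it once; if the role already exists, the command returns an error that is safe to ignore:

```bash
aws iam create-service-linked-role \
  --aws-service-name spot.amazonaws.com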
```

## Step 2: Install Karpenter via Helm

```bash
# Subnets and security groups are not Helm settings in v1; they are selected in the EC2NodeClass (Step 3)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.4.0 \
  --namespace karpenter \
  --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --set settings.clusterName=${CLUSTER_NAME} \
  --wait
```

Verify installation:

```bash
kubectl get pods -n karpenter
# Expected: karpenter-xxx running
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
```

## Step 3: Create a NodePool

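A NodePool defines the instance types, limits, and disruption policy. The EC2NodeClass it references holds the AWS-specific settings: AMI, subnets, security groups, and the IAM role for the nodes. Start with this template: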

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default # NodePools are cluster-scoped, so no namespace
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      # Expire nodes: replace nodes after 7 days so they pick up patched AMIs
      expireAfter: 168h
      requirements:
        # Instance families: prefer cheap, general-purpose instances
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ['t3', 'm6i', 'c6i']
        # Instance size: allow small to xlarge instances
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ['small', 'medium', 'large', 'xlarge']
        # Capacity type: allow both; Karpenter favors Spot when both are allowed
        # (there is no percentage split, and Spot carries interruption risk)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot', 'on-demand']
        # Architecture: x86_64 only
        - key: kubernetes.io/arch
          operator: In
          values: ['amd64']
  limits:
    # Hard limit on total CPU across all nodes in this NodePool
    cpu: '100'
    # Hard limit on total memory
    memory: '100Gi'
  disruption:
    # Consolidation: remove or replace empty and underutilized nodes
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    # Budget: allow at most 25% of this NodePool's nodes to be disrupted at once
    budgets:
      - nodes: '25%'
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default # EC2NodeClasses are cluster-scoped as well
spec:
  # IAM role for the nodes themselves (trusted by ec2.amazonaws.com); placeholder, replace it
  role: '<your-node-role-name>'
  amiSelectorTerms:
    - alias: al2023@latest # Amazon Linux 2023; pin a specific version in production
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: 'true' # Select subnets tagged with this
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: 'true' # Select security groups tagged with this
  tags:
    ManagedBy: 'karpenter'
    Environment: 'production'
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true
```

Apply this:

```bash
kubectl apply -f nodepool.yaml
```

### Tag Your Subnets and Security Groups

Karpenter needs to find subnets and security groups via tags:

```bash
# Tag subnets
aws ec2 create-tags \
  --resources subnet-12345 subnet-67890 \
  --tags Key=karpenter.sh/discovery,Value=true

# Tag security groups
aws ec2 create-tags \
  --resources sg-12345 \
  --tags Key=karpenter.sh/discovery,Value=true
```

## Step 4: Test Karpenter with a Deployment

Deploy a test workload to see Karpenter provision nodes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: app
          image: nginx:latest
          resources:
            requests:
              cpu: '250m'
              memory: '512Mi'
            limits:
              cpu: '500m'
              memory: '1Gi'
```

Apply and watch:

```bash
kubectl apply -f test-deployment.yaml
kubectl get nodes -L karpenter.sh/capacity-type
# You should see new nodes appear (provisioned by Karpenter)
```

Check logs to confirm Karpenter provisioned nodes:

```bash
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50
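# Look for messages such as "found provisionable pod(s)" and "launched nodeclaim" (wording varies by version)
# NodeClaims list the capacity Karpenter has requested:
kubectl get nodeclaims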
```

## Step 5: Optimize Cost with Consolidation

Consolidation is Karpenter's biggest cost-saving feature. It removes empty nodes automatically and can replace underutilized nodes with fewer or cheaper ones.

### How Consolidation Works

1. Karpenter watches for nodes with low utilization
2. It checks if pods on those nodes can fit on other nodes
3. If yes, it cordons the old node, drains pods, and terminates it
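
The second step in the list above, checking whether a node's pods fit elsewhere, can be sketched as a first-fit check. This is a deliberately simplified illustration and not Karpenter's actual algorithm: it looks only at CPU requests (in millicores) and ignores memory, affinity, topology spread, and PDBs. The function name `can_consolidate` and the sample numbers are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: can every pod on a candidate node be placed on the
# remaining nodes, looking at CPU requests only?
#   $1 = space-separated pod CPU requests on the candidate node (millicores)
#   $2 = space-separated free CPU on each of the other nodes (millicores)
can_consolidate() {
  local -a pods=($1) free=($2)
  local pod i placed
  for pod in "${pods[@]}"; do
    placed=0
    for i in "${!free[@]}"; do
      if (( free[i] >= pod )); then
        free[i]=$(( free[i] - pod ))   # reserve the capacity on that node
        placed=1
        break
      fi
    done
    # One pod without a home means the node cannot be removed.
    if (( placed == 0 )); then
      return 1
    fi
  done
  return 0
}

# Three pods (250m, 250m, 500m) and two other nodes with 600m and 500m free.
if can_consolidate "250 250 500" "600 500"; then
  echo "pods fit elsewhere: node is a consolidation candidate"
else
  echo "pods do not fit: node stays"
fi
```

A real scheduler simulation has to honor every scheduling constraint, which is why a node that looks idle in `kubectl top nodes` is not always removed.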

### Configuration

The NodePool above already includes consolidation:

```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 30s # Wait 30s after the last pod change on a node before considering it
  budgets:
    - nodes: '25%' # Disrupt at most 25% of the NodePool's nodes at any one time
```

**Tuning for cost**:

- `consolidateAfter: 30s` — aggressive (good for non-stateful workloads)
- `consolidateAfter: 5m` — conservative (good for databases, stateful services)

**Tuning for stability**:

- `nodes: "10%"` — slow (safer for production, less churn)
- `nodes: "50%"` — fast (for dev/staging)
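
To get a feel for what a percentage budget allows, the sketch below applies the rounding rule as described in Karpenter's disruption documentation: round the percentage up to a whole node, then subtract nodes that are already being deleted or are not ready. Treat it as an approximation of that rule and not as Karpenter code; `allowed_disruptions` and the node counts are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper:
#   allowed = roundup(total * percent / 100) - deleting - not_ready, never below 0
#   $1 = total nodes in the NodePool, $2 = budget percentage (integer)
#   $3 = nodes currently being deleted (default 0), $4 = NotReady nodes (default 0)
allowed_disruptions() {
  local total="$1" percent="$2" deleting="${3:-0}" not_ready="${4:-0}"
  # Integer ceiling of total * percent / 100.
  local allowed=$(( (total * percent + 99) / 100 - deleting - not_ready ))
  if (( allowed < 0 )); then
    allowed=0
  fi
  echo "$allowed"
}

for pct in 10 25 50; do
  echo "17 nodes at ${pct}%: $(allowed_disruptions 17 "$pct") node(s) may be disrupted at once"
done
```

Because the percentage rounds up, small NodePools are disrupted more aggressively than the raw percentage suggests; an absolute value such as `nodes: '1'` is more predictable there.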

### Monitor Consolidation

```bash
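# Watch for disruption events (Karpenter records them on nodes and nodeclaims, not in its own namespace)
kubectl get events -A --field-selector source=karpenter --sort-by='.lastTimestamp'
# Look for disruption and termination events; exact reasons and messages vary by version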

# Check current node utilization
kubectl top nodes
```

## Step 6: Optimize Cost Further

### 1. Use Spot Instances

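The NodePool above already allows Spot. Karpenter has no percentage setting; when a NodePool allows both capacity types, it favors Spot and falls back to on-demand if no Spot capacity is available. Spot can cost up to 90% less than on-demand, depending on instance type and region: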

```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ['spot', 'on-demand']
```

To use only Spot (risky):

```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ['spot']
```

To use only on-demand (expensive but stable):

```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ['on-demand']
```
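
To estimate what a given Spot share is worth, a back-of-the-envelope calculation is usually enough. The sketch below is an assumption-laden estimate, not something Karpenter computes or enforces: the helper name `blended_hourly_cost`, the 70% Spot share, and the 70% discount are all illustrative inputs, and `awk` is used only for the floating-point arithmetic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hourly cost of a fleet where part of the nodes run on Spot.
#   $1 = node count
#   $2 = on-demand price per node ($/hr)
#   $3 = share of nodes on Spot (0 to 1)
#   $4 = Spot discount versus on-demand (0 to 1)
blended_hourly_cost() {
  LC_ALL=C awk -v n="$1" -v od="$2" -v share="$3" -v disc="$4" \
    'BEGIN { printf "%.2f\n", n * od * (1 - share * disc) }'
}

echo "10 nodes, all on-demand:            \$$(blended_hourly_cost 10 0.192 0 0.7)/hr"
echo "10 nodes, 70% Spot at 70% discount: \$$(blended_hourly_cost 10 0.192 0.7 0.7)/hr"
```

The actual mix depends on Spot availability at launch time, so treat the share as something to measure afterwards (for example with `kubectl get nodes -L karpenter.sh/capacity-type`) and not as a setting.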

### 2. Right-Size Instance Types

Default NodePool uses t3, m6i, c6i — good for general workloads. For specific needs:

**High-CPU workloads** (CI/CD, data processing):

```yaml
- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ['c6i', 'c5']
```

**High-Memory workloads** (databases, caching):

```yaml
- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ['r6i', 'r5']
```

**Burstable, cost-sensitive workloads**:

```yaml
- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ['t4g', 't3'] # t4g is Graviton (arm64): also allow arm64 in the kubernetes.io/arch requirement
```

### 3. Set Cluster-Wide CPU and Memory Limits

The NodePool `limits` prevent runaway provisioning:

```yaml
limits:
  cpu: '100' # Stop provisioning when total CPU hits 100
  memory: '100Gi' # Stop provisioning when total memory hits 100Gi
```

If you hit these limits, pods remain unschedulable. Monitor in production:

```bash
kubectl describe nodepool default | grep -A2 "Limits"
```

### 4. Add Pod Priority

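Pod priority decides which pods the scheduler places first and which it preempts when capacity is tight, so critical pods are the last to go Pending if a NodePool hits its limits: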

```yaml
apiVersion: v1
kind: PriorityClass
metadata:
  name: critical
value: 1000
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-critical-app
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
        - name: app
          image: ...
```

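Priority alone does not stop Karpenter from consolidating the node a pod runs on. To block voluntary disruption for a specific pod, add the `karpenter.sh/do-not-disrupt: "true"` annotation to its pod template, or protect the workload with a Pod Disruption Budget (see Step 7).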

## Step 7: Production Safety Patterns

### Pattern 1: Pod Disruption Budgets (PDB)

Prevent Karpenter from disrupting too many pods at once:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

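This ensures at least 2 pods of `my-app` stay available during consolidation. Run more replicas than `minAvailable`; otherwise no pod can ever be evicted and Karpenter cannot drain the node.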
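
The arithmetic behind `minAvailable` is simple but easy to get wrong: the number of pods that may be evicted at the same time is the healthy pod count minus `minAvailable`. A minimal sketch with a hypothetical helper name, assuming an integer `minAvailable` and not a percentage:

```bash
#!/usr/bin/env bash
# Hypothetical helper: how many pods a PDB allows to be evicted right now.
#   $1 = currently healthy pods matched by the PDB selector
#   $2 = minAvailable (integer form)
pdb_allowed_evictions() {
  local healthy="$1" min_available="$2"
  local allowed=$(( healthy - min_available ))
  if (( allowed < 0 )); then
    allowed=0
  fi
  echo "$allowed"
}

echo "3 healthy, minAvailable 2: $(pdb_allowed_evictions 3 2) eviction(s) allowed"
echo "2 healthy, minAvailable 2: $(pdb_allowed_evictions 2 2) eviction(s) allowed (drain blocks)"
```

On a live cluster, `kubectl get pdb my-app-pdb` reports the same figure in its ALLOWED DISRUPTIONS column.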

### Pattern 2: Multi-NodePool Strategy

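Use separate NodePools for different workload types. Each NodePool below references its own EC2NodeClass (`general` and `gpu`), which you define the same way as the `default` one in Step 3: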

```yaml
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ['t3', 'm6i']
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: general
  limits:
    cpu: '50'
    memory: '50Gi'
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ['g4dn']
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['on-demand'] # GPU Spot capacity is scarce, and interruptions are costly for long jobs
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
  limits:
    cpu: '100'
    memory: '200Gi'
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m # Conservative for stateful GPU jobs
```

Then schedule workloads to the right pool:

```yaml
spec:
  nodeSelector:
    karpenter.sh/nodepool: general # or gpu
```

### Pattern 3: Monitoring

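Karpenter exposes Prometheus metrics and does not publish to CloudWatch on its own. If you forward those metrics to CloudWatch (for example, with the CloudWatch agent's Prometheus scraping), you can alarm on them. The namespace and metric name below must match whatever your pipeline publishes, and metric names differ between Karpenter versions, so check the metrics reference for the release you run:

```bash
# Alert if consolidation isn't running (potential cost waste)
aws cloudwatch put-metric-alarm \
  --alarm-name karpenter-consolidation-stalled \
  --alarm-description "Karpenter consolidation hasn't occurred in 1 hour" \
  --metric-name karpenter_consolidation_actions_performed_total \
  --namespace karpenter \
  --statistic Sum \
  --period 3600 \
  --threshold 0 \
  --comparison-operator LessThanOrEqualToThreshold \
  --treat-missing-data breaching \
  --evaluation-periods 1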
```

Watch node churn (too much consolidation = instability):

```bash
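# Check termination rate (count of termination-related Karpenter events still retained by the API server)
kubectl get events -A --field-selector source=karpenter | grep -ci "terminating"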
```

## Common Mistakes to Avoid

1. **Over-aggressive consolidation**
   - Setting `consolidateAfter: 1s` causes constant node churn
   - Use 30s-5m based on workload type

2. **Forgetting Pod Disruption Budgets**
   - Without PDBs, critical apps might lose quorum during consolidation
   - Add PDBs for stateful services (databases, message queues)

3. **Not tagging subnets**
   - Karpenter won't find subnets without `karpenter.sh/discovery: "true"` tags
   - Verify tags before troubleshooting provisioning issues

4. **Setting CPU/memory limits too low**
   - If limits are hit, new pods stay unschedulable
   - Monitor `kubectl describe nodepool` and increase limits as workload grows

5. **Using only Spot instances**
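   - Spot nodes can be reclaimed with two minutes' notice, and interruption frequency varies by instance type and region (see the AWS Spot Instance Advisor)
   - Allow both Spot and on-demand in general-purpose NodePools; pin critical services to an on-demand NodePool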

## Cost Comparison: Cluster Autoscaler vs Karpenter

For a 50-pod cluster running around the clock (730 hours per month), priced at on-demand rates in us-east-1:

**Cluster Autoscaler**:

- Over-provisions 20% buffer: 50 pods × 1.2 = 60 pod slots provisioned
- Average 20 nodes of m5.xlarge running 24/7 = $0.192/hr × 20 = $3.84/hr
- Monthly: $3.84 × 730 hours = $2,803

**Karpenter**:

- Exactly fits pods: 50 pod slots
- Average 17 nodes (same pods, better bin-packing) = $0.192/hr × 17 = $3.26/hr
- Monthly: $3.264 × 730 hours = $2,383
- **Savings: ~$420/month (15%)**

Add consolidation (removes idle nodes in off-hours):

- Off-peak hours (10pm-8am, 10 hours): cluster scales to 5 nodes
- Peak hours (14 hours): 17 nodes
- Time-weighted average: (10 × 5 + 14 × 17) / 24 = 12 nodes = $2.30/hr
- Monthly: $2.304 × 730 = $1,682
- **Savings with consolidation: ~$1,121/month (40%)**
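
The comparison above can be reproduced, and adapted to your own node counts and prices, with a few lines of shell. This is only a sketch of the arithmetic: the function names are hypothetical and `awk` handles the floating-point math. It multiplies unrounded hourly rates and weights the 10 off-peak hours at 5 nodes against 14 peak hours at 17 nodes, so its totals can differ by a few dollars from figures computed with rounded intermediate values.

```bash
#!/usr/bin/env bash
HOURS_PER_MONTH=730
PRICE_PER_NODE=0.192   # $/hr per node, the rate used in the comparison

# Monthly cost for a given average node count.
monthly_cost() {
  LC_ALL=C awk -v n="$1" -v p="$PRICE_PER_NODE" -v h="$HOURS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", n * p * h }'
}

# Time-weighted average node count over a 24-hour day.
#   $1 = off-peak hours per day, $2 = off-peak nodes, $3 = peak nodes
weighted_avg_nodes() {
  LC_ALL=C awk -v hrs="$1" -v low="$2" -v high="$3" \
    'BEGIN { printf "%.1f\n", (hrs * low + (24 - hrs) * high) / 24 }'
}

# Savings of a new monthly cost relative to a baseline, as a whole percentage.
savings_percent() {
  LC_ALL=C awk -v base="$1" -v cost="$2" \
    'BEGIN { printf "%.0f\n", (base - cost) / base * 100 }'
}

baseline=$(monthly_cost 20)
binpacked=$(monthly_cost 17)
avg_nodes=$(weighted_avg_nodes 10 5 17)
consolidated=$(monthly_cost "$avg_nodes")

echo "Cluster Autoscaler (20 nodes):        \$${baseline}/month"
echo "Karpenter bin-packing (17 nodes):     \$${binpacked}/month ($(savings_percent "$baseline" "$binpacked")% saved)"
echo "Karpenter + consolidation (${avg_nodes} avg): \$${consolidated}/month ($(savings_percent "$baseline" "$consolidated")% saved)"
```

Changing `PRICE_PER_NODE` or the node counts shows quickly how sensitive the savings are to the off-peak window, which is usually the largest lever.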

## Next Steps

1. Install Karpenter on a dev/staging cluster (2-3 hours)
2. Test with sample workloads for 1-2 weeks
3. Monitor consolidation and cost savings
4. Deploy to production with PDBs and multi-NodePool strategy
5. Set up CloudWatch alarms for churn and utilization
6. [Talk to FactualMinds](/contact-us/) if you need help tuning consolidation, capacity planning, or multi-cluster Karpenter deployments

## FAQ

### What is the difference between Cluster Autoscaler and Karpenter?
Both react to unschedulable pods, but they add capacity differently. Cluster Autoscaler (CA) scales up pre-defined node groups, so every new node has whatever instance type the group was built with. Karpenter picks an instance type for the pending pods and provisions a node sized to fit them. Example: a pod needs 1 vCPU and 2 GB of RAM. With CA, if the node group uses m5.xlarge (4 vCPU, 16 GB, $0.192/hr), that is what gets launched. Karpenter can launch the smallest allowed instance that fits, such as a t3.medium (2 vCPU, 4 GB, about $0.042/hr), which is roughly 4-5x cheaper. A t3.small (2 GB) would not fit, because part of a node's memory is reserved for the kubelet and system daemons. Karpenter also consolidates underutilized nodes by default, whereas CA's scale-down only works within the node groups you defined. Result: less idle capacity and tighter bin-packing.

### Does Karpenter work with spot instances?
Yes. A NodePool that allows both `spot` and `on-demand` in its `karpenter.sh/capacity-type` requirement lets Karpenter favor Spot and fall back to on-demand when no Spot capacity is available. For example, a NodePool might allow the t3, m6i, and c6i families with both capacity types. There is no percentage setting for the Spot share. When a Spot node receives an interruption notice, Karpenter can drain it and launch a replacement, provided interruption handling (an SQS queue set through `settings.interruptionQueue`) is configured. Spot can cut costs by up to 90% compared with on-demand but carries interruption risk. Use it for fault-tolerant workloads (batch, stateless services) and pin stateful or critical services to on-demand.

### What is consolidation and why does it matter for cost?
Consolidation removes nodes that aren't needed. After scaling down (e.g., a job completes and pods terminate), Karpenter checks whether the remaining pods fit on fewer nodes. If they do, it cordons the surplus nodes, drains their pods onto the remaining nodes, and terminates the emptied ones. This happens automatically in the background. Example: after a batch job finishes, you have 10 nodes running 20 pods, and those 20 pods fit on 5 nodes. Karpenter terminates the other 5, saving about $0.96/hour at the m5.xlarge rate of $0.192/hr. Without consolidation, those idle nodes keep running until something else removes them. Consolidation is Karpenter's biggest cost-saving lever because it keeps the cluster right-sized continuously.

### Does Karpenter support GPU workloads?
Yes. Karpenter can provision GPU nodes (p3, g4dn, g5 instances) when a pod requests a GPU. You define a NodePool with GPU instance requirements, and Karpenter provisions a g4dn.xlarge (1x T4 GPU) or larger as needed. GPUs are expensive (a g4dn.xlarge is roughly $0.53/hour on-demand in us-east-1), so not over-provisioning them matters: Karpenter provisions the smallest allowed GPU instance that fits. Prefer on-demand for long-running GPU jobs. GPU Spot is available, but capacity is often scarce and an interruption can discard hours of work. GPU nodes also tend to live longer than compute-only nodes, because long jobs keep them busy and give consolidation fewer chances to act.

### What's the minimum number of nodes a Karpenter cluster needs?
Karpenter can scale the nodes it manages down to zero, but the controller itself has to run on capacity it does not manage, such as a small managed node group or Fargate. In practice: (1) Seed nodes: start with 2-3 on-demand nodes for system pods (Karpenter, CoreDNS, and the rest of kube-system); this baseline costs ~$50-100/month. (2) Workload nodes: Karpenter provisions on-demand or Spot capacity based on the workload. A "hello-world" cluster using 1-2 small nodes (t3.small) costs ~$15-30/month. (3) Production minimum: 3 nodes spread across availability zones for HA. Karpenter scales from there. Small Karpenter clusters tend to cost less than CA clusters because there is no over-provisioning buffer: nodes scale to demand, not to predicted peaks.

---

*Source: https://www.factualminds.com/blog/how-to-deploy-eks-karpenter-cost-optimized-autoscaling/*