Karpenter vs Cluster Autoscaler: EKS Node Cost Optimization in 2026

Quick summary: Karpenter replaces Cluster Autoscaler as the recommended EKS node autoscaler. It provisions nodes faster, selects better-fit instance types per workload, and consolidates nodes more aggressively — typically reducing EKS compute costs by 20-40% compared to an equivalent Cluster Autoscaler deployment.

Key Takeaways

  • Karpenter replaces Cluster Autoscaler as the recommended EKS node autoscaler
  • It provisions nodes faster, right-sizes instance types per workload, and consolidates underutilized nodes more aggressively
  • Typical result: 20–40% lower EKS compute costs versus an equivalent Cluster Autoscaler deployment

EKS node autoscaling is one of the highest-leverage cost optimization decisions for Kubernetes workloads. The autoscaler you choose determines how efficiently your cluster converts pending pods into running compute, how aggressively it reclaims idle capacity, and how well it handles Spot instance economics.

AWS recommends Karpenter as the current best practice for EKS node provisioning. This guide explains what Karpenter does differently from Cluster Autoscaler, how to quantify the cost benefit, and how to migrate.

What Cluster Autoscaler Gets Wrong (From a Cost Perspective)

Cluster Autoscaler (CA) works with node groups — pre-configured sets of EC2 instances with a fixed instance type, launch template, and scaling bounds. When pods are pending due to insufficient capacity, CA adds nodes from the appropriate node group. When nodes are underutilized, CA can scale in.

The cost limitations:

1. Nodes only terminate when fully empty. CA’s scale-in logic removes a node only when all of its pods can be safely rescheduled elsewhere. This means a node at 15% utilization with one pod that cannot be evicted (for example, a pod using local storage or one protected by a restrictive PodDisruptionBudget) stays running indefinitely. In practice, large clusters accumulate dozens of partially utilized nodes that CA cannot consolidate.

2. Fixed instance types per node group. A team running a CPU-intensive batch job and a memory-intensive analytics job uses separate node groups with separate fixed instance types. The CPU job gets whatever instance type the CPU node group specifies, even if a different instance type in the same family would be 20% cheaper for the actual CPU-to-memory ratio the workload uses.

3. Slow provisioning. CA calls the Auto Scaling Group API to add a node. The ASG then launches the EC2 instance (1–3 minutes), the instance joins the cluster, kubelet registers, and pods begin scheduling. The total time from pending pod to running pod is typically 3–5 minutes. During that window, the scheduler may mark pods as unschedulable and generate alerts.

How Karpenter Improves on Each Limitation

Consolidation: The Core Cost Benefit

Karpenter’s consolidation controller continuously evaluates whether any nodes are underutilized to the point where their workloads could be moved to other nodes. When it identifies consolidation opportunities:

  1. Karpenter selects pods to migrate off the underutilized node
  2. It cordons and drains the node, rescheduling pods elsewhere
  3. It terminates the now-empty node
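At its core, the consolidation decision in the steps above is a bin-packing check: can every pod on a candidate node fit into the spare capacity of the remaining nodes? A simplified sketch with hypothetical resource figures (the real controller also weighs instance price, PodDisruptionBudgets, and topology constraints):

```python
# Simplified consolidation check: can the pods on a candidate node be
# repacked into the free capacity of the other nodes? First-fit-decreasing;
# the real controller also considers price, PDBs, and topology constraints.

def can_consolidate(candidate_pods, other_nodes_free_cpu):
    """candidate_pods: vCPU requests of pods on the node being drained.
    other_nodes_free_cpu: spare vCPU on each of the remaining nodes."""
    free = sorted(other_nodes_free_cpu, reverse=True)
    for pod in sorted(candidate_pods, reverse=True):  # place big pods first
        for i, spare in enumerate(free):
            if spare >= pod:
                free[i] -= pod
                break
        else:
            return False  # some pod fits nowhere -> keep the node
    return True

# A lightly loaded node whose pods fit elsewhere is a consolidation target:
print(can_consolidate([0.5, 0.25], other_nodes_free_cpu=[1.0, 0.5]))  # True
print(can_consolidate([4.0], other_nodes_free_cpu=[1.0, 2.0]))        # False
```

In the first case the node can be drained and terminated; in the second, a single large pod has no home elsewhere, so the node survives.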

This active consolidation removes the “partially utilized but not empty” node problem. A cluster where Cluster Autoscaler maintains 30 nodes, 8 of which are at 15–20% utilization, will often converge to 22–24 nodes under Karpenter — with the same workload running at the same performance level.

Observed cost impact: Teams migrating from CA to Karpenter on clusters with variable workloads consistently report 20–40% reduction in EC2 node costs. The exact savings depend on how spiky your workload is and how much partial-utilization accumulation your CA cluster has built up.

Right-Sized Instance Selection

Karpenter uses NodePool resources that specify a list of allowed instance types (or instance families) rather than a single fixed type. When scheduling a pending pod, Karpenter:

  1. Evaluates the pod’s resource requests and node selectors
  2. Filters the allowed instance list to types that satisfy requirements
  3. Selects the most cost-efficient option considering current Spot pricing across types and AZs

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]  # compute, memory, general — flexible
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]  # At least 6th generation for Graviton efficiency
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]  # Graviton only (typically ~20% cheaper per vCPU)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Prefer Spot, fall back to On-Demand
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Allowing multiple instance families and generations means Karpenter can choose a c7g.large over an m6a.large when the workload is CPU-bound — saving the cost difference between the two families. This flexibility is what drives the 15–30% instance cost reduction on top of the consolidation savings.
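The selection logic amounts to filter-then-minimize over a candidate price list. A sketch of that logic (the instance type names are real, but the prices and the `pick_instance` helper are illustrative placeholders, not live Spot quotes or Karpenter code):

```python
# Sketch of Karpenter-style instance selection: filter candidate types by
# the pod's requests, then pick the cheapest fit. Prices are illustrative.

CANDIDATES = [
    # (name, vCPU, memory GiB, arch, hourly price -- placeholder values)
    ("c7g.large", 2,  4, "arm64", 0.058),
    ("m6a.large", 2,  8, "amd64", 0.078),
    ("m7g.large", 2,  8, "arm64", 0.074),
    ("r7g.large", 2, 16, "arm64", 0.097),
]

def pick_instance(cpu_request, mem_request_gib, arch=None):
    fits = [c for c in CANDIDATES
            if c[1] >= cpu_request and c[2] >= mem_request_gib
            and (arch is None or c[3] == arch)]
    return min(fits, key=lambda c: c[4])[0] if fits else None

# A CPU-bound pod (2 vCPU, 3 GiB) lands on the cheaper compute-optimized type:
print(pick_instance(2, 3))   # c7g.large
# A memory-hungry pod (2 vCPU, 12 GiB) forces the memory-optimized family:
print(pick_instance(2, 12))  # r7g.large
```

With a single fixed-type node group, both pods would land on whatever type the group specifies; the flexible list is where the per-pod savings come from.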

Node Provisioning Speed

Karpenter calls the EC2 CreateFleet API directly instead of going through Auto Scaling Groups. This bypasses ASG warm pool management and typically halves the provisioning time — pods go from pending to running in 1–2 minutes rather than 3–5 minutes. Faster provisioning means less queued workload and better utilization of the nodes that do get provisioned.

Spot Instance Economics with Karpenter

Spot instances on EKS have historically required careful management: separate node groups per Spot pool, manual capacity type diversification, and custom handling for Spot interruptions. Karpenter handles this natively.

Spot Fleet Configuration

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-workers
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["4", "8", "16"]  # Multiple sizes for diversification
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: "10%"  # Don't disrupt more than 10% of nodes at once

By allowing multiple instance categories and sizes, Karpenter selects from a large Spot pool — improving availability and reducing the probability of interruptions across the whole fleet.
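The availability benefit of diversification is easy to quantify: if each Spot pool (instance type × size × AZ) has some probability of a capacity shortfall, the chance that every pool is short at once shrinks exponentially with pool count. A rough model (the 5% shortfall probability is an illustrative assumption, and real pools are not fully independent, so treat this as a best case):

```python
# Rough model: probability that ALL Spot pools are simultaneously short of
# capacity, assuming each pool fails independently with probability p.
# Real pool interruptions are correlated; this is a best-case illustration.

def all_pools_unavailable(p_shortfall, n_pools):
    return p_shortfall ** n_pools

# One fixed pool vs. 3 categories x 3 sizes x 3 AZs = 27 pools:
print(all_pools_unavailable(0.05, 1))   # 0.05
print(all_pools_unavailable(0.05, 27))  # ~7.5e-36 -- effectively never
```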

Interruption Handling

When a Spot interruption notice arrives (2 minutes before termination), Karpenter:

  1. Receives the interruption notification from EventBridge via an SQS queue (configured with Karpenter’s interruption-queue setting)
  2. Immediately cordons the node to prevent new pod scheduling
  3. Evicts pods gracefully with respect to PodDisruptionBudgets
  4. Launches replacement nodes proactively

This automated interruption handling removes the need for custom Spot termination scripts or third-party tools like the AWS Node Termination Handler (AWS recommends not running the handler alongside Karpenter’s interruption handling, as the two can conflict over the same events).
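For reference, the EventBridge event behind step 1 has the detail-type `EC2 Spot Instance Interruption Warning`. A minimal sketch of extracting the affected instance from such an event (the `interrupted_instance` helper is hypothetical, not Karpenter code; the payload is abbreviated):

```python
# Minimal sketch: pull the instance ID out of an EC2 Spot interruption
# warning event as EventBridge delivers it. Cordoning and draining are
# handled internally by Karpenter and are omitted here.

def interrupted_instance(event):
    if event.get("detail-type") != "EC2 Spot Instance Interruption Warning":
        return None
    return event["detail"]["instance-id"]

sample_event = {  # abbreviated example payload
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {"instance-id": "i-0123456789abcdef0",
               "instance-action": "terminate"},
}
print(interrupted_instance(sample_event))  # i-0123456789abcdef0
```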

Migrating from Cluster Autoscaler to Karpenter

Step 1: Install Karpenter alongside CA

Karpenter and Cluster Autoscaler can run simultaneously. Install Karpenter but configure it with taints on any new NodePool resources so existing workloads do not migrate immediately:

spec:
  template:
    spec:
      taints:
        - key: karpenter.sh/managed
          effect: NoSchedule

Step 2: Migrate one workload at a time

Add toleration for the Karpenter taint to one workload at a time, observing cost and stability:

# In your Deployment spec
tolerations:
  - key: karpenter.sh/managed
    operator: Exists
    effect: NoSchedule

Step 3: Enable consolidation on migrated workloads

Once satisfied with Karpenter stability, enable consolidation:

disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 30s

Step 4: Disable CA for migrated node groups

Once all workloads are managed by Karpenter, set CA node group min/max to 0 and remove the CA deployment.

Measuring the Cost Impact

Before and after migration, measure:

  • Total node count — Same workload, fewer nodes indicates consolidation working
  • Average node utilization — Should increase as idle nodes are removed
  • EC2 cost per pod — Divide EC2 cost by running pod count; should decrease
  • Scale-out latency p99 — Should improve with faster Karpenter provisioning

In Cost Explorer, filter by EC2 and tag by EKS cluster name before and after migration to isolate the savings.
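Cost per pod is the single metric that captures both consolidation and right-sizing gains in one number. A sketch of the before/after comparison (all dollar and pod figures below are illustrative, not benchmarks):

```python
# EC2 cost per running pod: monthly EC2 spend attributed to the cluster
# divided by average running pod count. All figures are illustrative.

def cost_per_pod(monthly_ec2_cost, avg_running_pods):
    return monthly_ec2_cost / avg_running_pods

before = cost_per_pod(14_400, 600)  # e.g. 30 nodes under Cluster Autoscaler
after  = cost_per_pod(11_040, 600)  # e.g. 23 nodes under Karpenter, same pods
print(f"before=${before:.2f}/pod, after=${after:.2f}/pod")
print(f"savings: {1 - after / before:.0%}")
```

Holding the pod count constant isolates the autoscaler’s effect; if the workload also changed between measurements, normalize by requested vCPU instead.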

When Cluster Autoscaler Is Still Appropriate

Karpenter is the right choice for most EKS workloads. CA may still be appropriate when:

  • Your organization has strict change management constraints and CA is already certified
  • You are running a highly specialized workload with a known fixed instance type and no benefit from dynamic selection
  • You are on an older EKS version where Karpenter v1 is not yet supported (check Karpenter release compatibility)

For new EKS clusters in 2026, Karpenter should be the default choice.

Getting Started

For EKS cost optimization including Karpenter installation, NodePool configuration, and cost baseline measurement, our team provides AWS managed Kubernetes services and cloud cost optimization consulting.

For the broader architecture and cost trade-off analysis between EKS and other AWS compute options, see our AWS cost control architecture playbook.

Contact us to optimize your EKS costs →

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture · Cloud Migration · GenAI on AWS · Cost Optimization · DevOps


Recommended Reading

Autoscaling Broke Your Budget (AI Made It Worse)

Autoscaling was supposed to make costs predictable by matching capacity to demand. Instead, it introduced feedback loops, burst amplification, and — with AI workloads — a new class of non-deterministic spend that no scaling policy anticipates.

Logging Yourself Into Bankruptcy

Observability is not free, and the industry has collectively underpriced it. CloudWatch log ingestion, metrics explosion, and X-Ray trace volume can together exceed your compute bill — especially once AI workloads introduce high-cardinality telemetry at scale.