Container Orchestration

Kubernetes on AWS (EKS)

Managed Kubernetes on AWS with Auto Mode, Hybrid Nodes, Karpenter 1.0, and Graviton-first node pools.

Last updated: April 29, 2026
Containers & Kubernetes
Author: FactualMinds Cloud Integration Team
Reviewed by: FactualMinds AWS-certified architects (Solutions Architect – Professional)

Amazon EKS overview

Amazon EKS is AWS-managed Kubernetes. AWS operates the control plane (API server, scheduler, etcd), patches it automatically, and runs it across at least three availability zones. You own the data plane, unless you run EKS Auto Mode (GA December 2024), in which case AWS operates the data plane as well and you consume Kubernetes as an almost-serverless service.

FactualMinds deploys EKS for teams that need Kubernetes portability (multi-cloud, on-prem via EKS Hybrid Nodes, or open-source ecosystem alignment) and for mid-market AWS-only teams that have outgrown ECS or plain Fargate. We default new 2026 clusters to Auto Mode on Kubernetes 1.32 with Graviton-first node pools unless a specific workload says otherwise.

What’s new on EKS in 2026

EKS Auto Mode GA — fully managed data plane, managed add-ons (VPC CNI, kube-proxy, CoreDNS, EBS CSI, AWS Load Balancer Controller), and a managed Karpenter that provisions nodes within seconds of scheduling pressure.
EKS Hybrid Nodes (GA December 2024): register Linux hosts running on-prem or at the edge as EKS worker nodes governed by an AWS-hosted control plane. One kubectl surface for cloud and hybrid.
Karpenter 1.0 (2024) — stable NodeClass/NodePool CRDs, disruption budgets, and consolidation-policy modes. Karpenter is the default on Auto Mode.
Pod Identity — the ergonomic replacement for IRSA. No OIDC provider, no ServiceAccount annotation, no trust-policy gymnastics.
Kubernetes 1.31 / 1.32 — typical supported minor versions on EKS in 2026; upstream releases every ~4 months, EKS supports the current plus the previous three.
ECR enhanced scanning: Inspector v2 scans images for OS and language-package CVEs and enriches findings with Exploit Prediction Scoring System (EPSS) scores; integrates with Security Hub.
AWS Load Balancer Controller — managed install on Auto Mode; supports Gateway API, ALB and NLB target-group binding, and cross-zone health checks.
Amazon EBS CSI driver managed add-on — Auto Mode handles install and upgrade; gp3 volumes by default.
Cilium + Hubble / eBPF observability — supported via add-ons for teams that need deep network visibility without full-fat service mesh.
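
The version-support rule in the list above (current minor plus the previous three) reduces to simple arithmetic on minor versions. The following is a minimal Python sketch that assumes the rule exactly as stated in this guide; the function names and the `window` parameter are illustrative, and the AWS release calendar remains the authority on actual end-of-support dates.

```python
def minors_behind(cluster_version: str, latest_version: str) -> int:
    """Return how many minor releases the cluster trails the latest one."""
    cluster_major, cluster_minor = (int(p) for p in cluster_version.split(".")[:2])
    latest_major, latest_minor = (int(p) for p in latest_version.split(".")[:2])
    if cluster_major != latest_major:
        raise ValueError("sketch only handles versions with the same major number")
    return latest_minor - cluster_minor


def in_support_window(cluster_version: str, latest_version: str, window: int = 3) -> bool:
    """True when the cluster runs the latest minor or one of the `window` before it."""
    behind = minors_behind(cluster_version, latest_version)
    return 0 <= behind <= window


if __name__ == "__main__":
    # Hypothetical fleet check against a latest supported minor of 1.32.
    for version in ("1.32", "1.29", "1.28"):
        print(version, in_support_window(version, "1.32"))
```

A check like this is cheap to run in CI against every cluster in an inventory, so a cluster drifting toward the edge of the window shows up before it becomes an emergency upgrade.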

Why EKS

Kubernetes standard

Standard kubectl, Helm, Kustomize, and standard manifests.
Portable: workloads run on other clouds, on-premises (EKS Hybrid Nodes or EKS Anywhere), or upstream Kubernetes.
Massive ecosystem (Prometheus, OpenTelemetry, Argo CD, Flux, Karpenter, Cilium, Istio, Linkerd).

AWS integration

VPC CNI for pod networking with real AWS IP addresses.
Pod Identity for pod-level IAM permissions without OIDC acrobatics.
Native integrations with ALB/NLB, EFS, EBS, S3, RDS, DynamoDB, SQS, Kinesis, Bedrock.
AWS Security Hub / GuardDuty EKS Protection for runtime threat detection.

Managed control plane

Multi-AZ control plane included in the $0.10/hour price.
AWS patches the control plane on a published minor-version cadence.
SLA covers control-plane availability; you are responsible for workload availability.

EKS Architecture

Control plane (AWS managed)

API server, scheduler, controller managers, etcd.
Audit logs can be shipped to CloudWatch Logs; control-plane endpoints can be private, public, or public+private with IP allow-list.

Data plane (your choice)

EKS Auto Mode — fully managed nodes, add-ons, networking, load balancing, and storage controllers.
Managed node groups — EC2 instances you provision; AWS manages OS patching, drain, and replacement.
Karpenter on self-managed nodes — for teams that want fine control over instance-type selection and disruption policy.
AWS Fargate — serverless pods with no node management; higher per-pod price, best for bursty or sandbox workloads.

Networking

AWS VPC CNI: each pod gets a real VPC IP (prefix delegation supported for IP density).
Security groups for pods (SGFP) for per-pod network security.
Cilium eBPF or Calico for network policy and observability.
AWS Load Balancer Controller for ALB/NLB ingress.
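
Prefix delegation matters because the VPC CNI's pod ceiling is a function of ENI count and addresses per ENI. The sketch below follows the formula AWS documents for its max-pods calculator. The instance limits in the example (3 ENIs with 10 IPv4 addresses each, the published values for m5.large) and the 110-pod cap that AWS recommends for smaller instances are inputs you should confirm for your own instance types.

```python
IPS_PER_PREFIX = 16  # a /28 IPv4 prefix holds 16 addresses


def max_pods(enis, ips_per_eni, prefix_delegation=False, cap=None):
    """Estimate the VPC CNI pod ceiling for one node."""
    # One address on each ENI is the ENI's own primary IP and cannot go to a pod.
    slots = enis * (ips_per_eni - 1)
    if prefix_delegation:
        # Each remaining slot holds a /28 prefix instead of a single address.
        slots *= IPS_PER_PREFIX
    # +2 covers the host-network pods (aws-node and kube-proxy).
    pods = slots + 2
    return min(pods, cap) if cap is not None else pods


if __name__ == "__main__":
    print("secondary IP mode:", max_pods(3, 10))
    print("prefix delegation:", max_pods(3, 10, prefix_delegation=True, cap=110))
```

With prefix delegation the address arithmetic stops being the limit on small instances; the cap (and the node's CPU and memory) becomes the practical bound.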

EKS Auto Mode in practice

AWS provisions, scales, and replaces nodes automatically based on pending pods.
Managed Karpenter bin-packs across instance types, including Graviton by default.
OS patching via node replacement on a rolling schedule; no in-place kernel updates.
AWS manages the core add-ons (VPC CNI, kube-proxy, CoreDNS, AWS Load Balancer Controller, EBS CSI).
Billed as EC2 plus an EKS Auto Mode management fee that varies by instance type and accrues for as long as the instance runs; typically net-neutral or cheaper versus self-managed node groups when labor is priced in.

Use Auto Mode when

You want Kubernetes without node operations.
Your security team can live with AWS-managed, regularly replaced AMIs.
Your workloads do not require custom kernel modules or niche runtime options.

Prefer managed node groups when

You need a regulated/approved AMI (e.g., STIG-hardened) maintained by your security team.
You run custom kernel modules (BPF/eBPF extensions beyond what’s supported, niche drivers).
You want fine-grained Spot pool control that the managed NodePool does not expose.

EKS Hybrid Nodes

Register on-prem Linux hosts as EKS workers against an AWS-hosted control plane.
Supports x86 and ARM; on-prem hosts authenticate to AWS through AWS Systems Manager hybrid activations or IAM Roles Anywhere.
Use for edge compute, data-gravity on-prem workloads, or manufacturing floor nodes that must stay physically on site but should be governed from AWS.
Compare to EKS Anywhere: Hybrid Nodes share one control plane with AWS; EKS Anywhere runs its own on-site control plane.

Pod Identity vs IRSA

Pod Identity (2026 default)

aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace production \
  --service-account my-app \
  --role-arn arn:aws:iam::123456789012:role/my-app-role

Pods using the my-app ServiceAccount in the production namespace automatically receive temporary credentials via the Pod Identity Agent. No annotation, no OIDC provider, no trust-policy StringEquals dance.
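
The role passed to --role-arn still needs a trust policy, but it is the same short document for every cluster and ServiceAccount. A minimal sketch of that document, generated in Python so it can be templated in CI; the function name and the Sid are illustrative, while the service principal and the two STS actions are the ones Pod Identity requires.

```python
import json


def pod_identity_trust_policy() -> dict:
    """Build the IAM trust policy that lets EKS Pod Identity assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEksPodIdentity",  # illustrative label only
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                # TagSession is needed because Pod Identity attaches session tags
                # (cluster, namespace, service account) to the credentials.
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(pod_identity_trust_policy(), indent=2))
```

One way to use it: write the output to a file and pass it to `aws iam create-role --assume-role-policy-document file://trust.json` before creating the association.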

IRSA (legacy / niche)

Still required for workloads that only accept token-file authentication, clusters older than 1.24, and places where Pod Identity is not available, such as Kubernetes clusters you run yourself on EC2 outside EKS.
OIDC provider + annotated ServiceAccount + trust-policy condition on the OIDC subject.
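
For contrast, the IRSA trust policy has to be rendered per cluster and per ServiceAccount, which is where most of the friction comes from. A sketch under stated assumptions: the account ID, issuer path, namespace, and ServiceAccount name below are placeholders, and the function name is illustrative.

```python
import json


def irsa_trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Build an IRSA trust policy.

    oidc_issuer is the cluster's OIDC issuer URL without the https:// prefix.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Pin the role to exactly one ServiceAccount in one namespace.
                        f"{oidc_issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account and cluster issuer.
    issuer = "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"
    print(json.dumps(irsa_trust_policy("123456789012", issuer, "production", "my-app"), indent=2))
```

The ServiceAccount then also needs the eks.amazonaws.com/role-arn annotation pointing at the role, which is the second moving part Pod Identity removes.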

Karpenter 1.0 patterns we deploy

Graviton-first NodePool — allow arm64 architectures, prefer on-demand for baseline and Spot for scale-out.
Consolidation policy: WhenEmptyOrUnderutilized (named WhenUnderutilized before Karpenter 1.0) for dev/staging, WhenEmpty for production to avoid disruption of long-running pods.
Disruption budgets: cap how many nodes Karpenter can disrupt at the same time, aligned with PDB.
Per-namespace NodePool selection — heavy GPU workloads go to a dedicated NodePool with nvidia.com/gpu taints.
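
The patterns above map onto a handful of NodePool fields. The sketch below renders a NodePool as JSON (which kubectl apply accepts) for a self-managed Karpenter 1.0 install. The pool name, zone names, the 5m consolidateAfter, the 10% budget, and the EC2NodeClass called "default" are all assumptions for illustration; on Auto Mode the node class reference differs, so check the Auto Mode documentation before reusing it there.

```python
import json

# Policy names in the karpenter.sh/v1 API; v1beta1 called the second one "WhenUnderutilized".
VALID_POLICIES = {"WhenEmpty", "WhenEmptyOrUnderutilized"}


def node_pool(name, zones, consolidation_policy="WhenEmpty",
              consolidate_after="5m", max_disrupted="10%", node_class="default"):
    """Build a multi-AZ NodePool that allows both arm64 and amd64 nodes."""
    if consolidation_policy not in VALID_POLICIES:
        raise ValueError(f"unknown consolidationPolicy: {consolidation_policy}")
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": [
                        {"key": "kubernetes.io/arch", "operator": "In",
                         "values": ["arm64", "amd64"]},
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": ["on-demand", "spot"]},
                        {"key": "topology.kubernetes.io/zone", "operator": "In",
                         "values": list(zones)},
                    ],
                }
            },
            "disruption": {
                "consolidationPolicy": consolidation_policy,
                "consolidateAfter": consolidate_after,
                # Upper bound on nodes being disrupted at the same time.
                "budgets": [{"nodes": max_disrupted}],
            },
        },
    }


if __name__ == "__main__":
    zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]  # placeholder zones
    print(json.dumps(node_pool("general", zones), indent=2))
```

Generating the manifest from a function keeps the production and staging pools identical apart from the one argument that should differ, the consolidation policy.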

Observability stack

CloudWatch Container Insights (enhanced observability) for the cluster, nodes, pods, and control plane.
ADOT Collector DaemonSet forwarding traces, metrics, and logs to Managed Prometheus + Managed Grafana, or to Datadog / New Relic / Honeycomb.
Cilium Hubble (or Pixie) for eBPF-level network visibility without a service mesh.
EKS audit logs to CloudWatch Logs with 90-day retention and S3 archive behind Object Lock for SOC 2 / PCI evidence.

Graviton cost savings

Graviton3 (m7g, c7g, r7g) and Graviton4 (m8g, c8g, r8g) typically deliver 30–40% better price-performance than comparable x86 for stateless microservices and JVM workloads.
Build multi-arch images with docker buildx build --platform linux/amd64,linux/arm64 in CI; push both manifests to ECR.
Karpenter on Auto Mode will pick ARM when it wins on price and pod fits.
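
A price-performance gain is not the same number as a bill reduction, so it is worth converting before it goes into a budget. The sketch below assumes "X% better price-performance" means X% more throughput per dollar (an assumption; vendors define the term in different ways), in which case the saving for a fixed amount of work is 1 - 1/(1 + X). The spend figure in the example is hypothetical.

```python
def savings_for_same_work(price_performance_gain: float) -> float:
    """Fraction of spend saved for a fixed workload, given a throughput-per-dollar gain."""
    if price_performance_gain <= -1.0:
        raise ValueError("gain must be greater than -100%")
    return 1.0 - 1.0 / (1.0 + price_performance_gain)


def projected_cost(x86_monthly_cost: float, price_performance_gain: float) -> float:
    """Projected monthly cost after migration, rounded to cents."""
    saving = savings_for_same_work(price_performance_gain)
    return round(x86_monthly_cost * (1.0 - saving), 2)


if __name__ == "__main__":
    for gain in (0.30, 0.40):
        print(f"gain {gain:.0%}: saving {savings_for_same_work(gain):.1%}, "
              f"hypothetical $10,000 bill becomes ${projected_cost(10_000, gain):,.2f}")
```

Under that reading, a 30 to 40% price-performance gain corresponds to roughly a 23 to 29% lower bill for the same work; measure your own workloads before committing to a number.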

Reference architecture (2026 default)

                    ┌──────────────────────────────────────────────┐
                    │  AWS-managed control plane (multi-AZ)        │
                    │  api / scheduler / controller-mgr / etcd     │
                    │  audit + authenticator + scheduler logs      │
                    └─────────────────┬────────────────────────────┘
                                      │ (private endpoint via PrivateLink)
                                      │
   ┌──────────────────────────────────┼──────────────────────────────────┐
   │ Data plane (Auto Mode)           │                                  │
   │  ├── managed Karpenter NodePool  │ ── Pod Identity Agent (per node) │
   │  ├── Graviton-first c8g/m8g/r8g  │ ── VPC CNI (prefix delegation)   │
   │  ├── consolidation policy        │ ── EBS CSI (gp3 default)         │
   │  └── disruption budgets          │ ── AWS LB Controller (ALB+NLB)   │
   └──────────────────────────────────┴──────────────────────────────────┘
                                      │
   Workloads ── ServiceAccount → PodIdentityAssociation → IAM Role
   Ingress  ── ALB (alb.ingress.kubernetes.io/scheme: internet-facing)
   Storage  ── EBS gp3 PVCs / EFS for shared / S3 for objects
   Secrets  ── Secrets Store CSI / HashiCorp VSO → Vault / Secrets Manager
   Images   ── ECR (enhanced scanning, image signing) ← CI attestation
   Telemetry ─ CloudWatch Container Insights + ADOT → Datadog / AMP+AMG
   Audit    ── CloudWatch Logs (90d) + S3 Object Lock (compliance archive)

Failure modes & resilience

1. Karpenter consolidation evicting under-budgeted pods. The default consolidationPolicy, WhenEmptyOrUnderutilized (called WhenUnderutilized before Karpenter 1.0), moves pods aggressively. For long-running stateful workloads, set WhenEmpty on the NodePool and define a PodDisruptionBudget (minAvailable) so consolidation cannot violate availability. Disruption budgets at the NodePool level cap how many nodes can be voluntarily disrupted at the same time.

2. Pod Identity Agent crash-loop. Symptom: pods using the ServiceAccount get 403 AccessDenied from STS. Causes: agent DaemonSet pod CrashLoopBackOff (check kubectl logs -n kube-system -l app=eks-pod-identity-agent), Pod Identity Association pointing at a non-existent IAM role, trust policy missing pods.eks.amazonaws.com principal, or IMDS hop limit too low on the node. Auto Mode handles the agent; on managed node groups confirm the agent add-on is healthy.

3. NodePool pinned to a single AZ. A zonal disruption (control-plane outage in one AZ, ELB endpoint flap) takes the workload with it. Always include topology.kubernetes.io/zone In [a, b, c] in NodePool requirements; combine with topologySpreadConstraints on Deployments.

4. gp3 volume detach during node replacement. Auto Mode replaces nodes, so StatefulSets with volumeClaimTemplates should use a StorageClass that sets reclaimPolicy: Retain and volumeBindingMode: WaitForFirstConsumer. Otherwise an in-flight reschedule can race with detach and the pod stays in ContainerCreating for several minutes.

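5. --max-unavailable vs PDB collisions. A strict PDB (minAvailable: 100%) permits zero voluntary evictions, so node drains during node group updates or Karpenter consolidation stall indefinitely. A Deployment’s RollingUpdate does not go through the eviction API, but it temporarily removes up to maxUnavailable ready pods, which can leave a tight PDB with no headroom while a drain is in flight. Always set PDB minAvailable such that replicas - minAvailable >= maxUnavailable.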

6. Cluster Autoscaler vs Karpenter coexistence. Running both in the same cluster causes thrash. Pick one. Karpenter for new clusters; Cluster Autoscaler only if a vendor product hard-requires it.

7. EKS minor-version upgrade window. AWS supports current + 3 prior minors (~14 months). Letting a cluster slip to N-4 forces emergency upgrade across multiple breaking changes. Schedule quarterly minor upgrades; test in a staging cluster first.
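
The rule in item 5 (replicas - minAvailable >= maxUnavailable) is easy to check before a manifest ships. A minimal sketch, assuming Kubernetes' documented rounding: PDB percentages round up and a Deployment's maxUnavailable percentage rounds down. The function names are illustrative.

```python
import math


def _resolve(value, replicas, round_up):
    """Turn an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def disruption_headroom(replicas, min_available, max_unavailable):
    """Pods the PDB still lets you lose after a rollout takes its maxUnavailable."""
    min_avail = _resolve(min_available, replicas, round_up=True)
    max_unavail = _resolve(max_unavailable, replicas, round_up=False)
    return (replicas - min_avail) - max_unavail


def is_compatible(replicas, min_available, max_unavailable):
    """True when replicas - minAvailable >= maxUnavailable."""
    return disruption_headroom(replicas, min_available, max_unavailable) >= 0


if __name__ == "__main__":
    print(is_compatible(4, "100%", 1))   # strict PDB, no headroom
    print(is_compatible(4, 2, "25%"))    # two spare pods, one used by the rollout
```

Running this as a pre-merge check over rendered manifests catches the deadlock configuration before a node drain discovers it in production.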

Observability runbook

Enable control-plane logs on an existing cluster (the same logging block can also be supplied at cluster creation):

aws eks update-cluster-config \
  --region eu-west-1 \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
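
The inline JSON in that command is the part that usually breaks under shell quoting. One option is to generate the argument instead of typing it; the sketch below builds the same payload and a safely quoted command line. The helper names are illustrative, and the cluster name and region are the placeholders used above.

```python
import json
import shlex

# The five control-plane log types named in the command above.
LOG_TYPES = ("api", "audit", "authenticator", "controllerManager", "scheduler")


def cluster_logging_payload(enabled_types=LOG_TYPES):
    """Build the structure expected by the --logging argument."""
    unknown = set(enabled_types) - set(LOG_TYPES)
    if unknown:
        raise ValueError(f"unknown log types: {sorted(unknown)}")
    return {"clusterLogging": [{"types": list(enabled_types), "enabled": True}]}


def update_logging_command(cluster, region, enabled_types=LOG_TYPES):
    """Return a shell-safe aws CLI command string that enables the given log types."""
    payload = json.dumps(cluster_logging_payload(enabled_types), separators=(",", ":"))
    args = ["aws", "eks", "update-cluster-config",
            "--region", region, "--name", cluster, "--logging", payload]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(update_logging_command("my-cluster", "eu-west-1"))
```

If audit logs are the only compliance requirement, passing a shorter tuple such as ("api", "audit") keeps CloudWatch ingestion costs down.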

Alarms we ship:

Alarm	First action
`cluster_failed_request_count > 0` (control plane)	Check audit logs for `Forbidden` / `Unauthorized` patterns; review IAM Identity mappings
`node_status_condition` Ready=false on any node	`kubectl describe node`; check kubelet, CNI, and SSM agent health
Karpenter `nodeclaim_disruption_total` spike	Inspect NodePool consolidation events; verify PDBs are honored
`pod_pending_count > 0` for `> 5 min`	`kubectl describe pod` → events; NodePool requirements vs pod tolerations / arch mismatch
ECR image-pull error rate	VPC endpoint health for `com.amazonaws.<region>.ecr.dkr`; IAM role `ecr:GetAuthorizationToken`
ADOT Collector `otelcol_exporter_send_failed_metric_points`	Backend (AMP / Datadog) reachability; collector resource limits

Debug path: “Pod stuck Pending”:

kubectl describe pod <name> → Events. Most common: 0/N nodes are available: insufficient memory or node(s) didn't match Pod's node affinity.
If insufficient resources: confirm Karpenter is provisioning (kubectl get nodeclaims); check NodePool requirements allow the pod’s architecture and instance family.
If affinity mismatch: check NodePool labels match pod’s nodeSelector / affinity.
If the pod schedules but then fails on AWS credentials, that is not a scheduling problem: confirm a PodIdentityAssociation exists for (cluster, namespace, serviceAccount).
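
The first branches of the Pending path above can be sketched as a small classifier over the scheduler's event message. This is an illustration only: it matches the two message fragments quoted above, and the category names and next-step hints are our own shorthand, not Kubernetes output.

```python
NEXT_STEP = {
    "capacity": "kubectl get nodeclaims; check NodePool arch and instance-family requirements",
    "placement": "compare NodePool labels with the pod's nodeSelector / affinity",
    "unknown": "read the full Events section of kubectl describe pod",
}


def triage_pending(event_message: str) -> str:
    """Map a FailedScheduling message to a coarse category."""
    text = event_message.lower()
    if "insufficient" in text:
        return "capacity"
    if "node affinity" in text:
        return "placement"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "0/3 nodes are available: 3 Insufficient memory.",
        "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector.",
    ]
    for message in samples:
        category = triage_pending(message)
        print(f"{category}: {NEXT_STEP[category]}")
```

Anything that lands in "unknown" deserves a human look at the raw events rather than another rule in the classifier.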

Debug path: “Node not ready”:

kubectl describe node <node> → Conditions section. MemoryPressure, DiskPressure, PIDPressure are first signals.
CloudWatch Container Insights → node detail → kubelet logs.
VPC CNI: kubectl logs -n kube-system -l k8s-app=aws-node for IP exhaustion or ENI attach failures.
If on Auto Mode, the node will be replaced automatically — confirm replacement is in progress before manual intervention.

When EKS is NOT the right call

Small, simple container workload with 1–3 services and a team unfamiliar with Kubernetes — Amazon ECS on Fargate has a fraction of the operational surface and is often the better first step.
Entirely event-driven or short-lived workload: AWS Lambda or ECS on Fargate Spot often costs less and simplifies ops.
You have no plans to leverage Kubernetes portability or ecosystem — the $73/month per-cluster plus learning-curve tax is real.
You need air-gapped operation with no AWS dependency — evaluate EKS Anywhere or upstream Kubernetes on bare metal.
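
The $73/month figure above is the $0.10/hour control-plane price multiplied by an average month of 730 hours. A short sketch of that arithmetic, useful when the question is how many clusters (prod, staging, per-team sandboxes) the fixed fee will be multiplied by; the function name is illustrative, and node, data transfer, and add-on costs are out of scope.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year divided by 12


def control_plane_monthly_cost(clusters: int = 1, hourly_rate: float = 0.10) -> float:
    """Fixed EKS control-plane cost per month, excluding nodes and data transfer."""
    return round(clusters * hourly_rate * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    for count in (1, 3, 10):
        print(f"{count} cluster(s): ${control_plane_monthly_cost(count):,.2f}/month")
```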

EKS best practices

Resource management

Always set requests and limits. Use Vertical Pod Autoscaler recommendations to size requests.
Use pod disruption budgets; align with Karpenter disruption budgets to avoid correlated voluntary disruptions.

Auto-scaling

Karpenter (or Auto Mode’s managed Karpenter) preferred over Cluster Autoscaler for new clusters.
HPA on CPU/memory/custom metrics (Prometheus) for pod-level scaling.
KEDA for event-driven autoscaling (SQS, Kinesis, Kafka lag).
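
For sizing HPA targets it helps to know the calculation the controller performs: desired replicas are the current replicas scaled by the ratio of observed metric to target, rounded up, with no change when the ratio is within a tolerance of 1.0 (0.1 by default upstream). The sketch below reproduces that core formula only; it ignores stabilization windows, unready pods, and scaling policies, and the parameter names are our own.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=None):
    """Core HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target, the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_metric / target_metric)
    desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired


if __name__ == "__main__":
    # Four pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The formula explains a common surprise: with a low replica count, a modest overshoot of the target still adds a whole pod, so small Deployments scale in coarse steps.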

Security

Pod Identity for pod-level IAM.
Network policies via Cilium/Calico; restrict egress by default.
Kubernetes secrets encrypted with a customer-managed KMS key.
Pair with HashiCorp Vault Secrets Operator or AWS Secrets Manager + Secrets Store CSI driver for application secrets.
ECR enhanced scanning + image signing verified at admission.

Reliability

Multi-AZ NodePools; never pin a production NodePool to a single AZ.
Backups of cluster state (Velero) for stateful apps or CRD-heavy control-plane configuration.
Routine disaster-recovery tests of cluster re-creation from IaC.

$0.10/hr: EKS control plane list price (per cluster)
30-40%: typical cost savings moving x86 node groups to Graviton3/4
1.32: target Kubernetes minor version on EKS in 2026

Frequently Asked Questions

What is EKS Auto Mode and when should I use it?

EKS Auto Mode (GA December 2024) is a fully managed EKS tier where AWS operates the node pool, networking add-ons, load balancing, and storage controllers for you. It uses a managed Karpenter for fast, cost-aware scaling and patches nodes automatically via an ephemeral-node model (replace, not patch in place). Use Auto Mode when your team wants Kubernetes without node operations; keep managed node groups when you need very specific AMI control, custom kernel modules, or a regulated baseline AMI required by your security team.

How does EKS Pod Identity differ from IRSA, and which should I use in 2026?

Pod Identity (GA 2023, matured through 2025) is simpler and the better choice for most new clusters. IRSA required you to (a) create an IAM OIDC identity provider per cluster, (b) write a trust policy with StringEquals on the cluster OIDC issuer and ServiceAccount name, and (c) annotate the ServiceAccount with the role ARN. Pod Identity replaces all of that with a single create-pod-identity-association call and a Pod Identity Agent that runs on each node. IRSA is still required for (1) Kubernetes clusters you run yourself on EC2 outside EKS, (2) clusters running Kubernetes <1.24, and (3) the handful of controllers that only accept token-file auth.

What is Karpenter 1.0 and how does it change node scaling?

Karpenter 1.0 (GA 2024) stabilised the NodeClass/NodePool CRDs and added disruption budgets, consolidation policies, and a proper upgrade path for in-place updates. Compared with Cluster Autoscaler: Karpenter picks the cheapest instance type that fits pending pods (across on-demand, Spot, AMD, Graviton, and various sizes), typically has fresh nodes ready for those pods in under a minute, and consolidates underutilised nodes automatically. On EKS Auto Mode, Karpenter is the built-in node provisioner; you do not manage it directly.

When should I use EKS Hybrid Nodes versus EKS Anywhere?

EKS Hybrid Nodes (GA December 2024) lets you register on-prem or edge Linux hosts as worker nodes to an EKS control plane running in AWS. The control plane stays in AWS; the workers run anywhere. Use Hybrid Nodes when (a) you want one Kubernetes control plane governing cloud and on-prem workloads, (b) data gravity or latency forces compute close to data, or (c) your on-prem workloads are small enough that running a full EKS Anywhere cluster on-site is overkill. Use EKS Anywhere when on-prem needs full isolation: its own control plane, air-gapped operation, or no dependency on AWS connectivity. For most mid-market hybrid customers in 2026 we default to EKS Hybrid Nodes.

How do I secure container images pulled to EKS?

Three layers. (1) Push only to ECR with enhanced scanning enabled: Amazon Inspector v2 scans the image and its OS and language-package dependencies for CVEs and attaches Exploit Prediction Scoring System (EPSS) scores to the findings. (2) Enforce image signing with Amazon ECR container image signing (AWS Signer), verified on the cluster via a policy engine like Kyverno or Gatekeeper. (3) Pair with Artifact Attestations from your CI (GitHub Actions) so the deploy step verifies SLSA-aligned provenance before calling kubectl apply. For regulated workloads, enable AWS PrivateLink endpoints for ECR so image pulls never transit the public internet.

What is the 2026 best practice for logging and observability on EKS?

The default we deploy: CloudWatch Container Insights enhanced observability for AWS-service-native metrics and cluster control-plane metrics, plus AWS Distro for OpenTelemetry (ADOT) Collector as a DaemonSet forwarding application traces and custom metrics to either Amazon Managed Grafana + Managed Prometheus (AWS-native) or Datadog / New Relic / Honeycomb (third party). EKS audit logs go to a CloudWatch log group with 90-day minimum retention and an S3 archive behind Object Lock for compliance. For Bedrock-heavy workloads, layer Datadog LLM Observability on top.

How does Graviton affect EKS cost and what are the gotchas?

Graviton3 and the newer Graviton4 (m8g/c8g/r8g families) deliver 30-40% better price-performance than comparable x86 for most stateless workloads and essentially all typical microservices. The main gotchas: (1) your container images must be multi-arch (linux/amd64 + linux/arm64), so use Docker buildx in CI; (2) some proprietary sidecars (older versions of some APM agents) still lack ARM support; (3) JVM workloads need a JDK with ARM64 support (every modern LTS has it). Karpenter on Auto Mode will bin-pack across x86 and Graviton automatically if both architectures are allowed in the NodePool.
