---
title: Kubernetes on AWS (EKS)
description: Amazon EKS in 2026: Auto Mode GA, Hybrid Nodes, Karpenter 1.0, Pod Identity, Graviton-first node pools, and ECR enhanced scanning — cheaper, safer K8s.
url: https://www.factualminds.com/integrations/kubernetes-aws-eks/
category: container
updated: 2026-04-29
---

# Kubernetes on AWS (EKS)

> Managed Kubernetes on AWS with Auto Mode, Hybrid Nodes, Karpenter 1.0, and Graviton-first node pools.

## Amazon EKS overview

Amazon EKS is AWS-managed Kubernetes. The control plane (API server, scheduler, etcd) is operated by AWS, patched automatically, and deployed across at least three availability zones. You own the data plane unless you run **EKS Auto Mode** (GA December 2024), which delegates the data plane to AWS as well so that you consume Kubernetes as an almost-serverless service.

FactualMinds deploys EKS for teams that need Kubernetes portability (multi-cloud, on-prem via **EKS Hybrid Nodes**, or open-source ecosystem alignment) and for mid-market AWS-only teams that have outgrown ECS or plain Fargate. We default new 2026 clusters to **Auto Mode on Kubernetes 1.32 with Graviton-first node pools** unless a specific workload says otherwise.

## What's new on EKS in 2026

- **EKS Auto Mode GA** — fully managed data plane, managed add-ons (VPC CNI, kube-proxy, CoreDNS, EBS CSI, AWS Load Balancer Controller), and a managed Karpenter that provisions nodes within seconds of scheduling pressure.
- **EKS Hybrid Nodes** (GA December 2024) — register Linux hosts running on-prem or at the edge as EKS worker nodes governed by an AWS-hosted control plane. One `kubectl` surface for cloud and hybrid.
- **Karpenter 1.0** (2024) — stable `v1` NodePool and EC2NodeClass APIs, disruption budgets, and consolidation policies. Auto Mode runs a managed Karpenter by default.
- **Pod Identity** — the ergonomic replacement for IRSA. No OIDC provider, no ServiceAccount annotation, no trust-policy gymnastics.
- **Kubernetes 1.31 / 1.32** — typical supported minor versions on EKS in 2026; upstream releases every ~4 months, EKS supports the current plus the previous three.
- **ECR enhanced scanning** — Inspector v2 scans images for OS and language-package CVEs and enriches findings with Exploit Prediction Scoring System (EPSS) scores; integrates with Security Hub.
- **AWS Load Balancer Controller** — managed install on Auto Mode; supports Gateway API, ALB and NLB target-group binding, and cross-zone health checks.
- **Amazon EBS CSI driver** managed add-on — Auto Mode handles install and upgrade; gp3 volumes by default.
- **Cilium + Hubble / eBPF observability** — supported via add-ons for teams that need deep network visibility without full-fat service mesh.

## Why EKS

**Kubernetes standard**

- Standard `kubectl`, Helm, Kustomize, and standard manifests.
- Portable: workloads run on other clouds, on-premises (EKS Hybrid Nodes or EKS Anywhere), or upstream Kubernetes.
- Massive ecosystem (Prometheus, OpenTelemetry, Argo CD, Flux, Karpenter, Cilium, Istio, Linkerd).

**AWS integration**

- VPC CNI for pod networking with real AWS IP addresses.
- Pod Identity for pod-level IAM permissions without OIDC acrobatics.
- Native integrations with ALB/NLB, EFS, EBS, S3, RDS, DynamoDB, SQS, Kinesis, Bedrock.
- AWS Security Hub / GuardDuty EKS Protection for runtime threat detection.

**Managed control plane**

- Multi-AZ control plane included in the $0.10/hour price.
- AWS patches the control plane on a published minor-version cadence.
- SLA covers control-plane availability; you are responsible for workload availability.

## EKS Architecture

**Control plane** (AWS managed)

- API server, scheduler, controller managers, etcd.
- Audit logs can be shipped to CloudWatch Logs; control-plane endpoints can be private, public, or public+private with IP allow-list.

**Data plane** (your choice)

- **EKS Auto Mode** — fully managed nodes, add-ons, networking, load balancing, and storage controllers.
- **Managed node groups** — EC2 instances you provision; AWS manages OS patching, drain, and replacement.
- **Karpenter on self-managed nodes** — for teams that want fine control over instance-type selection and disruption policy.
- **AWS Fargate** — serverless pods with no node management; higher per-pod price, best for bursty or sandbox workloads.

**Networking**

- AWS VPC CNI: each pod gets a real VPC IP (prefix delegation supported for IP density).
- Security groups for pods (SGFP) for per-pod network security.
- Cilium eBPF or Calico for network policy and observability.
- AWS Load Balancer Controller for ALB/NLB ingress.

## EKS Auto Mode in practice

- AWS provisions, scales, and replaces nodes automatically based on pending pods.
- Managed Karpenter bin-packs across instance types, including Graviton by default.
- OS patching via node replacement on a rolling schedule; no in-place kernel updates.
- AWS manages the core add-ons (VPC CNI, kube-proxy, CoreDNS, AWS Load Balancer Controller, EBS CSI).
- Billed as EC2 plus an EKS Auto Mode management fee charged per managed instance (the rate scales with instance type); typically net-neutral or cheaper versus self-managed node groups when labor is priced in.

**Use Auto Mode when**

- You want Kubernetes without node operations.
- Your security team can live with AWS-managed, regularly replaced AMIs.
- Your workloads do not require custom kernel modules or niche runtime options.

**Prefer managed node groups when**

- You need a regulated/approved AMI (e.g., STIG-hardened) maintained by your security team.
- You run custom kernel modules (BPF/eBPF extensions beyond what's supported, niche drivers).
- You want fine-grained Spot pool control that the managed NodePool does not expose.

## EKS Hybrid Nodes

- Register on-prem Linux hosts as EKS workers against an AWS-hosted control plane.
- Supports x86 and ARM; on-prem hosts obtain credentials through AWS Systems Manager hybrid activations or IAM Roles Anywhere.
- Use for edge compute, data-gravity on-prem workloads, or manufacturing floor nodes that must stay physically on site but should be governed from AWS.
- Compare to EKS Anywhere: Hybrid Nodes share one control plane with AWS; EKS Anywhere runs its own on-site control plane.
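As a rough sketch of the registration step, the host-side `nodeadm` tool reads a `NodeConfig` file that names the cluster and the credential source. The example assumes SSM hybrid activations; the cluster name, region, and activation placeholders are hypothetical, and the schema should be checked against the current EKS Hybrid Nodes documentation before use.

```bash
# Write a NodeConfig for nodeadm on the on-prem host (placeholder values only).
cat > nodeConfig.yaml <<'EOF'
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster          # hypothetical cluster name
    region: eu-west-1         # hypothetical region
  hybrid:
    ssm:
      activationId: REPLACE_WITH_ACTIVATION_ID
      activationCode: REPLACE_WITH_ACTIVATION_CODE
EOF

# On the host itself, registration is then typically:
#   sudo nodeadm init -c file://nodeConfig.yaml
```

Once the host joins, it appears in `kubectl get nodes` alongside the cloud nodes.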

## Pod Identity vs IRSA

**Pod Identity (2026 default)**

```bash
aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace production \
  --service-account my-app \
  --role-arn arn:aws:iam::123456789012:role/my-app-role
```

Pods using the `my-app` ServiceAccount in the `production` namespace automatically receive temporary credentials via the Pod Identity Agent. No annotation, no OIDC provider, no trust-policy StringEquals dance.
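The role itself still needs a trust policy, but it is the same static document for every cluster: it trusts the `pods.eks.amazonaws.com` service principal rather than a per-cluster OIDC issuer. A minimal sketch, assuming the hypothetical role name `my-app-role` and file name `pod-identity-trust.json`:

```bash
# Static trust policy for any role used through Pod Identity.
cat > pod-identity-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "pods.eks.amazonaws.com" },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
EOF

# Create the role before creating the association:
#   aws iam create-role --role-name my-app-role \
#     --assume-role-policy-document file://pod-identity-trust.json
```

Because nothing in the document is cluster-specific, the same role can be associated with ServiceAccounts in several clusters.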

**IRSA (legacy / niche)**

- Still required for workloads that only accept token-file authentication, pods on AWS Fargate, clusters older than 1.24, or Kubernetes clusters running outside EKS (for example self-managed on EC2).
- OIDC provider + annotated ServiceAccount + trust-policy condition on the OIDC subject.
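For comparison, the IRSA pieces listed above look roughly like this. The account ID, region, and `EXAMPLEOIDCID` issuer ID are placeholders, not values from a real cluster.

```bash
# 1) ServiceAccount annotated with the role ARN.
cat > irsa-serviceaccount.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-app-role
EOF

# 2) Trust policy pinned to the cluster's OIDC issuer and the ServiceAccount subject.
cat > irsa-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLEOIDCID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLEOIDCID:sub": "system:serviceaccount:production:my-app",
          "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLEOIDCID:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
```

The issuer ID changes with every cluster, which is why an IRSA trust policy has to be rewritten whenever a workload moves or a cluster is rebuilt.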

## Karpenter 1.0 patterns we deploy

- **Graviton-first NodePool** — allow `arm64` architectures, prefer on-demand for baseline and Spot for scale-out.
- **Consolidation policy** — `WhenEmptyOrUnderutilized` (named `WhenUnderutilized` before the `v1` API) for dev/staging, `WhenEmpty` for production to avoid disruption of long-running pods.
- **Disruption budgets** — cap how many nodes Karpenter may disrupt at the same time (a count or a percentage, optionally on a schedule), aligned with PDBs.
- **Per-namespace NodePool selection** — heavy GPU workloads go to a dedicated NodePool with `nvidia.com/gpu` taints.
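A minimal sketch of how these patterns could combine in one production NodePool, assuming self-managed Karpenter with the `v1` API and a hypothetical EC2NodeClass named `default`. On Auto Mode the `nodeClassRef` would point at the Auto Mode NodeClass instead.

```bash
# Graviton-first production NodePool: both architectures allowed, conservative disruption.
cat > nodepool-graviton-first.yaml <<'EOF'
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-first
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                    # hypothetical EC2NodeClass
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64", "amd64"]     # Graviton allowed alongside x86
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
  disruption:
    consolidationPolicy: WhenEmpty       # production setting; only empty nodes are removed
    consolidateAfter: 5m
    budgets:
      - nodes: "10%"                     # at most 10% of the pool's nodes disrupted at once
EOF
```

Apply it with `kubectl apply -f nodepool-graviton-first.yaml`. A dev/staging variant would swap the policy to `WhenEmptyOrUnderutilized`.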

## Observability stack

- **CloudWatch Container Insights (enhanced observability)** for the cluster, nodes, pods, and control plane.
- **ADOT Collector DaemonSet** forwarding traces, metrics, and logs to Managed Prometheus + Managed Grafana, or to Datadog / New Relic / Honeycomb.
- **Cilium Hubble** (or Pixie) for eBPF-level network visibility without a service mesh.
- **EKS audit logs** to CloudWatch Logs with 90-day retention and S3 archive behind Object Lock for SOC 2 / PCI evidence.

## Graviton cost savings

- Graviton3 (`m7g`, `c7g`, `r7g`) and Graviton4 (`m8g`, `c8g`, `r8g`) typically deliver 30–40% better price-performance than comparable x86 for stateless microservices and JVM workloads.
- Build multi-arch images with `docker buildx build --platform linux/amd64,linux/arm64 --push` in CI so that a single multi-arch manifest list lands in ECR.
- Karpenter on Auto Mode will pick ARM when it wins on price and pod fits.

## Reference architecture (2026 default)

```
                    ┌──────────────────────────────────────────────┐
                    │  AWS-managed control plane (multi-AZ)        │
                    │  api / scheduler / controller-mgr / etcd     │
                    │  audit + authenticator + scheduler logs      │
                    └─────────────────┬────────────────────────────┘
                                      │ (private endpoint via PrivateLink)
                                      │
   ┌──────────────────────────────────┼──────────────────────────────────┐
   │ Data plane (Auto Mode)           │                                  │
   │  ├── managed Karpenter NodePool  │ ── Pod Identity Agent (per node) │
   │  ├── Graviton-first c8g/m8g/r8g  │ ── VPC CNI (prefix delegation)   │
   │  ├── consolidation policy        │ ── EBS CSI (gp3 default)         │
   │  └── disruption budgets          │ ── AWS LB Controller (ALB+NLB)   │
   └──────────────────────────────────┴──────────────────────────────────┘
                                      │
   Workloads ── ServiceAccount → PodIdentityAssociation → IAM Role
   Ingress  ── ALB (alb.ingress.kubernetes.io/scheme: internet-facing)
   Storage  ── EBS gp3 PVCs / EFS for shared / S3 for objects
   Secrets  ── Secrets Store CSI / HashiCorp VSO → Vault / Secrets Manager
   Images   ── ECR (enhanced scanning, image signing) ← CI attestation
   Telemetry ─ CloudWatch Container Insights + ADOT → Datadog / AMP+AMG
   Audit    ── CloudWatch Logs (90d) + S3 Object Lock (compliance archive)
```

## Failure modes & resilience

**1. Karpenter consolidation evicting under-budgeted pods.** The default `consolidationPolicy: WhenEmptyOrUnderutilized` (called `WhenUnderutilized` before Karpenter 1.0) will move pods aggressively. For long-running stateful workloads, set `WhenEmpty` on the NodePool and define a PodDisruptionBudget (`minAvailable`) so consolidation cannot violate availability. Disruption budgets at the NodePool level cap how many nodes can be voluntarily disrupted at the same time.

**2. Pod Identity Agent crash-loop.** Symptom: pods using the ServiceAccount fail to obtain credentials or get `AccessDenied` on AWS API calls. Causes: agent DaemonSet pod in CrashLoopBackOff (check `kubectl logs -n kube-system daemonset/eks-pod-identity-agent`), Pod Identity Association pointing at a non-existent IAM role, trust policy missing the `pods.eks.amazonaws.com` principal (with `sts:AssumeRole` and `sts:TagSession`), or IMDS hop limit too low on the node. Auto Mode handles the agent; on managed node groups confirm the agent add-on is healthy.

**3. NodePool pinned to a single AZ.** A zonal disruption (control-plane outage in one AZ, ELB endpoint flap) takes the workload with it. Always include `topology.kubernetes.io/zone In [a, b, c]` in NodePool requirements; combine with `topologySpreadConstraints` on Deployments.
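On the workload side, a zone spread constraint might look like the sketch below. The Deployment name, image URI, and resource figures are hypothetical.

```bash
# Deployment that spreads replicas evenly across zones.
cat > my-app-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      serviceAccountName: my-app
      topologySpreadConstraints:
        - maxSkew: 1                               # zones may differ by at most one replica
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: app
          image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:1.0.0  # hypothetical image
          resources:
            requests: { cpu: 250m, memory: 256Mi }
            limits: { cpu: 500m, memory: 256Mi }
EOF
```

With `DoNotSchedule`, a replica stays Pending rather than piling into one zone, which in turn prompts Karpenter to launch capacity in the under-served zone if the NodePool allows it.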

**4. gp3 volume detach during node replacement.** Auto Mode replaces nodes, so StatefulSets with `volumeClaimTemplates` should reference a StorageClass that sets `reclaimPolicy: Retain` and `volumeBindingMode: WaitForFirstConsumer`. Otherwise an in-flight reschedule can race with detach and the pod stays `ContainerCreating` for several minutes.
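A StorageClass along these lines is one way to express that. The name `gp3-retain` is hypothetical, and the provisioner shown is the one used by the EBS CSI add-on; Auto Mode's built-in storage controller is assumed to use a different provisioner name (`ebs.csi.eks.amazonaws.com`), so adjust accordingly.

```bash
# gp3 StorageClass that keeps volumes after PVC deletion and binds in the pod's zone.
cat > storageclass-gp3-retain.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer   # volume is created in the zone where the pod lands
allowVolumeExpansion: true
EOF
```

A StatefulSet then sets `storageClassName: gp3-retain` inside its `volumeClaimTemplates`.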

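**5. `--max-unavailable` vs PDB collisions.** A strict PDB (`minAvailable: 100%`, or `minAvailable` equal to `replicas`) permits zero evictions, so any node drain (managed node group update, Karpenter consolidation, Auto Mode node replacement) stalls on that workload. Always set PDB `minAvailable` such that `replicas - minAvailable >= maxUnavailable`, which leaves at least as much eviction headroom as the Deployment's RollingUpdate strategy already tolerates.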
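The inequality is easy to check in CI before a manifest ships. A small sketch, with `pdb_allows_drain` as a hypothetical helper name:

```bash
# Succeeds when replicas - minAvailable >= maxUnavailable.
# $1 = replicas, $2 = PDB minAvailable (absolute count), $3 = maxUnavailable
pdb_allows_drain() {
  [ $(( $1 - $2 )) -ge "$3" ]
}

pdb_allows_drain 4 3 1 && echo "ok: one pod can be evicted at a time"
pdb_allows_drain 4 4 1 || echo "blocked: minAvailable equals replicas"
```

The helper takes absolute counts only; a percentage-based `minAvailable` would first need converting to a pod count (Kubernetes rounds it up).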

**6. Cluster Autoscaler vs Karpenter coexistence.** Running both in the same cluster causes thrash. Pick one. Karpenter for new clusters; Cluster Autoscaler only if a vendor product hard-requires it.

**7. EKS minor-version upgrade window.** AWS gives each minor version about 14 months of standard support (roughly current + 3 prior minors). A cluster that slips past that window lands in paid extended support and eventually faces a forced upgrade across multiple breaking changes. Schedule quarterly minor upgrades; test in a staging cluster first.

## Observability runbook

**Enable control-plane logs (this command works on an existing cluster):**

```bash
aws eks update-cluster-config \
  --region eu-west-1 \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
```

**Alarms we ship:**

| Alarm                                                       | First action                                                                                   |
| ----------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| `cluster_failed_request_count > 0` (control plane)          | Check audit logs for `Forbidden` / `Unauthorized` patterns; review IAM Identity mappings       |
| `node_status_condition` Ready=false on any node             | `kubectl describe node`; check kubelet, CNI, and SSM agent health                              |
| Karpenter `nodeclaim_disruption_total` spike                | Inspect NodePool consolidation events; verify PDBs are honored                                 |
| `pod_pending_count > 0` for `> 5 min`                       | `kubectl describe pod` → events; NodePool requirements vs pod tolerations / arch mismatch      |
| ECR image-pull error rate                                   | VPC endpoint health for `com.amazonaws.<region>.ecr.dkr`; IAM role `ecr:GetAuthorizationToken` |
| ADOT Collector `otelcol_exporter_send_failed_metric_points` | Backend (AMP / Datadog) reachability; collector resource limits                                |

**Debug path: "Pod stuck Pending":**

1. `kubectl describe pod <name>` → Events. Most common: `0/N nodes are available: N Insufficient memory` or `N node(s) didn't match Pod's node affinity/selector`.
2. If insufficient resources: confirm Karpenter is provisioning (`kubectl get nodeclaims`); check NodePool `requirements` allow the pod's architecture and instance family.
3. If affinity mismatch: check NodePool labels match pod's `nodeSelector` / `affinity`.
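4. If the pod uses a Pod Identity ServiceAccount: a missing association does not keep a pod Pending; the pod schedules and then fails on AWS API calls. If that is the symptom, confirm an association exists for `(cluster, namespace, serviceAccount)` with `aws eks list-pod-identity-associations`.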

**Debug path: "Node not ready":**

1. `kubectl describe node <node>` → Conditions section. `MemoryPressure`, `DiskPressure`, `PIDPressure` are first signals.
2. CloudWatch Container Insights → node detail → kubelet logs.
3. VPC CNI: `kubectl logs -n kube-system -l k8s-app=aws-node` for IP exhaustion or ENI attach failures.
4. If on Auto Mode, the node will be replaced automatically — confirm replacement is in progress before manual intervention.

## When EKS is NOT the right call

- Small, simple container workload with 1–3 services and a team unfamiliar with Kubernetes — **Amazon ECS on Fargate** has a fraction of the operational surface and is often the better first step.
- Entirely event-driven or short-lived workload — **AWS Lambda** or ECS on Fargate Spot often costs less and simplifies ops.
- You have no plans to leverage Kubernetes portability or ecosystem — the roughly $73/month per-cluster control-plane fee and the learning-curve tax are real.
- You need air-gapped operation with no AWS dependency — evaluate **EKS Anywhere** or upstream Kubernetes on bare metal.
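The $73/month figure above follows from the $0.10/hour control-plane price, taking 730 hours as an average month. A quick check, with a three-cluster example (dev, staging, prod) added as an illustrative assumption:

```bash
# Control-plane fee: $0.10 per cluster-hour, 730 hours in an average month.
awk 'BEGIN { printf "per cluster: $%.2f/month\n", 0.10 * 730 }'
# The fixed fee multiplies with every additional cluster, before any compute.
awk 'BEGIN { printf "three clusters: $%.2f/month\n", 3 * 0.10 * 730 }'
```

This prints `per cluster: $73.00/month` and `three clusters: $219.00/month`; node, load balancer, and data-transfer charges come on top.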

## EKS best practices

**Resource management**

- Always set `requests` and `limits`. Use Vertical Pod Autoscaler recommendations to size requests.
- Use pod disruption budgets; align with Karpenter disruption budgets to avoid correlated voluntary disruptions.

**Auto-scaling**

- Karpenter (or Auto Mode's managed Karpenter) preferred over Cluster Autoscaler for new clusters.
- HPA on CPU/memory/custom metrics (Prometheus) for pod-level scaling.
- KEDA for event-driven autoscaling (SQS, Kinesis, Kafka lag).
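For the HPA case, a CPU-based policy could be sketched as follows. The Deployment name `my-app`, the replica bounds, and the 70% target are illustrative assumptions.

```bash
# HPA scaling a Deployment on average CPU utilisation.
cat > my-app-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU requests
EOF
```

Utilisation is measured against `requests`, so the HPA only behaves sensibly when requests are set. New replicas that do not fit on existing nodes become pending pods, which is the signal Karpenter acts on.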

**Security**

- Pod Identity for pod-level IAM.
- Network policies via Cilium/Calico; restrict egress by default.
- Kubernetes secrets encrypted with a customer-managed KMS key.
- Pair with **HashiCorp Vault Secrets Operator** or AWS Secrets Manager + Secrets Store CSI driver for application secrets.
- ECR enhanced scanning + image signing verified at admission.
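"Restrict egress by default" can start from a namespace-wide policy like the sketch below, which denies all egress except DNS to `kube-system`. The namespace is an assumption, and enforcement depends on a policy engine (Cilium, Calico, or the VPC CNI's network policy support) being enabled.

```bash
# Default-deny egress for every pod in the namespace, with DNS as the only exception.
cat > default-deny-egress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}            # selects all pods in the namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
```

Each workload then adds its own narrower allow policy for the destinations it actually needs.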

**Reliability**

- Multi-AZ NodePools; never pin a production NodePool to a single AZ.
- Backups of cluster state (Velero) for stateful apps or CRD-heavy control-plane configuration.
- Routine disaster-recovery tests of cluster re-creation from IaC.

## Related reading

- [ECS vs EKS: container orchestration decision guide](/blog/aws-ecs-vs-eks-container-orchestration-decision-guide/)
- [Karpenter vs Cluster Autoscaler on EKS: cost optimization](/blog/karpenter-vs-cluster-autoscaler-eks-cost-optimization/)
- [How to deploy EKS with Karpenter for cost-optimized autoscaling](/blog/how-to-deploy-eks-karpenter-cost-optimized-autoscaling/)

## Related services

- [AWS Application Modernization](/services/aws-application-modernization/)
- [DevOps Pipeline Setup](/services/devops-pipeline-setup/)
- [Hire a Dedicated AWS Expert](/services/hire-a-dedicated-aws-expert/)

## Create an EKS cluster with Auto Mode and deploy your first workload

Provision a production-ready EKS cluster with Auto Mode, configure Pod Identity, and deploy a workload.

1. **Create the cluster with Auto Mode enabled** — Use eksctl, Terraform, or the AWS console to create a cluster with Kubernetes 1.32 and compute config enabled for Auto Mode. Auto Mode provisions a managed Karpenter-driven node pool by default.
2. **Configure VPC CNI, kube-proxy, and CoreDNS as managed add-ons** — On Auto Mode clusters, AWS installs and upgrades the VPC CNI, kube-proxy, and CoreDNS capabilities automatically, so there is nothing to enable. On managed node groups, enable the corresponding EKS managed add-ons yourself.
3. **Create an IAM role and Pod Identity Association for your workload** — Create an IAM role with only the AWS permissions your pod needs, then call `aws eks create-pod-identity-association` with the cluster, namespace, service account name, and role ARN. No OIDC provider, no ServiceAccount annotation; the association is the trust boundary.
4. **Deploy the workload with kubectl or Argo CD** — Push a Deployment and Service manifest referencing the ServiceAccount you associated. For HTTPS, use the AWS Load Balancer Controller with an Ingress of class `alb`.
5. **Wire observability and security** — Enable CloudWatch Container Insights (or install Datadog Agent/Prometheus), turn on ECR enhanced scanning with Inspector v2 for vulnerability findings, and set up EKS audit logging to CloudWatch with a 90-day retention minimum.
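Step 1 could be sketched with an eksctl config file. This assumes eksctl's `autoModeConfig` field and uses a hypothetical cluster name and region; verify the schema against the eksctl version you run.

```bash
# Minimal eksctl ClusterConfig with Auto Mode enabled.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster        # hypothetical cluster name
  region: eu-west-1       # hypothetical region
  version: "1.32"
autoModeConfig:
  enabled: true           # AWS manages nodes, networking, and storage controllers
EOF

# Create the cluster from the file:
#   eksctl create cluster -f cluster.yaml
```

Keeping the file in version control gives the IaC source that the disaster-recovery tests under "Reliability" rely on.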

## FAQ

### What is EKS Auto Mode and when should I use it?
EKS Auto Mode (GA December 2024) is a fully managed EKS tier where AWS operates the node pool, networking add-ons, load balancing, and storage controllers for you. It uses a managed Karpenter for fast, cost-aware scaling and patches nodes automatically via an ephemeral-node model (replace, not patch in place). Use Auto Mode when your team wants Kubernetes without node operations; keep managed node groups when you need very specific AMI control, custom kernel modules, or a regulated baseline AMI required by your security team.

### How does EKS Pod Identity differ from IRSA, and which should I use in 2026?
Pod Identity (GA 2023, matured through 2025) is simpler and the better choice for most new clusters. IRSA requires you to (a) create an IAM OIDC identity provider per cluster, (b) write a trust policy with StringEquals on the cluster OIDC issuer and ServiceAccount name, and (c) annotate the ServiceAccount with the role ARN. Pod Identity replaces all of that with a single `create-pod-identity-association` call, a static trust policy for the `pods.eks.amazonaws.com` service principal, and a Pod Identity Agent that runs on each node. IRSA is still required for (1) Kubernetes clusters running outside EKS (for example self-managed on EC2), (2) clusters running Kubernetes <1.24, (3) pods on AWS Fargate, and (4) the handful of controllers that only accept token-file auth.

### What is Karpenter 1.0 and how does it change node scaling?
Karpenter 1.0 (GA 2024) stabilised the NodePool and EC2NodeClass APIs at `v1`, brought disruption budgets and consolidation policies into the stable API, and provided a supported migration path from the beta APIs. Compared with Cluster Autoscaler: Karpenter picks the cheapest instance type that fits pending pods (across on-demand, Spot, AMD, Graviton, and various sizes), typically has fresh nodes ready for those pods in under a minute, and consolidates underutilised nodes automatically. On EKS Auto Mode, Karpenter is the built-in node provisioner; you do not manage it directly.

### When should I use EKS Hybrid Nodes versus EKS Anywhere?
EKS Hybrid Nodes (GA December 2024) lets you register on-prem or edge Linux hosts as worker nodes to an EKS control plane running in AWS. The control plane stays in AWS; the workers run anywhere. Use Hybrid Nodes when (a) you want one Kubernetes control plane governing cloud and on-prem workloads, (b) data gravity or latency forces compute close to data, or (c) your on-prem workloads are small enough that running a full EKS Anywhere cluster on-site is overkill. Use EKS Anywhere when on-prem needs full isolation: its own control plane, air-gapped operation, or no dependency on AWS connectivity. For most mid-market hybrid customers in 2026 we default to EKS Hybrid Nodes.

### How do I secure container images pulled to EKS?
Three layers. (1) Push only to ECR with enhanced scanning enabled: Amazon Inspector v2 scans the image and its OS and language-package dependencies for CVEs and attaches Exploit Prediction Scoring System (EPSS) scores to findings. (2) Enforce image signing with Amazon ECR container image signing (AWS Signer), verified on the cluster via a policy engine like Kyverno or Gatekeeper. (3) Pair with Artifact Attestations from your CI (GitHub Actions) so the deploy step verifies SLSA-aligned provenance before calling `kubectl apply`. For regulated workloads, enable AWS PrivateLink endpoints for ECR so image pulls never transit the public internet.

### What is the 2026 best practice for logging and observability on EKS?
The default we deploy: CloudWatch Container Insights enhanced observability for AWS-service-native metrics and cluster control-plane metrics, plus AWS Distro for OpenTelemetry (ADOT) Collector as a DaemonSet forwarding application traces and custom metrics to either Amazon Managed Grafana + Managed Prometheus (AWS-native) or Datadog / New Relic / Honeycomb (third party). EKS audit logs go to a CloudWatch log group with 90-day minimum retention and an S3 archive behind Object Lock for compliance. For Bedrock-heavy workloads, layer Datadog LLM Observability on top.

### How does Graviton affect EKS cost and what are the gotchas?
Graviton3 and the newer Graviton4 (m8g/c8g/r8g families) typically deliver 30–40% better price-performance than comparable x86 for stateless workloads such as typical microservices. The main gotchas: (1) your container images must be multi-arch (linux/amd64 + linux/arm64), which Docker buildx can produce in CI; (2) some proprietary sidecars (older versions of some APM agents) still lack ARM support; (3) JVM workloads need a JDK with ARM64 support (every modern LTS has it). Karpenter on Auto Mode will bin-pack across x86 (amd64) and Graviton (arm64) automatically if both architectures are allowed in the NodePool.

---

*Source: https://www.factualminds.com/integrations/kubernetes-aws-eks/*
