# Managed EKS decision checklist

Evaluate **self-managed Karpenter**, **EKS Auto Mode**, and **partner-managed clusters**
before signing an MSP SOW or enabling Auto Mode in production.

> Reflects **July 2026**: per the AWS Containers blog (2025–2026), EKS Auto Mode includes
> managed Karpenter, Node Auto Repair (10-minute GPU failure detection), AWS-managed
> Bottlerocket AMIs, and consolidation-based cost optimization.

## Stage 0 — Scope what you are buying

- [ ] List ops tasks: control plane upgrades, node AMI patches, Karpenter tuning, GPU drivers, on-call
- [ ] Document compliance needs (FIPS, custom CNI, restricted instance types)
- [ ] Fill [eks-mode-cost-worksheet.csv](./eks-mode-cost-worksheet.csv) with your node counts

**Rollback trigger:** MSP SOW says "managed Kubernetes" but excludes upgrades → renegotiate scope.

## Stage 1 — Self-managed Karpenter fit

- [ ] Team has ≥1 K8s SRE with Karpenter production experience
- [ ] Custom instance families or bare metal required (Auto Mode offers only a predefined set)
- [ ] GitOps (Argo CD / Flux) already standardized
- [ ] Willing to own Node Monitoring Agent and GPU telemetry

**Choose self-managed when:** You need full Karpenter NodePool flexibility and custom AMIs.

## Stage 2 — EKS Auto Mode fit

- [ ] Workloads fit predefined instance types (including GPU accelerated)
- [ ] Want AWS to run consolidation and node lifecycle
- [ ] Accept AWS-managed Bottlerocket AMIs (no custom kernel modules)
- [ ] Pod Disruption Budgets defined for Node Auto Repair cordon/replace

**Choose Auto Mode when:** Ops headcount is the bottleneck and your workloads need nothing beyond the predefined instance types and AWS-managed AMIs.
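
The Pod Disruption Budget item in this stage can be prepared ahead of time. The following is a minimal sketch, not a recommended policy: it builds a `policy/v1` PodDisruptionBudget as JSON, which `kubectl apply -f -` accepts. The function name `pdb_manifest`, the `app` label key, and the example values are assumptions chosen for illustration; pick `minAvailable` from your own replica counts.

```python
import json


def pdb_manifest(name: str, namespace: str, app_label: str, min_available: int) -> dict:
    """Build a PodDisruptionBudget manifest (hypothetical helper, illustrative values)."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Minimum number of matching pods that must stay available
            # during a voluntary disruption such as a node drain.
            "minAvailable": min_available,
            # Assumes workloads are labeled app=<app_label>.
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def render(manifest: dict) -> str:
    """Serialize the manifest as JSON, ready to pipe into kubectl."""
    return json.dumps(manifest, indent=2)
```

For example, `print(render(pdb_manifest("inference-pdb", "default", "inference", 2)))` produces a manifest you can review in Git before applying it to the cluster.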

## Stage 3 — Partner-managed fit

- [ ] Need 24/7 paging and change windows you cannot staff
- [ ] Multi-cluster fleet (>5 clusters) without internal platform team
- [ ] Regulatory requirement for external operator separation of duties
- [ ] MSP provides runbooks tied to your application SLAs (not generic K8s)

**Choose partner when:** Auto Mode still leaves app-level on-call gaps you cannot fill.
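
One way to read Stages 1 to 3 together is as an ordered set of checks. The sketch below is only an interpretation of the three "Choose ... when" rules: the field names, the `recommend` function, and the order in which the checks run are assumptions, and a real decision should weigh every checkbox above rather than four booleans.

```python
from dataclasses import dataclass


@dataclass
class ClusterNeeds:
    # Stage 1: custom AMIs, custom instance families, or bare metal required
    needs_custom_nodes: bool
    # Stage 1: at least one SRE with Karpenter production experience
    has_karpenter_sre: bool
    # Stage 3: 24/7 paging and change windows can be staffed in-house
    can_staff_on_call: bool
    # Stage 3: regulation requires an external operator
    needs_external_operator: bool


def recommend(needs: ClusterNeeds) -> str:
    """Return a starting recommendation (assumed ordering, not an official rule)."""
    # Partner-managed comes first: staffing and regulatory gaps are not
    # solved by either AWS-side option.
    if needs.needs_external_operator or not needs.can_staff_on_call:
        return "partner-managed"
    # Self-managed only makes sense with both the need and the skills.
    if needs.needs_custom_nodes and needs.has_karpenter_sre:
        return "self-managed Karpenter"
    # Custom node requirements without in-house Karpenter skills is a gap
    # to resolve before choosing; flag it instead of guessing.
    if needs.needs_custom_nodes:
        return "unresolved: custom nodes required but no Karpenter SRE"
    return "EKS Auto Mode"
```

Returning an explicit "unresolved" value, rather than forcing one of the three modes, keeps the gap visible in review instead of hiding it behind a default.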

## Stage 4 — Migration gates

- [ ] Run a parallel node pool (10% of traffic) for 2 weeks before cutover
- [ ] Validate GPU workloads: inference latency before/after Auto Mode AMI
- [ ] Document rollback: a snapshot of the previous node group or Cluster Autoscaler config
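
The GPU latency gate above can be made mechanical. This is a minimal sketch, assuming you have exported per-request inference latencies in milliseconds from before and after the move to the Auto Mode AMI. The choice of p95 as the metric, the 5% tolerance, and the function names are assumptions, not values from AWS guidance.

```python
import statistics


def p95(samples_ms: list[float]) -> float:
    """95th percentile of latency samples (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100)[94]


def latency_gate(
    before_ms: list[float],
    after_ms: list[float],
    max_regression: float = 0.05,  # assumed tolerance: 5% slower at p95
) -> tuple[bool, float]:
    """Return (passed, relative p95 change) for the before/after comparison."""
    baseline = p95(before_ms)
    candidate = p95(after_ms)
    regression = (candidate - baseline) / baseline
    return regression <= max_regression, regression
```

A failed gate (`passed` is `False`) would be a reason to keep traffic on the previous node group and use the documented rollback.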

## Related posts

- [Managed EKS Auto Mode vs Karpenter decision](/blog/aws-managed-eks-auto-mode-vs-karpenter-decision-2026/)
- [Deploy EKS with Karpenter](/blog/how-to-deploy-eks-karpenter-cost-optimized-autoscaling/)
- [Karpenter vs Cluster Autoscaler](/blog/karpenter-vs-cluster-autoscaler-eks-cost-optimization/)
- [EKS pricing and Auto Mode](/blog/amazon-eks-pricing-control-plane-addons-auto-mode/)
