AWS DevOps & Platform Maturity Model (2026): A 4-Level Scorecard Anchored to Real Services

DevOps & CI/CD Palaniappan P 5 min read

Quick summary: Generic DevOps maturity models score you on culture slides — this one maps L1–L4 to AWS gates you can verify: IaC in Git, GitOps or gated CD, ADOT on EKS, FIS with stop conditions, and cost-aware CI. A composite 85-engineer SaaS moved from L2 to L3 in one quarter by fixing the CI/GitOps boundary alone, cutting deploy-related incidents from ~6/month to 2.


On June 11, 2026, most AWS platform teams do not have a maturity problem; they have a measurement problem. Leadership asks for “DevOps maturity” and gets a CMMI worksheet or a DORA dashboard with deploy frequency and lead time, but no answer to the operational question: what do we build next quarter, and how do we know it worked? AWS has shipped concrete platform primitives since re:Invent 2024: declarative policies for durable EC2/VPC/EBS baselines, Resource Control Policies (with coverage extended through February 2026, including DynamoDB), ADOT as an EKS add-on, and FIS scenarios integrated with AWS Resilience Hub. But those land as feature announcements, not as levels on a scorecard.

This post is a four-level, AWS-anchored maturity model for platform and DevOps programs. It is not a replacement for "10 AWS DevOps practices we use in production": that post covers what to do. This one covers where you are and what to do next, with a downloadable scorecard and a 90-day upgrade template.

Benchmark pattern (not a cited client) — Composite B2B SaaS, ~85 engineers, 4 AWS accounts (no OU guardrails), Terraform in Git but terraform apply from engineer laptops to staging, CI that ran tests and then kubectl apply to a shared EKS cluster. Representative shape: ~6 deploy-related incidents/month (wrong image, drifted config, rollback that did not stick). One quarter focused only on the L2→L3 delivery gate — CI builds and opens PRs; Argo CD reconciles; kubectl apply removed from pipeline — incidents dropped to ~2/month without changing instance sizes or adding headcount. The lever was measurement and boundary, not a new tool category.

The four levels (AWS gates, not adjectives)

| Level | Name | You know you’re here when… | AWS anchors |
|-------|------|----------------------------|-------------|
| L1 | Ad-hoc | Console changes; no single deploy path; on-call learns about prod from users | Single account; CloudWatch optional |
| L2 | Repeatable | IaC in Git; CI builds/tests; deploy is manual, scripted, or pipeline-push | CodePipeline/GitHub Actions; Terraform/CDK; basic alarms |
| L3 | Managed | One writer to prod (GitOps or gated CD); multi-account LZ; app traces/metrics on tier-1 | EKS + Argo CD/Flux; Control Tower or LZA; ADOT; Config rules; Organizations SCPs |
| L4 | Optimizing | SLOs/error budgets; scheduled FIS with stop conditions; cost in CI; self-service golden paths | FIS + Resilience Hub; AMP/AMG or App Signals; tag policies + anomaly detection; IDP/templates |

Opinionated take: score per capability, not one number for the whole org. It is normal to be L3 on CI/CD and L1 on resilience — that honesty is the point.

Score yourself (use the artifact)

Download the maturity scorecard CSV. Eight capabilities:

  1. IaC foundation — versioned infra, drift awareness
  2. CI/CD delivery — who writes to prod?
  3. Multi-account — landing zone vs account sprawl
  4. Observability — infra metrics vs service SLOs
  5. Security shift-left — secrets, OIDC, scanning
  6. Resilience — hope vs FIS program (maturity matrix for FIS specifically)
  7. FinOps in platform — tags, chargeback, cost-aware CI
  8. Self-service — tickets vs golden paths

For each row, pick current and target level. If you cannot link evidence (pipeline URL, SCP ID, experiment template), pick the lower level.
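
The evidence rule is easy to encode if you want the scorecard to enforce it rather than rely on honesty in the room. Below is a minimal sketch, not the format of the downloadable CSV: the `CapabilityScore` class, its field names, and the reading of "pick the lower level" as "drop one level, floor at L1" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

LEVELS = {1: "Ad-hoc", 2: "Repeatable", 3: "Managed", 4: "Optimizing"}


@dataclass
class CapabilityScore:
    """One scorecard row (hypothetical shape, not the real CSV schema)."""
    capability: str                 # e.g. "cicd_delivery"
    claimed_level: int              # level the team believes it is at (1-4)
    target_level: int               # level to reach next
    evidence: list[str] = field(default_factory=list)  # pipeline URL, SCP ID, ...

    def effective_level(self) -> int:
        """Claimed level if evidence is linked, otherwise one level lower."""
        if self.claimed_level not in LEVELS:
            raise ValueError(f"level must be 1-4, got {self.claimed_level}")
        has_evidence = any(link.strip() for link in self.evidence)
        # Assumption: "pick the lower level" means drop by one, never below L1.
        return self.claimed_level if has_evidence else max(1, self.claimed_level - 1)


def summarize(scores: list[CapabilityScore]) -> dict[str, tuple[int, int]]:
    """Map capability -> (effective current level, target level)."""
    return {s.capability: (s.effective_level(), s.target_level) for s in scores}


if __name__ == "__main__":
    rows = [
        CapabilityScore("cicd_delivery", 3, 3, ["https://example.invalid/pipeline"]),
        CapabilityScore("resilience", 3, 4),  # no evidence linked
    ]
    for name, (current, target) in summarize(rows).items():
        print(f"{name}: L{current} ({LEVELS[current]}) -> target L{target}")
```

Keeping one row per capability matches the per-capability scoring above: there is deliberately no function here that averages rows into a single org-wide number.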

The L2 → L3 jumps that actually move incidents

1. Delivery: one writer to production

If CI and a human can both change prod, you are L2. L3 requires exactly one reconciler — GitOps on EKS or gated CodePipeline/CodeDeploy. See the GitOps post for the five traps; the maturity lens is simple: can a Git revert roll back prod? If not, stay L2 until fixed.
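
A rough way to test the "one writer" claim is to scan pipeline definitions for commands that push to the cluster directly. The sketch below is a heuristic under stated assumptions: the pattern list is illustrative and not exhaustive, the function names are hypothetical, and a text scan cannot see humans running commands from laptops, so treat a clean result as necessary rather than sufficient.

```python
import re

# Commands that write to a cluster directly (illustrative, not exhaustive).
WRITE_PATTERNS = [
    r"\bkubectl\s+(apply|replace|patch)\b",
    r"\bhelm\s+(upgrade|install)\b",
]


def find_direct_writes(pipeline_text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line that looks like a direct write."""
    hits = []
    for lineno, line in enumerate(pipeline_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):  # skip commented-out lines
            continue
        if any(re.search(pattern, stripped) for pattern in WRITE_PATTERNS):
            hits.append((lineno, stripped))
    return hits


def delivery_level(pipeline_text: str, reconciler_enabled: bool) -> int:
    """Return 3 only if a reconciler exists and CI performs no direct writes, else 2.

    This only separates L2 from L3 on the delivery gate; it does not detect L1 or L4.
    """
    if reconciler_enabled and not find_direct_writes(pipeline_text):
        return 3
    return 2


if __name__ == "__main__":
    ci = "steps:\n  - run: docker build -t app .\n  - run: helm upgrade app ./chart\n"
    for lineno, text in find_direct_writes(ci):
        print(f"line {lineno}: {text}")
    print("delivery level:", delivery_level(ci, reconciler_enabled=True))
```

Any hit means a second writer exists alongside the reconciler, which by the rule above keeps the capability at L2 regardless of which GitOps tool is installed.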

2. Multi-account: policy follows OU

L2 is “we have multiple accounts.” L3 is OU structure + baseline SCPs via Control Tower or equivalent, plus day-2 sharing patterns. Declarative policies (GA December 2024) belong in the platform baseline — not hand-maintained SCP denylists per API.
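
For a sense of what a "baseline SCP" looks like in practice, here is a small sketch that builds a deny-only guardrail document and checks its shape. The two statements (blocking accounts from leaving the organization and from stopping or deleting CloudTrail trails) are common examples chosen for illustration, not the baseline this post prescribes, and the helper names are hypothetical. Attaching the policy to an OU is out of scope here.

```python
import json


def baseline_scp() -> dict:
    """Build an illustrative deny-only SCP document (example statements only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLeaveOrganization",
                "Effect": "Deny",
                "Action": ["organizations:LeaveOrganization"],
                "Resource": "*",
            },
            {
                "Sid": "DenyCloudTrailTampering",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
        ],
    }


def guardrail_problems(policy: dict) -> list[str]:
    """List reasons the document is not a well-formed deny-only guardrail."""
    problems = []
    statements = policy.get("Statement", [])
    if not statements:
        problems.append("no statements")
    for i, stmt in enumerate(statements):
        label = stmt.get("Sid", f"statement[{i}]")
        if stmt.get("Effect") != "Deny":
            problems.append(f"{label}: Effect is not Deny")
        if not stmt.get("Action") and not stmt.get("NotAction"):
            problems.append(f"{label}: missing Action/NotAction")
        if "Resource" not in stmt:
            problems.append(f"{label}: missing Resource")
    return problems


if __name__ == "__main__":
    policy = baseline_scp()
    print(json.dumps(policy, indent=2))
    print("problems:", guardrail_problems(policy))
```

Keeping the document in Git and generating it from code is what makes the SCP ID usable as scorecard evidence: the reviewer can trace the attached policy back to a commit.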

3. Observability: ADOT or equivalent on Kubernetes

L2 is CPU and 5xx alarms. L3 is traces + service metrics — on EKS, the ADOT add-on is the supported path to CloudWatch, X-Ray, and AMP. Deep dive: observability beyond CloudWatch.

4. Resilience: one scheduled FIS experiment

L3 resilience is not “we will do chaos someday.” It is one FIS template with CloudWatch alarm stop conditions and a documented steady-state hypothesis, run on a schedule in non-prod. L4 adds prod GameDays and pipeline gates; see FIS resilience program.
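
The L3 bar can be checked mechanically: does the template carry a real alarm stop condition, or the placeholder `none`? The sketch below builds a template as a plain dict shaped like the arguments I understand the FIS `CreateExperimentTemplate` API to take, then validates it. The ARNs, tag values, target name, and choice of the `aws:ec2:stop-instances` action are placeholders for illustration. Nothing here calls AWS, and scheduling the run is not shown.

```python
def build_fis_template(role_arn: str, alarm_arn: str, env_tag: str = "staging") -> dict:
    """Return an experiment template dict (illustrative values, no AWS call made)."""
    return {
        # The description doubles as the documented steady-state hypothesis.
        "description": "Hypothesis: error-rate alarm stays OK while one instance is stopped",
        "roleArn": role_arn,
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn},
        ],
        "targets": {
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"Environment": env_tag},  # tag-scoped to non-prod
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneInstance"},
            }
        },
        "tags": {"Name": "l3-scheduled-experiment"},
    }


def has_alarm_stop_condition(template: dict) -> bool:
    """True if at least one stop condition points at a CloudWatch alarm ARN."""
    return any(
        cond.get("source") == "aws:cloudwatch:alarm" and bool(cond.get("value"))
        for cond in template.get("stopConditions", [])
    )


if __name__ == "__main__":
    template = build_fis_template(
        role_arn="arn:aws:iam::111122223333:role/fis-experiment-role",        # placeholder
        alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:api-5xx",  # placeholder
    )
    print("alarm stop condition present:", has_alarm_stop_condition(template))
```

A template whose only stop condition has source `none` would fail this check, which is the difference between an experiment and an outage with a schedule.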

What broke — A team scored themselves L3 on CI/CD because they “used Argo CD.” Under audit: CI still ran helm upgrade on merge to main, and Argo CD reconciled the same chart from a different branch. Two writers, weekly drift, rollbacks that “succeeded” in Git but not in cluster. They dropped to an honest L2, removed helm upgrade from CI, and re-scored L3 six weeks later. Tool installed ≠ level achieved.

90-day upgrade (one capability only)

Use the level-up roadmap template. Rules:

  • One capability level-up per quarter
  • Weeks 1–2: baseline metrics only — no new tools
  • Weeks 3–6: working change in non-prod
  • Weeks 7–12: prod (tag-scoped) + re-score

Trying to jump IaC + GitOps + FIS + FinOps in one quarter is how programs die at L2 forever.
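
To turn the week ranges above into calendar dates for the roadmap, a few lines of date arithmetic are enough. This is a sketch with hypothetical names (`PHASES`, `plan_quarter`); it assumes each week range is inclusive and that the clock starts on the date you pass in. Twelve weeks is 84 days, so the "90-day" label leaves a few days of slack for the re-score meeting.

```python
from datetime import date, timedelta

# (phase description, first week, last week), week ranges inclusive.
PHASES = [
    ("baseline metrics only, no new tools", 1, 2),
    ("working change in non-prod", 3, 6),
    ("prod (tag-scoped) + re-score", 7, 12),
]


def plan_quarter(start: date) -> list[tuple[str, date, date]]:
    """Return (phase, first day, last day) for each phase, starting at `start`."""
    schedule = []
    for name, first_week, last_week in PHASES:
        begin = start + timedelta(weeks=first_week - 1)
        end = start + timedelta(weeks=last_week) - timedelta(days=1)
        schedule.append((name, begin, end))
    return schedule


if __name__ == "__main__":
    for name, begin, end in plan_quarter(date(2026, 6, 15)):
        print(f"{begin.isoformat()} to {end.isoformat()}: {name}")
```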

What to do this week

  1. Download the scorecard and fill current levels with evidence links.
  2. Pick one L2→L3 row — usually cicd_delivery or multi_account.
  3. Run the delivery audit: does any human or CI job bypass the reconciler? If yes, that is this quarter’s project.
  4. Schedule a 60-minute re-score in 90 days — same attendees, same CSV.

What this post doesn’t cover

  • Workload-level Well-Architected reviews — use WAFR for depth on a single system.
  • DORA metrics benchmarking — we use levels here; you can map deploy frequency to levels separately.
  • Full GitOps or FIS tutorials — see linked pillar posts.
  • Team topology / platform org design — see CCoE operating model.

Related: DevOps pipeline setup · 10 AWS DevOps practices · GitOps on EKS · Cost-aware CI/CD

If you only do one thing: Score cicd_delivery honestly. If two systems write to production, fix that before buying any other platform tool.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.
