How is this different from the AWS Well-Architected Framework or a generic CMMI maturity model?

Well-Architected reviews answer "is this workload healthy?" across six pillars, one workload at a time. Generic maturity models (CMMI, or DORA metrics used alone) tell you to "improve culture" without naming the AWS control that proves the level changed. This model is narrower and operational: each level has verifiable gates tied to specific AWS services (Organizations SCPs at L3 multi-account, the ADOT EKS add-on at L3 observability, a scheduled FIS experiment with CloudWatch alarm stop conditions at L3 resilience, FIS pipeline gates at L4). Use Well-Architected for workload depth; use this scorecard for platform program planning.

What level should a 50-person product engineering org target?

Most 50–200 engineer orgs on AWS, scored honestly, sit at L2 today (IaC + CI, single or few accounts) and should target L3 on tier-1 services within 12 months. L3 means gated delivery (GitOps or approved CD), a multi-account landing zone, observability with alarms on critical paths, and Config or equivalent detective controls; it does not require L4 FIS-in-CI everywhere. Trying to jump straight to L4 without L3 delivery discipline usually produces impressive demos and unchanged incident rates.

When is L4 "optimizing" maturity the wrong goal?

Skip L4 investments when the workload is low criticality (internal admin tools, batch jobs with flexible SLAs), when team size is under ~15 engineers and L3 process overhead exceeds benefit, or when you lack an executive sponsor for recurring GameDays and SLO programs. L4 (FIS in pipeline gates, error budgets, cost-aware CI on every PR) pays off on revenue paths and regulated tier-1 systems — not on every Lambda cron job.

What is the fastest L2 → L3 upgrade if we can only fix one thing?

Fix the CI/reconcile boundary: if CI still runs kubectl apply or terraform apply -auto-approve against production, you are L2 regardless of how much GitOps tooling you bought. Make the pipeline build, test, and open a change to the deployment repo; let exactly one system (GitOps controller or gated CD) write to prod. In the composite benchmark described in this post, this change alone took deploy-related incidents from ~6/month to ~2/month within a quarter, largely because rollbacks become Git reverts that actually work.

How does ADOT on EKS fit the maturity model?

L2 observability is CloudWatch metrics and alarms on infrastructure. L3 adds application-level traces and consistent service metrics. On EKS that typically means the AWS Distro for OpenTelemetry (ADOT) installed as an EKS add-on, exporting to CloudWatch, X-Ray, and/or Amazon Managed Service for Prometheus. L4 adds SLOs derived from those signals (success rate, p99 latency) with error budgets that gate releases. Without ADOT or an equivalent OTel path, you are guessing at service health from CPU graphs.
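
To make the L4 idea of an error budget that gates releases concrete, here is a minimal sketch. The 99.9% target, the request counts, and the 25% release threshold are hypothetical values chosen for illustration; they are not part of the scorecard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one SLO window (for example, a rolling 30 days)."""
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    allowed_failures = window.total_requests * (1.0 - slo_target)
    if allowed_failures == 0:
        return 0.0
    return 1.0 - window.failed_requests / allowed_failures


def release_allowed(window: SloWindow, slo_target: float, min_budget: float = 0.25) -> bool:
    """Hypothetical gate: block releases once less than min_budget of the budget is left."""
    return error_budget_remaining(window, slo_target) >= min_budget


# Example: 1,000,000 requests at a 99.9% target allows about 1,000 failures.
window = SloWindow(total_requests=1_000_000, failed_requests=400)
remaining = error_budget_remaining(window, slo_target=0.999)
print(f"budget remaining: {remaining:.0%}, release allowed: {release_allowed(window, 0.999)}")
```

In practice the counts would come from the service metrics that ADOT (or an equivalent OTel path) exports; the gate logic itself stays this small.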

What could go wrong if we score ourselves too optimistically?

Inflated scores fund the wrong roadmap — you buy a chaos engineering program (L4) while developers still deploy from laptops (L1 delivery). Run the scorecard row-by-row with evidence: link to the pipeline, the SCP attachment, the FIS experiment template. If you cannot point to the artifact, score the lower level. Re-score quarterly; maturity is a trajectory, not a badge.

AWS DevOps Maturity Model 2026: L1-L4 + 90-Day Roadmap

AWS DevOps & Platform Maturity Model (2026): A 4-Level Scorecard Anchored to Real Services

Quick summary: Generic DevOps maturity models score you on culture slides. This one maps L1–L4 to AWS gates you can verify: IaC in Git, GitOps or gated CD, ADOT on EKS, FIS with stop conditions, and cost-aware CI. In a composite benchmark (not a cited client), an 85-engineer SaaS moved from L2 to L3 on delivery in one quarter by fixing the CI/GitOps boundary alone, cutting deploy-related incidents from ~6/month to ~2/month.

As of June 11, 2026, most AWS platform teams do not have a maturity problem; they have a measurement problem. Leadership asks for “DevOps maturity” and gets a CMMI worksheet or a DORA dashboard with deploy frequency and lead time, but no answer to the operational question: what do we build next quarter, and how do we know it worked? AWS has shipped concrete platform primitives since re:Invent 2024: declarative policies for durable EC2/VPC/EBS baselines, Resource Control Policies with coverage extended through February 2026 (including DynamoDB), ADOT as an EKS add-on, and FIS scenarios integrated with AWS Resilience Hub. Those land as feature announcements, not as levels on a scorecard.

This post is a four-level, AWS-anchored maturity model for platform and DevOps programs. It is not a replacement for “10 AWS DevOps Practices We Actually Use in Production in 2026”; that post covers what to do. This one covers where you are and what to do next, with a downloadable scorecard and 90-day upgrade template.

Benchmark pattern (not a cited client) — Composite B2B SaaS, ~85 engineers, 4 AWS accounts (no OU guardrails), Terraform in Git but terraform apply from engineer laptops to staging, CI that ran tests and then kubectl apply to a shared EKS cluster. Representative shape: ~6 deploy-related incidents/month (wrong image, drifted config, rollback that did not stick). One quarter focused only on the L2→L3 delivery gate — CI builds and opens PRs; Argo CD reconciles; kubectl apply removed from pipeline — incidents dropped to ~2/month without changing instance sizes or adding headcount. The lever was measurement and boundary, not a new tool category.

The four levels (AWS gates, not adjectives)

Level	Name	You know you’re here when…	AWS anchors
L1	Ad-hoc	Console changes; no single deploy path; on-call learns about prod from users	Single account; CloudWatch optional
L2	Repeatable	IaC in Git; CI builds/tests; deploy is manual, scripted, or pipeline-push	CodePipeline/GitHub Actions; Terraform/CDK; basic alarms
L3	Managed	One writer to prod (GitOps or gated CD); multi-account LZ; app traces/metrics on tier-1; one scheduled non-prod FIS experiment with stop conditions	EKS + Argo CD/Flux; Control Tower or LZA; ADOT; Config rules; Organizations SCPs; FIS
L4	Optimizing	SLOs/error budgets; FIS in prod GameDays and pipeline gates; cost in CI; self-service golden paths	FIS + Resilience Hub; AMP/AMG or App Signals; tag policies + anomaly detection; IDP/templates

Opinionated take: score per capability, not one number for the whole org. It is normal to be L3 on CI/CD and L1 on resilience — that honesty is the point.

Score yourself (use the artifact)

Download the maturity scorecard CSV. Eight capabilities:

IaC foundation — versioned infra, drift awareness
CI/CD delivery — who writes to prod?
Multi-account — landing zone vs account sprawl
Observability — infra metrics vs service SLOs
Security shift-left — secrets, OIDC, scanning
Resilience — hope vs FIS program (maturity matrix for FIS specifically)
FinOps in platform — tags, chargeback, cost-aware CI
Self-service — tickets vs golden paths

For each row, pick current and target level. If you cannot link evidence (pipeline URL, SCP ID, experiment template), pick the lower level.
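
The evidence rule can be applied mechanically. The sketch below is one reading of it: a claimed level with no linked artifact drops by one level. The column names and sample rows are assumptions for illustration; the downloadable CSV may be laid out differently.

```python
import csv
import io

LEVELS = ("L1", "L2", "L3", "L4")

# Hypothetical scorecard layout; the real CSV's column names may differ.
SAMPLE_CSV = """capability,current,target,evidence
cicd_delivery,L3,L3,
multi_account,L2,L3,https://example.com/scp-attachment
"""


def effective_level(claimed: str, evidence: str) -> str:
    """Apply the evidence rule: a claim without a linked artifact drops one level."""
    if claimed not in LEVELS:
        raise ValueError(f"unknown level: {claimed!r}")
    if evidence.strip():
        return claimed
    return LEVELS[max(LEVELS.index(claimed) - 1, 0)]


def score(csv_text: str) -> dict[str, str]:
    """Return capability -> evidence-adjusted current level."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["capability"]: effective_level(row["current"], row["evidence"] or "")
        for row in rows
    }


print(score(SAMPLE_CSV))
```

Here cicd_delivery claims L3 without a link and is scored L2, while multi_account keeps its L2 because it points at an artifact.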

The L2 → L3 jumps that actually move incidents

1. Delivery: one writer to production

If CI and a human can both change prod, you are L2. L3 requires exactly one reconciler — GitOps on EKS or gated CodePipeline/CodeDeploy. See the GitOps post for the five traps; the maturity lens is simple: can a Git revert roll back prod? If not, stay L2 until fixed.
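
A first pass at the "who writes to prod?" question can be automated by scanning pipeline definitions for commands that write directly to a cluster or account. The sketch below is a rough heuristic, not a complete audit: the command list and the sample pipeline are illustrative assumptions, and a hit only matters if that job actually targets production.

```python
import re

# Commands that write to a cluster or account directly. The list is an
# illustrative assumption; extend it for the tools a pipeline actually uses.
DIRECT_WRITE_PATTERNS = {
    "kubectl apply": re.compile(r"\bkubectl\s+apply\b"),
    "helm upgrade": re.compile(r"\bhelm\s+upgrade\b"),
    "terraform apply": re.compile(r"\bterraform\s+apply\b"),
}


def find_direct_writes(pipeline_text: str) -> list[tuple[int, str]]:
    """Return (line_number, command) for every direct-write command found."""
    findings = []
    for number, line in enumerate(pipeline_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out lines
        for command, pattern in DIRECT_WRITE_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, command))
    return findings


# Hypothetical CI job definition, inlined so the sketch runs on its own.
SAMPLE_PIPELINE = """\
steps:
  - run: make test
  - run: docker build -t app:${GIT_SHA} .
  - run: helm upgrade app ./chart --set image.tag=${GIT_SHA}
"""

for line_number, command in find_direct_writes(SAMPLE_PIPELINE):
    print(f"line {line_number}: CI runs '{command}', so CI may be a second writer")
```

An empty result does not prove L3 (humans with cluster credentials are invisible to this scan), but any finding is a concrete artifact to bring to the scoring session.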

2. Multi-account: policy follows OU

L2 is “we have multiple accounts.” L3 is OU structure + baseline SCPs via Control Tower or equivalent, plus day-2 sharing patterns. Declarative policies (GA December 2024) belong in the platform baseline — not hand-maintained SCP denylists per API.
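
For a sense of what a baseline SCP attached at the OU looks like, here is a small sketch that builds one and checks it against the documented SCP size limit. The two statements are common baseline examples chosen for illustration, not the post's prescribed baseline.

```python
import json

# AWS Organizations documents a 5,120-character maximum for an SCP document.
SCP_MAX_CHARS = 5120


def baseline_scp() -> dict:
    """Build a small deny-list SCP intended for attachment at an OU."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLeaveOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
            {
                "Sid": "DenyDisablingCloudTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
        ],
    }


# Compact encoding keeps the document well inside the size limit.
policy_json = json.dumps(baseline_scp(), separators=(",", ":"))
if len(policy_json) > SCP_MAX_CHARS:
    raise ValueError("SCP exceeds the documented size limit")
print(policy_json)
```

The resulting JSON is the kind of artifact the scorecard asks for: a policy that lives in Git and whose OU attachment can be linked as evidence for the multi_account row.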

3. Observability: ADOT or equivalent on Kubernetes

L2 is CPU and 5xx alarms. L3 is traces + service metrics: on EKS, the ADOT add-on is an AWS-supported path to CloudWatch, X-Ray, and Amazon Managed Service for Prometheus (AMP). Deep dive: observability beyond CloudWatch.

4. Resilience: one scheduled FIS experiment

L3 resilience is not “we will do chaos someday.” It is one FIS template with CloudWatch alarm stop conditions, run on a schedule in non-prod, with a documented steady-state hypothesis. L4 adds prod GameDays and pipeline gates; see FIS resilience program.
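
As a rough sketch of that single template, the function below builds the request body for the FIS CreateExperimentTemplate API with a CloudWatch alarm stop condition. The target tag, the action choice (stopping one EC2 instance), the duration, and the ARNs in the usage lines are hypothetical; scheduling is outside this sketch.

```python
def fis_template_request(alarm_arn: str, role_arn: str, env_tag: str = "staging") -> dict:
    """Build keyword arguments for the FIS CreateExperimentTemplate API."""
    # Refuse to build a template whose stop condition is not a CloudWatch alarm.
    if ":cloudwatch:" not in alarm_arn or ":alarm:" not in alarm_arn:
        raise ValueError("stop condition must reference a CloudWatch alarm ARN")
    return {
        "description": "Stop one tagged non-prod instance; abort if the alarm fires",
        "roleArn": role_arn,
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}],
        "targets": {
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"env": env_tag},  # hypothetical tag key and value
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneInstance"},
            }
        },
    }


# Placeholder ARNs for illustration only.
request = fis_template_request(
    alarm_arn="arn:aws:cloudwatch:us-east-1:123456789012:alarm:tier1-5xx-rate",
    role_arn="arn:aws:iam::123456789012:role/fis-experiment-role",
)
print(request["stopConditions"])
```

With credentials configured, a dict like this could be passed to boto3 as `boto3.client("fis").create_experiment_template(**request)`. Keeping the builder separate from the API call means the template can be reviewed and versioned in Git, which is what makes it linkable evidence.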

What broke — A team scored themselves L3 on CI/CD because they “used Argo CD.” Under audit: CI still ran helm upgrade on merge to main, and Argo CD reconciled the same chart from a different branch. Two writers, weekly drift, rollbacks that “succeeded” in Git but not in cluster. They dropped to an honest L2, removed helm upgrade from CI, and re-scored L3 six weeks later. Tool installed ≠ level achieved.

90-day upgrade (one capability only)

Use the level-up roadmap template. Rules:

One capability level-up per quarter
Weeks 1–2: baseline metrics only — no new tools
Weeks 3–6: working change in non-prod
Weeks 7–12: prod (tag-scoped) + re-score

Trying to jump IaC + GitOps + FIS + FinOps in one quarter is how programs die at L2 forever.

What to do this week

Download the scorecard and fill current levels with evidence links.
Pick one L2→L3 row — usually cicd_delivery or multi_account.
Run the delivery audit: does any human or CI job bypass the reconciler? If yes, that is this quarter’s project.
Schedule a 60-minute re-score in 90 days — same attendees, same CSV.

What this post doesn’t cover

Workload-level Well-Architected reviews — use WAFR for depth on a single system.
DORA metrics benchmarking — we use levels here; you can map deploy frequency to levels separately.
Full GitOps or FIS tutorials — see linked pillar posts.
Team topology / platform org design — see CCoE operating model.

If you only do one thing: Score cicd_delivery honestly. If two systems write to production, fix that before buying any other platform tool.

Recommended Reading

AWS CDK vs CloudFormation vs AWS Blocks: Enterprise IaC Decision Guide (2026)

10 AWS DevOps Practices We Actually Use in Production in 2026

Observability Beyond CloudWatch (2026): When to Add Application Signals, ADOT, Managed Prometheus, and Grafana — and When Not To

AWS CodePipeline: CI/CD Pipeline Patterns for Production