Scale a Legacy Monolith on AWS Before You Split It (2026): Vertical, Read Path, Cache, and Queue Offload Ladder
Quick summary: On a composite B2B order API (~2.4k RPS peak, 24 ECS tasks on Aurora), running the scaling ladder in order (RDS Proxy, read-replica routing, ElastiCache, SQS offload, then Express Mode horizontal scale) moved modeled p99 from 1,840 ms to 72 ms before any microservice split.
Key Takeaways
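- On a composite B2B order API (~2.4k RPS peak, 24 ECS tasks on Aurora), running the scaling ladder in order moved modeled p99 from 1,840 ms to 72 ms before any microservice split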
- On June 15, 2026, Amazon ECS Express Mode reached AWS GovCloud (US-East and US-West) — the same three-input HTTPS-on-Fargate path that shipped commercially on November 21, 2025
- This post is the scale-in-place ladder — ordered levers before any microservice split
- It is not zero-downtime ECS migration, not monolith vs microservices strategy, not modernization taxonomy, not greenfield ECS layout, and not runtime worker tuning only
- Connection math: RDS max connection calculator

On June 15, 2026, Amazon ECS Express Mode reached AWS GovCloud (US-East and US-West) — the same three-input HTTPS-on-Fargate path that shipped commercially on November 21, 2025. Horizontal scale for stateless web tiers got cheaper to operate; database and connection limits did not move with it.
This post is the scale-in-place ladder — ordered levers before any microservice split. It is not zero-downtime ECS migration, not monolith vs microservices strategy, not modernization taxonomy, not greenfield ECS layout, and not runtime worker tuning only.
Artifacts: scaling ladder checklist, capacity worksheet CSV. Connection math: RDS max connection calculator.
Benchmark pattern (not a cited client): B2B order API monolith, ~2.4k RPS peak, 24 ECS tasks on Aurora PostgreSQL (`db.r6g.2xlarge`), us-east-1. Baseline p99 1,840 ms, ~4,800 DB connections during deploy (mostly `idle in transaction`). After ladder: RDS Proxy + pool cap 15/task → 620 connections, p99 340 ms; read-replica routing (72% reads) → 210 ms; ElastiCache on catalog GETs (78% hit ratio) → 95 ms; SQS webhook offload → 88 ms; Express Mode horizontal scale (36 tasks) → 72 ms, with no service split.
Do not split until steps 1–6 are exhausted and step 7 passes
| Step | Lever | Typical win | Rollback trigger |
|---|---|---|---|
| 0 | Baseline metrics | Proof later levers worked | No baseline → stop |
| 1 | Vertical + Graviton canary | CPU headroom | p99 worse → profile locks |
| 2 | Pool cap + RDS Proxy | Connection storms | Proxy borrow timeouts |
| 3 | Read replica routing | Writer CPU | User-visible stale reads |
| 4 | ElastiCache hot keys | Read latency | Hit ratio < 60% |
| 5 | SQS async offload | Request path slim | UX still sync |
| 6 | ECS Express Mode horizontal | RPS headroom | 5xx on fast scale-out |
| 7 | Decomposition gate | Team/domain fit | Missing saga design |
Opinionated take: Run steps 1–6 in order. Microservices are a team and domain decision — not a performance knob. If step 2 fails, step 6 only spreads broken pool math across more tasks.
Step 2 — Connection pool + RDS Proxy (most common miss)
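Aurora `max_connections` scales with instance class, not with microservice count. 80 ECS tasks × pool 20 = 1,600 client slots on a writer with a ~5,000 ceiling is already 32% of the limit in steady state, and a rolling deploy that briefly runs old and new tasks side by side can push it much higher, leaving little room for admin and autovacuum (connection pool guide, June 2026).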
| Signal | Mechanism | Fix |
|---|---|---|
| `too many connections` | Pool × tasks > max | RDS Proxy; cap pool 10–20/task |
| p99 spike on deploy | Thundering herd | Staggered rollout; drain connections |
| `idle in transaction` | ORM session leak | Timeouts; code fix before replicas |
What broke: Logistics SaaS on Aurora (~$18k/mo DB). A deploy triggered p99 5.2 s; `pg_stat_activity` showed 4,800 connections, mostly `idle in transaction` from a leaked ORM session. Raising `max_connections` was rejected (OOM risk). RDS Proxy + pool cap 15/task → 620 connections; p99 220 ms on the same monolith binary, with no microservice split.
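The pool arithmetic in this step can be sketched as a small budget check. This is a minimal sketch, not a sizing tool: `pool_budget` and its fields are hypothetical names, the 40% threshold is borrowed from the weekly checklist later in this post, and the `deploy_surge` factor of 2.0 is an assumption that old and new tasks fully overlap during a rolling deploy.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolBudget:
    client_slots: int       # tasks * pool size in steady state
    surge_slots: int        # slots while old and new tasks overlap in a deploy
    steady_fraction: float  # client_slots / max_connections
    surge_fraction: float   # surge_slots / max_connections
    needs_proxy: bool       # True when either fraction crosses the threshold


def pool_budget(tasks: int, pool_per_task: int, max_connections: int,
                deploy_surge: float = 2.0, threshold: float = 0.40) -> PoolBudget:
    """Compare worst-case client connection slots against the DB ceiling.

    deploy_surge is an assumed multiplier for task overlap during a rolling
    deploy (2.0 = every old task still alive while its replacement starts).
    """
    if tasks <= 0 or pool_per_task <= 0 or max_connections <= 0:
        raise ValueError("tasks, pool_per_task and max_connections must be positive")
    if deploy_surge < 1.0:
        raise ValueError("deploy_surge must be >= 1.0")

    client_slots = tasks * pool_per_task
    surge_slots = math.ceil(client_slots * deploy_surge)
    steady_fraction = client_slots / max_connections
    surge_fraction = surge_slots / max_connections
    return PoolBudget(
        client_slots=client_slots,
        surge_slots=surge_slots,
        steady_fraction=steady_fraction,
        surge_fraction=surge_fraction,
        needs_proxy=max(steady_fraction, surge_fraction) > threshold,
    )


if __name__ == "__main__":
    # The 80-task, pool-20 example from this step against a ~5,000 ceiling.
    print(pool_budget(tasks=80, pool_per_task=20, max_connections=5000))
```

With the numbers from this step, steady state sits at 32% of the ceiling and the assumed deploy overlap reaches 64%, which is why the check flags the setup even though the steady-state figure alone looks safe.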
Step 3 — Read path without premature CQRS
Route provably read-only queries to the Aurora reader endpoint or an RDS read replica. Measure replica lag: if p99 read staleness exceeds ~500 ms during write bursts, tighten routing rules or cache.
Do not send session-sensitive reads to replicas without a documented staleness budget.
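One way to encode these routing rules is a pure function that defaults to the writer unless every condition for a replica read holds. This is a sketch under assumptions: `QueryContext`, `Endpoint`, and `choose_endpoint` are hypothetical names, and the lag value is assumed to come from whatever replica-lag metric you already alarm on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Endpoint(Enum):
    WRITER = "writer"
    READER = "reader"


@dataclass(frozen=True)
class QueryContext:
    read_only: bool                       # query is provably read-only
    staleness_budget_ms: Optional[float]  # documented budget; None if undocumented


def choose_endpoint(ctx: QueryContext, replica_lag_ms: float) -> Endpoint:
    """Send a query to the reader only when it is safe to serve stale data."""
    if not ctx.read_only:
        return Endpoint.WRITER
    if ctx.staleness_budget_ms is None:
        # No documented staleness budget: treat the read as session-sensitive.
        return Endpoint.WRITER
    if replica_lag_ms > ctx.staleness_budget_ms:
        # Replica is currently too far behind for this query's budget.
        return Endpoint.WRITER
    return Endpoint.READER


if __name__ == "__main__":
    catalog_list = QueryContext(read_only=True, staleness_budget_ms=500.0)
    order_status = QueryContext(read_only=True, staleness_budget_ms=None)
    print(choose_endpoint(catalog_list, replica_lag_ms=120.0))
    print(choose_endpoint(order_status, replica_lag_ms=120.0))
```

Falling back to the writer on every doubtful case keeps the failure mode at extra writer load rather than user-visible stale reads, which is the rollback trigger listed for this step.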
Step 4 — Cache what hurts, not everything
Target idempotent GETs with explicit TTL per entity. Use stampede protection (single-flight, jittered TTL) on catalog and config keys.
| Anti-pattern | Outcome |
|---|---|
| Cache personalized auth blobs in shared Redis | Cross-tenant risk |
| Infinite TTL on inventory | Oversell during writes |
| Cache before fixing N+1 | Low hit ratio, high cost |
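
The stampede protection mentioned above (single-flight plus jittered TTL) can be sketched in-process as follows. This is an illustration only: `SingleFlightCache` is a hypothetical class, the dictionary stands in for ElastiCache, and the single-flight guarantee here covers threads in one process, not multiple ECS tasks.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SingleFlightCache:
    """In-memory stand-in for a shared cache with stampede protection."""

    def __init__(self, base_ttl_s: float, jitter_fraction: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._base_ttl_s = base_ttl_s
        self._jitter_fraction = jitter_fraction
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects _key_locks

    def _jittered_ttl(self) -> float:
        # Spread expiries so keys written together do not all expire together.
        spread = self._base_ttl_s * self._jitter_fraction
        return self._base_ttl_s + random.uniform(-spread, spread)

    def _fresh(self, key: str) -> Optional[Tuple[float, Any]]:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry
        return None

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        hit = self._fresh(key)
        if hit is not None:
            return hit[1]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have loaded the value while we waited.
            hit = self._fresh(key)
            if hit is not None:
                return hit[1]
            value = loader()  # only one caller per key reaches the backing store
            self._entries[key] = (self._clock() + self._jittered_ttl(), value)
            return value


if __name__ == "__main__":
    cache = SingleFlightCache(base_ttl_s=60.0)
    print(cache.get("catalog:A1", lambda: {"sku": "A1"}))
```

Across tasks, the same idea usually needs a short-lived lock key in the shared cache itself; that variant is not shown here.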
Step 5 — Async offload (pair with throughput tier guide)
Move webhooks, email, PDF, search index updates to SQS or EventBridge. Requires idempotent workers and DLQ tuning.
If ordered offload exceeds ~8k TPS per entity stream, read high-throughput event processing tier selection before FIFO caps bite.
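Because SQS standard queues deliver at least once, the worker has to tolerate seeing the same message twice. A minimal sketch of an idempotent handler follows; `handle_message` and the `idempotency_key` field are hypothetical, and the in-memory set stands in for a durable store that survives worker restarts.

```python
from typing import Any, Callable, Dict, Set


def handle_message(message: Dict[str, Any],
                   processed_keys: Set[str],
                   side_effect: Callable[[Any], None]) -> str:
    """Run a side effect at most once per idempotency key.

    Returns "done", "duplicate", or "retry". On "retry" the caller should
    leave the message on the queue so the redrive policy can eventually
    move it to the DLQ.
    """
    key = message["idempotency_key"]  # assumed to be set by the producer
    if key in processed_keys:
        return "duplicate"  # already handled: safe to delete from the queue
    try:
        side_effect(message["body"])
    except Exception:
        # Do not record the key, so a redelivery can try again.
        return "retry"
    processed_keys.add(key)
    return "done"


if __name__ == "__main__":
    sent = []
    seen: Set[str] = set()
    msg = {"idempotency_key": "order-42:webhook", "body": {"order_id": 42}}
    print(handle_message(msg, seen, sent.append))
    print(handle_message(msg, seen, sent.append))
    print(len(sent))
```

Recording the key only after the side effect succeeds means a crash between the two can still repeat the side effect once, so the downstream call (webhook, email) should ideally carry the same key.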
Step 6 — Horizontal scale with ECS Express Mode
For stateless HTTP tiers, ECS Express Mode collapses ALB + Fargate + HTTPS into three inputs — useful when the monolith is already containerized but ingress boilerplate blocked quick scale-out (Express Mode guide).
- Autoscale on CPU and ALB request count
- Min tasks ≥ 2 for AZ redundancy
- Up to 25 services can share one ALB — isolate blast radius with host rules
Express Mode does not scale the database — pair with Steps 2–5.
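A rough way to sanity-check an autoscaling target before a load test is to estimate the task count from peak RPS. The sketch below is illustrative: `desired_tasks` is a hypothetical helper, the per-task throughput of 100 RPS is simply the benchmark baseline (2.4k RPS across 24 tasks), and the 50% headroom is an assumption rather than a recommendation.

```python
import math


def desired_tasks(peak_rps: float, target_rps_per_task: float,
                  headroom: float = 0.2, min_tasks: int = 2) -> int:
    """Estimate task count for a peak load, never dropping below min_tasks.

    headroom is the extra capacity fraction kept above peak (0.2 = 20%).
    min_tasks defaults to 2 to match the AZ redundancy floor in this step.
    """
    if peak_rps < 0 or target_rps_per_task <= 0 or headroom < 0:
        raise ValueError("peak_rps and headroom must be >= 0, target_rps_per_task > 0")
    needed = math.ceil(peak_rps * (1 + headroom) / target_rps_per_task)
    return max(min_tasks, needed)


if __name__ == "__main__":
    tasks = desired_tasks(peak_rps=2400, target_rps_per_task=100, headroom=0.5)
    pool_cap = 15  # per-task pool cap from Step 2
    print(tasks, tasks * pool_cap)
```

Multiplying the result by the per-task pool cap gives the client slots the new task count would request, which is the number to check against the database ceiling before raising the autoscaling maximum.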
Step 7 — Decomposition gate (all must be true)
- ≥3 services can deploy and roll back independently
- Domain boundaries are stable (not “split by folder”)
- Cross-service transactions replaced with sagas/outbox — written, not assumed
Otherwise stay modular monolith per microservices decision guide.
What to do this week
- Export baseline p99, connections, and top 5 slow queries — worksheet CSV.
- Cap per-task pools; add RDS Proxy if tasks × pool > 40% of `max_connections`.
- Route one high-volume read endpoint to a replica; alarm on lag.
- Cache one hot GET with measured hit ratio target >70%.
- Move one side effect (webhooks/email) to SQS; verify DLQ depth alarms.
- Re-test peak; only then schedule a decomposition workshop.
What this post doesn’t cover
- Lift-and-shift to ECS from on-prem — monolith ECS migration.
- Aurora Limitless sharding — Limitless post for write-scale escape hatch only.
- Lambda Managed Instances — burst offload niche; not default monolith scale.
- Full strangler-fig migration program — data center exit program.
Related: Application modernization · AWS migration · RDS connection calculator




