Scale a Legacy Monolith on AWS Before You Split It (2026): Vertical, Read Path, Cache, and Queue Offload Ladder

Cloud Architecture · Palaniappan P · 4 min read

Quick summary: On a composite B2B order API (~2.4k RPS peak, 24 ECS tasks on Aurora), running the scaling ladder in order — RDS Proxy, read-replica routing, ElastiCache, then SQS offload — moved modeled p99 from 1,840 ms to 72 ms before any microservice split.

Key Takeaways

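  • On a composite B2B order API (~2.4k RPS peak, 24 ECS tasks on Aurora), running the scaling ladder in order (RDS Proxy, read-replica routing, ElastiCache, then SQS offload) moved modeled p99 from 1,840 ms to 72 ms before any microservice split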
  • On June 15, 2026, Amazon ECS Express Mode reached AWS GovCloud (US-East and US-West) — the same three-input HTTPS-on-Fargate path that shipped commercially on November 21, 2025
  • This post is the scale-in-place ladder — ordered levers before any microservice split
  • It is not zero-downtime ECS migration, not monolith vs microservices strategy, not modernization taxonomy, not greenfield ECS layout, and not runtime worker tuning only
  • Connection math: RDS max connection calculator

On June 15, 2026, Amazon ECS Express Mode reached AWS GovCloud (US-East and US-West) — the same three-input HTTPS-on-Fargate path that shipped commercially on November 21, 2025. Horizontal scale for stateless web tiers got cheaper to operate; database and connection limits did not move with it.

This post is the scale-in-place ladder — ordered levers before any microservice split. It is not zero-downtime ECS migration, not monolith vs microservices strategy, not modernization taxonomy, not greenfield ECS layout, and not runtime worker tuning only.

Artifacts: scaling ladder checklist, capacity worksheet CSV. Connection math: RDS max connection calculator.

Benchmark pattern (not a cited client): B2B order API monolith, ~2.4k RPS peak, 24 ECS tasks on Aurora PostgreSQL (db.r6g.2xlarge), us-east-1. Baseline: p99 1,840 ms, ~4,800 DB connections during deploy (mostly idle in transaction). After the ladder: RDS Proxy + pool cap 15/task → 620 connections, p99 340 ms; read-replica routing (72% reads) → 210 ms; ElastiCache on catalog GETs (78% hit ratio) → 95 ms; SQS webhook offload → 88 ms; Express Mode horizontal scale (36 tasks) → 72 ms, with no service split.
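To see which rung carried the most weight, the benchmark figures can be broken down per lever. The sketch below only rearranges the p99 numbers quoted in the paragraph above; the names `LADDER` and `step_savings` are illustrative, not part of any published worksheet.

```python
# Modeled p99 (ms) after each lever, copied from the benchmark paragraph.
LADDER = [
    ("baseline", 1840),
    ("RDS Proxy + pool cap", 340),
    ("read-replica routing", 210),
    ("ElastiCache catalog GETs", 95),
    ("SQS webhook offload", 88),
    ("Express Mode scale-out", 72),
]


def step_savings(ladder):
    """Return (lever, ms saved vs previous rung, share of the total saving)."""
    total = ladder[0][1] - ladder[-1][1]
    rows = []
    for (_, before), (name, after) in zip(ladder, ladder[1:]):
        saved = before - after
        rows.append((name, saved, saved / total))
    return rows


for name, saved, share in step_savings(LADDER):
    print(f"{name:26s} -{saved:4d} ms  {share:6.1%} of total saving")
```

In these modeled numbers the connection fix alone accounts for 1,500 of the 1,768 ms saved, which is why the ladder puts pool math ahead of caching and scale-out.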

Do not split until steps 1–6 are exhausted and step 7 passes

| Step | Lever | Typical win | Rollback trigger |
|------|-------|-------------|------------------|
| 0 | Baseline metrics | Proof later levers worked | No baseline → stop |
| 1 | Vertical + Graviton canary | CPU headroom | p99 worse → profile locks |
| 2 | Pool cap + RDS Proxy | Connection storms | Proxy borrow timeouts |
| 3 | Read replica routing | Writer CPU | User-visible stale reads |
| 4 | ElastiCache hot keys | Read latency | Hit ratio < 60% |
| 5 | SQS async offload | Request path slim | UX still sync |
| 6 | ECS Express Mode horizontal | RPS headroom | 5xx on fast scale-out |
| 7 | Decomposition gate | Team/domain fit | Missing saga design |

Opinionated take: Run steps 1–6 in order. Microservices are a team and domain decision — not a performance knob. If step 2 fails, step 6 only spreads broken pool math across more tasks.

Step 2 — Connection pool + RDS Proxy (most common miss)

Aurora max_connections scales with instance class — not with microservice count. 80 ECS tasks × pool 20 = 1,600 client slots on a writer with ~5,000 ceiling leaves little room for admin and autovacuum (connection pool guide — June 2026).
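The pool arithmetic is easy to script before touching any infrastructure. This is a minimal sketch using the figures from the paragraph above and the 40% rule of thumb from the weekly checklist; the `deploy_overlap` factor is an assumption added here to model old and new tasks holding pools at the same time during a rolling deploy, and all function names are hypothetical.

```python
def client_slots(tasks: int, pool_per_task: int, deploy_overlap: float = 1.0) -> int:
    """Connections the app tier can open: tasks x pool size.

    deploy_overlap is an assumed multiplier (for example 2.0 if a rolling
    deploy briefly runs old and new tasks side by side).
    """
    return int(tasks * pool_per_task * deploy_overlap)


def budget_share(slots: int, max_connections: int) -> float:
    """Fraction of the writer's max_connections the app tier can consume."""
    return slots / max_connections


def needs_proxy(slots: int, max_connections: int, threshold: float = 0.40) -> bool:
    """Apply the 40% rule of thumb from the weekly checklist."""
    return budget_share(slots, max_connections) > threshold


steady = client_slots(tasks=80, pool_per_task=20)                     # 1,600
deploy = client_slots(tasks=80, pool_per_task=20, deploy_overlap=2.0)  # 3,200
print(steady, needs_proxy(steady, 5000))
print(deploy, needs_proxy(deploy, 5000))
```

Under these assumptions, the steady-state 1,600 slots are 32% of a 5,000 ceiling, but a deploy that doubles the task count pushes the figure to 3,200 slots, or 64%, which crosses the 40% threshold.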

| Signal | Mechanism | Fix |
|--------|-----------|-----|
| too many connections | Pool × tasks > max | RDS Proxy; cap pool 10–20/task |
| p99 spike on deploy | Thundering herd | Staggered rollout; drain connections |
| idle in transaction | ORM session leak | Timeouts; code fix before replicas |

What broke — Logistics SaaS on Aurora (~$18k/mo DB): deploy triggered p99 5.2 s; pg_stat_activity showed 4,800 connections, mostly idle in transaction from a leaked ORM session. Raising max_connections was rejected (OOM risk). RDS Proxy + pool cap 15/task → 620 connections; p99 220 ms — same monolith binary, no microservice split.
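A quick way to confirm this failure mode is to group a `pg_stat_activity` snapshot by state and list the sessions stuck in a transaction. The sketch below assumes the rows have already been fetched into dictionaries with the `pid`, `state`, and `state_change` columns; the function name and the 30-second threshold are illustrative choices, not values from the incident.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_sessions(rows, now, max_idle_in_txn=timedelta(seconds=30)):
    """Count sessions per state and return pids idle in transaction too long.

    rows: iterable of dicts with 'pid', 'state', 'state_change' (datetime),
    mirroring the pg_stat_activity columns of the same names.
    """
    by_state = Counter((r["state"] or "unknown") for r in rows)
    stuck = [
        r["pid"]
        for r in rows
        if (r["state"] or "").startswith("idle in transaction")
        and now - r["state_change"] > max_idle_in_txn
    ]
    return by_state, stuck


now = datetime(2026, 6, 15, 12, 0, 0)
snapshot = [
    {"pid": 101, "state": "active", "state_change": now},
    {"pid": 102, "state": "idle in transaction",
     "state_change": now - timedelta(minutes=5)},
    {"pid": 103, "state": "idle in transaction",
     "state_change": now - timedelta(seconds=2)},
    {"pid": 104, "state": None, "state_change": now},  # background process
]
counts, stuck_pids = summarize_sessions(snapshot, now)
print(counts.most_common(), stuck_pids)
```

If most connections land in the stuck list, the leak is in application code, and a server-side timeout such as PostgreSQL's `idle_in_transaction_session_timeout` is a stopgap rather than the fix.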

Step 3 — Read path without premature CQRS

Route provably read-only queries to the Aurora reader endpoint or an RDS read replica. Measure replica lag: if reads are more than ~500 ms stale at p99 during write bursts, tighten routing rules or cache.

Do not send session-sensitive reads to replicas without a documented staleness budget.
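The routing rules above can be written as one small decision function. This is a sketch under stated assumptions: the `Query` fields, the endpoint labels, and the choice to fall back to the writer when lag exceeds the budget are all illustrative (the text only says to tighten routing or cache), and the 500 ms default mirrors the staleness figure above.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_STALENESS_BUDGET_MS = 500  # from the ~500 ms guidance above


@dataclass(frozen=True)
class Query:
    read_only: bool
    session_sensitive: bool = False
    staleness_budget_ms: Optional[int] = None  # documented budget, if any


def choose_endpoint(query: Query, replica_lag_ms: float) -> str:
    """Return 'reader' only when the query can tolerate the current lag."""
    if not query.read_only:
        return "writer"
    # Session-sensitive reads need an explicit, documented budget.
    if query.session_sensitive and query.staleness_budget_ms is None:
        return "writer"
    budget = (
        query.staleness_budget_ms
        if query.staleness_budget_ms is not None
        else DEFAULT_STALENESS_BUDGET_MS
    )
    return "reader" if replica_lag_ms <= budget else "writer"


print(choose_endpoint(Query(read_only=True), replica_lag_ms=40))
print(choose_endpoint(Query(read_only=True, session_sensitive=True), replica_lag_ms=40))
```

Keeping the decision in one place makes the staleness budget reviewable, instead of scattering reader-endpoint connection strings through the codebase.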

Step 4 — Cache what hurts, not everything

Target idempotent GETs with explicit TTL per entity. Use stampede protection (single-flight, jittered TTL) on catalog and config keys.
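Stampede protection is the part most often skipped, so here is a minimal in-process sketch of single-flight loading with a jittered TTL. It uses a plain dictionary in place of ElastiCache, and the class name, `loader` callback, and 10% jitter are assumptions for illustration; a shared cache would need a distributed lock or a similar mechanism.

```python
import random
import threading
import time


class SingleFlightCache:
    """In-memory cache: one loader call per key at a time, TTL with jitter."""

    def __init__(self, ttl_seconds, jitter=0.1, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._jitter = jitter          # +/- fraction applied to each TTL
        self._clock = clock
        self._lock = threading.Lock()  # guards both dictionaries
        self._values = {}              # key -> (value, expires_at)
        self._key_locks = {}           # key -> per-key loader lock

    def _fresh(self, key):
        with self._lock:
            entry = self._values.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry
        return None

    def get(self, key, loader):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._lock:
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Another thread may have filled the key while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = loader(key)
            ttl = self._ttl * (1 + random.uniform(-self._jitter, self._jitter))
            with self._lock:
                self._values[key] = (value, self._clock() + ttl)
            return value
```

The jitter spreads expiry times so that keys written together do not all expire in the same second, and the per-key lock means a cold key triggers one database read rather than one per concurrent request.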

| Anti-pattern | Outcome |
|--------------|---------|
| Cache personalized auth blobs in shared Redis | Cross-tenant risk |
| Infinite TTL on inventory | Oversell during writes |
| Cache before fixing N+1 | Low hit ratio, high cost |

Step 5 — Async offload (pair with throughput tier guide)

Move webhooks, email, PDFs, and search index updates to SQS or EventBridge. This requires idempotent workers and DLQ tuning.
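What "idempotent workers and DLQ tuning" means in practice can be shown with an in-memory simulation. The sketch below does not call SQS: the deque stands in for the queue, the `processed` set stands in for a durable deduplication store, and `max_receives` plays the role of a redrive policy's receive limit. The class name and the `idempotency_key` field are hypothetical.

```python
from collections import deque


class WebhookWorker:
    """Simulated at-least-once consumer with dedupe and a dead-letter list."""

    def __init__(self, handler, max_receives=3):
        self.handler = handler
        self.max_receives = max_receives
        self.processed = set()  # stand-in for a durable idempotency store
        self.dlq = []           # stand-in for a dead-letter queue

    def drain(self, messages):
        queue = deque((msg, 1) for msg in messages)
        while queue:
            msg, receive_count = queue.popleft()
            key = msg["idempotency_key"]
            if key in self.processed:
                continue  # duplicate delivery: side effect already done
            try:
                self.handler(msg)
            except Exception:
                if receive_count >= self.max_receives:
                    self.dlq.append(msg)  # give up, park for inspection
                else:
                    queue.append((msg, receive_count + 1))  # redeliver later
                continue
            self.processed.add(key)  # mark done only after success
```

Marking the key only after the handler succeeds keeps retries safe, at the cost of a possible duplicate side effect if the process dies between the two steps; the handler itself still needs to tolerate that.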

If ordered offload exceeds ~8k TPS per entity stream, read high-throughput event processing tier selection before FIFO caps bite.

Step 6 — Horizontal scale with ECS Express Mode

For stateless HTTP tiers, ECS Express Mode collapses ALB + Fargate + HTTPS into three inputs — useful when the monolith is already containerized but ingress boilerplate blocked quick scale-out (Express Mode guide).

  • Autoscale on CPU and ALB request count
  • Min tasks ≥ 2 for AZ redundancy
  • Up to 25 services can share one ALB — isolate blast radius with host rules

Express Mode does not scale the database — pair with Steps 2–5.
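For a rough sense of where request-count autoscaling should settle, the task count can be estimated from peak RPS and a per-task target. This is a simplified sketch, not how ECS computes scaling decisions: the 100 RPS-per-task target is an assumption derived from the benchmark's 2.4k RPS on 24 tasks, and `desired_tasks` is a hypothetical helper.

```python
import math


def desired_tasks(peak_rps: float, target_rps_per_task: float, min_tasks: int = 2) -> int:
    """Estimate task count for a request-count target, never below min_tasks."""
    if target_rps_per_task <= 0:
        raise ValueError("target_rps_per_task must be positive")
    return max(min_tasks, math.ceil(peak_rps / target_rps_per_task))


print(desired_tasks(2400, 100))  # peak traffic at the assumed target
print(desired_tasks(50, 100))    # quiet period: the AZ-redundancy floor applies
```

Multiplying the result by the per-task pool cap gives the connection load the database tier will see after scale-out, which is the check that ties this step back to Step 2.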

Step 7 — Decomposition gate (all must be true)

  • ≥3 services can deploy and roll back independently
  • Domain boundaries are stable (not “split by folder”)
  • Cross-service transactions replaced with sagas/outbox — written, not assumed

Otherwise, stay a modular monolith, per the microservices decision guide.
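Because every condition must hold, the gate is easier to enforce as an explicit check than as a judgment call in a meeting. A minimal sketch follows, with field names invented for illustration; the three conditions come straight from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GateInputs:
    independently_deployable_services: int
    domain_boundaries_stable: bool
    sagas_or_outbox_written: bool


def decomposition_gate(inputs: GateInputs) -> List[str]:
    """Return the unmet conditions; an empty list means the gate passes."""
    failures = []
    if inputs.independently_deployable_services < 3:
        failures.append("fewer than 3 services deploy and roll back independently")
    if not inputs.domain_boundaries_stable:
        failures.append("domain boundaries are not stable")
    if not inputs.sagas_or_outbox_written:
        failures.append("cross-service transactions lack written sagas/outbox")
    return failures


blockers = decomposition_gate(GateInputs(2, True, False))
print("split" if not blockers else f"stay modular monolith: {blockers}")
```

Returning the list of blockers rather than a bare boolean gives the decomposition workshop a concrete agenda.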

What to do this week

  1. Export baseline p99, connections, and top 5 slow queries — worksheet CSV.
  2. Cap per-task pools; add RDS Proxy if tasks × pool > 40% of max_connections.
  3. Route one high-volume read endpoint to a replica; alarm on lag.
  4. Cache one hot GET with measured hit ratio target >70%.
  5. Move one side effect (webhooks/email) to SQS; verify DLQ depth alarms.
  6. Re-test peak; only then schedule a decomposition workshop.
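For the first item in the list above, a baseline p99 can be computed from raw request latencies with a few lines. The sketch uses the nearest-rank definition of a percentile, which is one of several in common use, so it may differ slightly from what a monitoring tool reports; `percentile` is a hypothetical helper and expects an integer percentage.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [120, 95, 1840, 210, 88, 340, 72, 150, 99, 101]
print("p50:", percentile(latencies_ms, 50), "p99:", percentile(latencies_ms, 99))
```

Recording the sample count alongside the result matters: a p99 taken from a few hundred requests is mostly a measurement of the single slowest one.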

What this post doesn’t cover

Related: Application modernization · AWS migration · RDS connection calculator

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.
