When should we scale the monolith in place instead of splitting to microservices?

Scale in place when the bottleneck is infrastructure shape (connection storms, missing read path, synchronous side effects) rather than team or domain boundaries. If one deployable unit still matches org ownership and p99 latency can be fixed with pools, replicas, cache, and queues, decomposition adds network tax without throughput gain. Split only after the scaling ladder fails a written capacity test and you have CI/CD plus on-call for multiple services.

When should we NOT add read replicas before fixing the application?

Skip replica routing when queries are not read-only, when your ORM cannot target a reader endpoint safely, or when the workload needs read-your-writes consistency on every request. Replicas also fail when lag exceeds user tolerance — inventory and pricing reads that cannot tolerate 500 ms staleness must stay on the writer until caching or CQRS is designed. Adding replicas to a leak that opens 4,800 connections does not fix pool exhaustion.

What breaks during connection pool misconfiguration?

Symptoms: p99 spikes during deploy, FATAL too many connections, or thousands of idle in transaction sessions in pg_stat_activity. Common cause: per-task pool size 80–100 multiplied by ECS task count exceeding Aurora max_connections. RDS Proxy multiplexes application connections but does not fix long transactions or N+1 queries. Roll back pool caps if Proxy borrow timeouts increase — shorten queries before raising limits.

How does ECS Express Mode help horizontal scale without a platform team?

ECS Express Mode (announced November 21, 2025; GovCloud June 15, 2026) deploys HTTPS on Fargate from three inputs — image plus two IAM roles — with ALB, auto scaling, and canary rollback. Use it to add stateless web tier capacity without hand-rolling target groups for every spike. It does not replace database scaling; pair horizontal app scale with Steps 2–5 of the ladder. Skip Express Mode when you need NLB-only ingress or org-mandated Terraform modules for every resource.

When is async offload via SQS the right lever?

Move email, webhooks, PDF generation, search indexing, and partner callbacks off the request path when users do not need synchronous confirmation. Requires idempotent workers, DLQs, and UX that reflects async completion. If queue depth grows because handlers are slow, fix handler duration before adding shards — see the high-throughput tier guide for FIFO versus Kinesis ceilings.

What could go wrong after enabling ElastiCache?

Cache stampede on hot keys, stale inventory displayed after writes, and session data leaked across tenants if keys are not namespaced. Roll back aggressive TTL if hit ratio stays below 60% after 48 hours — you are caching the wrong objects. Pair cache with read-replica routing only when staleness bounds are documented per endpoint.
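The 60% / 48-hour rollback rule above can be written down as a small check. This is a minimal sketch: the function names are hypothetical, and the hit and miss counts are assumed to come from whatever cache metrics you already collect.

```python
def hit_ratio(hits, misses):
    """Share of lookups served from cache; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def should_roll_back_ttl(hits, misses, hours_enabled,
                         min_ratio=0.60, window_hours=48):
    """True once the observation window has passed and the ratio is still low."""
    if hours_enabled < window_hours:
        return False  # too early to judge; keep measuring
    return hit_ratio(hits, misses) < min_ratio


if __name__ == "__main__":
    print(should_roll_back_ttl(hits=500, misses=500, hours_enabled=72))
```

Treating "no lookups at all" as a 0.0 ratio is a design choice in this sketch: a cache that nobody reads after 48 hours is also caching the wrong objects.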

Scale Legacy Monolith on AWS 2026: Before Microservices

Scale a Legacy Monolith on AWS Before You Split It (2026): Vertical, Read Path, Cache, and Queue Offload Ladder

Quick summary: On a composite B2B order API (~2.4k RPS peak, 24 ECS tasks on Aurora), running the scaling ladder in order — RDS Proxy, read-replica routing, ElastiCache, then SQS offload — moved modeled p99 from 1,840 ms to 72 ms before any microservice split.

Key Takeaways

On a composite B2B order API (~2.4k RPS peak, 24 ECS tasks on Aurora), running the scaling ladder in order moved modeled p99 from 1,840 ms to 72 ms before any microservice split.
On June 15, 2026, Amazon ECS Express Mode reached AWS GovCloud (US-East and US-West), the same three-input HTTPS-on-Fargate path that shipped commercially on November 21, 2025.
This post is the scale-in-place ladder: ordered levers before any microservice split.
It does not cover zero-downtime ECS migration, monolith versus microservices strategy, modernization taxonomy, greenfield ECS layout, or runtime worker tuning on its own.
Connection math: RDS max connection calculator.

On June 15, 2026, Amazon ECS Express Mode reached AWS GovCloud (US-East and US-West) — the same three-input HTTPS-on-Fargate path that shipped commercially on November 21, 2025. Horizontal scale for stateless web tiers got cheaper to operate; database and connection limits did not move with it.

This post is the scale-in-place ladder: ordered levers before any microservice split. It does not cover zero-downtime ECS migration, monolith versus microservices strategy, modernization taxonomy, greenfield ECS layout, or runtime worker tuning on its own.

Artifacts: scaling ladder checklist, capacity worksheet CSV. Connection math: RDS max connection calculator.

Benchmark pattern (not a cited client) — B2B order API monolith, ~2.4k RPS peak, 24 ECS tasks on Aurora PostgreSQL (db.r6g.2xlarge), us-east-1. Baseline p99 1,840 ms, ~4,800 DB connections during deploy (mostly idle in transaction). After ladder: RDS Proxy + pool cap 15/task → 620 connections, p99 340 ms; read-replica routing (72% reads) → 210 ms; ElastiCache on catalog GETs (78% hit ratio) → 95 ms; SQS webhook offload → 88 ms; Express Mode horizontal scale (36 tasks) → 72 ms — no service split.
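To see which lever did most of the work, the modeled p99 figures above can be turned into per-step savings. This is only arithmetic on the numbers quoted in the benchmark pattern; the function and variable names are illustrative.

```python
# Modeled p99 values (ms) copied from the benchmark pattern above.
LADDER = [
    ("baseline", 1840),
    ("RDS Proxy + pool cap", 340),
    ("read-replica routing", 210),
    ("ElastiCache on catalog GETs", 95),
    ("SQS webhook offload", 88),
    ("Express Mode horizontal scale", 72),
]


def step_gains(ladder):
    """Return (lever, ms_saved, share_of_total_saving) for each lever."""
    total_saved = ladder[0][1] - ladder[-1][1]
    gains = []
    for (_, before), (name, after) in zip(ladder, ladder[1:]):
        saved = before - after
        gains.append((name, saved, saved / total_saved))
    return gains


if __name__ == "__main__":
    for name, saved, share in step_gains(LADDER):
        print(f"{name:32s} -{saved:5d} ms  {share:6.1%} of total")
```

On these figures the pool cap plus RDS Proxy step accounts for 1,500 of the 1,768 ms saved, roughly 85% of the total, which is why the ladder puts connection math ahead of replicas, cache, and horizontal scale.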

Do not split until steps 1–6 fall short and step 7 passes

Step	Lever	Typical win	Rollback trigger
0	Baseline metrics	Proof later levers worked	No baseline → stop
1	Vertical + Graviton canary	CPU headroom	p99 worse → profile locks
2	Pool cap + RDS Proxy	Connection storms	Proxy borrow timeouts
3	Read replica routing	Writer CPU	User-visible stale reads
4	ElastiCache hot keys	Read latency	Hit ratio < 60%
5	SQS async offload	Request path slim	UX still sync
6	ECS Express Mode horizontal	RPS headroom	5xx on fast scale-out
7	Decomposition gate	Team/domain fit	Missing saga design

Opinionated take: Run steps 1–6 in order. Microservices are a team and domain decision — not a performance knob. If step 2 fails, step 6 only spreads broken pool math across more tasks.

Step 2 — Connection pool + RDS Proxy (most common miss)

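Aurora max_connections scales with instance class, not with microservice count. 80 ECS tasks × pool 20 = 1,600 client slots, about a third of a writer's ~5,000 ceiling in steady state. A rolling deploy that briefly runs old and new tasks side by side can roughly double that, which leaves little room for admin sessions and autovacuum (connection pool guide, June 2026).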
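The pool arithmetic is worth scripting so it gets re-run whenever task count or pool size changes. A minimal sketch follows: the 40% threshold is the one used in the checklist later in this post, while the `deploy_overlap` factor (old and new tasks briefly running together during a rollout) is an assumption you should replace with your own deployment settings.

```python
def pool_budget(tasks, pool_per_task, max_connections,
                deploy_overlap=2.0, threshold=0.40):
    """Estimate client connection demand against the database ceiling.

    deploy_overlap is an assumed multiplier for the moment when old and new
    tasks hold pools at the same time; it is not an AWS-published figure.
    """
    steady = tasks * pool_per_task
    deploy_peak = int(steady * deploy_overlap)
    return {
        "steady": steady,
        "deploy_peak": deploy_peak,
        "steady_share": steady / max_connections,
        "deploy_share": deploy_peak / max_connections,
        "needs_proxy": steady / max_connections > threshold,
    }


if __name__ == "__main__":
    print(pool_budget(tasks=80, pool_per_task=20, max_connections=5000))
    print(pool_budget(tasks=24, pool_per_task=100, max_connections=5000))
```

With 80 tasks and a pool of 20 the steady state is 32% of a 5,000 ceiling, but the assumed deploy overlap pushes it to 64%. With 24 tasks and a pool of 100 the steady state alone is 48%, already past the 40% threshold.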

Signal	Mechanism	Fix
`too many connections`	Pool × tasks > max	RDS Proxy; cap pool 10–20/task
p99 spike on deploy	Thundering herd	Staggered rollout; drain connections
`idle in transaction`	ORM session leak	Timeouts; code fix before replicas

What broke — Logistics SaaS on Aurora (~$18k/mo DB): deploy triggered p99 5.2 s; pg_stat_activity showed 4,800 connections, mostly idle in transaction from a leaked ORM session. Raising max_connections was rejected (OOM risk). RDS Proxy + pool cap 15/task → 620 connections; p99 220 ms — same monolith binary, no microservice split.
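A quick way to spot this failure mode is to group sessions by state and check what share sits idle in transaction. The sketch below assumes you run the query with whatever PostgreSQL driver the monolith already uses and pass the resulting counts in. The 50% threshold is a hypothetical starting point, not a documented limit.

```python
# pg_stat_activity is a standard PostgreSQL view; backend_type filters out
# autovacuum and other background workers.
SESSION_STATE_SQL = """
SELECT state, count(*) AS sessions
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
"""


def leak_suspected(state_counts, idle_in_txn_share=0.5):
    """True when 'idle in transaction' sessions dominate client connections.

    state_counts maps a pg_stat_activity state to its session count.
    The default share is an assumed threshold; tune it to your workload.
    """
    total = sum(state_counts.values())
    if total == 0:
        return False
    leaked = state_counts.get("idle in transaction", 0)
    return leaked / total >= idle_in_txn_share


if __name__ == "__main__":
    sample = {"idle in transaction": 4000, "idle": 500, "active": 300}
    print(leak_suspected(sample))
```

If the check fires, a session timeout and a code fix for the leak come before any replica or proxy work, in line with the table above.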

Step 3 — Read path without premature CQRS

Route provably read-only queries to the Aurora reader endpoint or an RDS read replica. Measure replica lag: if p99 lag exceeds ~500 ms during write bursts, tighten routing rules or put a cache in front of those reads.

Do not send session-sensitive reads to replicas without a documented staleness budget.
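The routing rule can be kept deliberately conservative: anything that is not provably read-only, needs read-your-writes, or lacks a documented staleness budget stays on the writer. The sketch below is one way to express that. The `Query` type, field names, and the "writer"/"reader" labels are hypothetical and would map onto your own data access layer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Query:
    read_only: bool
    needs_read_your_writes: bool = False
    staleness_budget_ms: Optional[int] = None  # None means no documented budget


def choose_endpoint(query, replica_lag_ms):
    """Return 'reader' only when the query can tolerate the current lag."""
    if not query.read_only or query.needs_read_your_writes:
        return "writer"
    if query.staleness_budget_ms is None:
        return "writer"  # undocumented budget: do not guess
    if replica_lag_ms > query.staleness_budget_ms:
        return "writer"  # lag is over budget right now
    return "reader"


if __name__ == "__main__":
    catalog_list = Query(read_only=True, staleness_budget_ms=500)
    print(choose_endpoint(catalog_list, replica_lag_ms=120))
    print(choose_endpoint(catalog_list, replica_lag_ms=900))
```

Falling back to the writer when lag exceeds the budget protects correctness but sends load back to the writer during write bursts, so alarm on how often that fallback fires.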

Step 4 — Cache what hurts, not everything

Target idempotent GETs with explicit TTL per entity. Use stampede protection (single-flight, jittered TTL) on catalog and config keys.
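Single-flight and jittered TTL are easier to reason about in code. The sketch below is an in-process illustration only: it keeps values in a local dict rather than in ElastiCache, and the class name and parameters are hypothetical. Across many ECS tasks you would still see one load per task unless you add a shared lock or similar coordination.

```python
import random
import threading
import time


class SingleFlightCache:
    """In-process cache: one loader call per key at a time, TTL with jitter."""

    def __init__(self, ttl_seconds, jitter_fraction=0.1):
        self._ttl = ttl_seconds
        self._jitter = jitter_fraction
        self._values = {}      # key -> (value, expires_at)
        self._key_locks = {}   # key -> lock serializing loads for that key
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        with self._guard:
            entry = self._values.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key, loader):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._lock_for(key):
            # Re-check: another thread may have loaded while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = loader(key)
            # Jitter spreads expiry so hot keys do not all expire together.
            ttl = self._ttl * (1 + random.uniform(-self._jitter, self._jitter))
            with self._guard:
                self._values[key] = (value, time.monotonic() + ttl)
            return value
```

Concurrent callers for the same key wait on the per-key lock and then read the value the first caller stored, so the backing query runs once per expiry instead of once per request.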

Anti-pattern	Outcome
Cache personalized auth blobs in shared Redis	Cross-tenant risk
Infinite TTL on inventory	Oversell during writes
Cache before fixing N+1	Low hit ratio, high cost

Step 5 — Async offload (pair with throughput tier guide)

Move webhooks, email, PDF generation, and search index updates to SQS or EventBridge. This requires idempotent workers and DLQ tuning.
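What "idempotent worker with a DLQ" means in practice can be shown without any AWS calls. The sketch below models it in memory: the `idempotency_key` field, the `seen_ids` set, and the `process_batch` function are hypothetical. In SQS itself the redrive policy's maxReceiveCount decides when a message moves to the dead-letter queue, and the seen-keys store would need to be durable (a database table, for example).

```python
def process_batch(messages, handler, seen_ids, max_receives=3):
    """Process messages delivered at least once.

    Returns (done, retry, dead_letter). Duplicates are acknowledged without
    re-running the side effect; failures are retried until max_receives.
    """
    done, retry, dead_letter = [], [], []
    for msg in messages:
        key = msg["idempotency_key"]
        if key in seen_ids:
            done.append(msg)  # duplicate delivery: acknowledge, no side effect
            continue
        try:
            handler(msg)
        except Exception:
            failed = {**msg, "receive_count": msg.get("receive_count", 0) + 1}
            if failed["receive_count"] >= max_receives:
                dead_letter.append(failed)
            else:
                retry.append(failed)
            continue
        seen_ids.add(key)  # record only after the side effect succeeded
        done.append(msg)
    return done, retry, dead_letter
```

Recording the key only after the handler succeeds means a crash between the side effect and the record can still cause one repeat, so the downstream call (webhook, email provider) should also tolerate a duplicate where possible.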

If ordered offload exceeds ~8k TPS per entity stream, read high-throughput event processing tier selection before FIFO caps bite.

Step 6 — Horizontal scale with ECS Express Mode

For stateless HTTP tiers, ECS Express Mode collapses ALB + Fargate + HTTPS into three inputs — useful when the monolith is already containerized but ingress boilerplate blocked quick scale-out (Express Mode guide).

Autoscale on CPU and ALB request count
Min tasks ≥ 2 for AZ redundancy
Up to 25 services can share one ALB — isolate blast radius with host rules

Express Mode does not scale the database — pair with Steps 2–5.
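Before raising task counts, it helps to write down the capacity arithmetic behind the target. The sketch below is plain division with a headroom factor, not the Application Auto Scaling algorithm. The per-task throughput is an assumption you would measure under load, and the function name is illustrative.

```python
import math


def tasks_needed(peak_rps, rps_per_task, headroom=0.0, min_tasks=2):
    """Tasks required to serve peak_rps with the given spare capacity.

    rps_per_task is an assumed, measured per-task throughput; min_tasks
    defaults to 2 to match the AZ redundancy guidance above.
    """
    required = math.ceil(peak_rps * (1 + headroom) / rps_per_task)
    return max(min_tasks, required)


if __name__ == "__main__":
    # Illustrative inputs: ~2.4k RPS peak, 100 RPS sustained per task.
    print(tasks_needed(2400, 100))
    print(tasks_needed(2400, 100, headroom=0.5))
```

The same inputs feed the connection budget: every extra task multiplies the per-task pool, so recompute pool demand whenever this number changes.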

Step 7 — Decomposition gate (all must be true)

≥3 services can deploy and roll back independently
Domain boundaries are stable (not “split by folder”)
Cross-service transactions replaced with sagas/outbox — written, not assumed

Otherwise, stay on a modular monolith, per the microservices decision guide.
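Because every condition must hold, the gate is easy to encode as a checklist that reports what is missing rather than a single yes or no. A minimal sketch, with hypothetical type and field names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateEvidence:
    independently_deployable_services: int
    domain_boundaries_stable: bool
    saga_or_outbox_design_written: bool


def gate_failures(evidence):
    """Return the unmet gate conditions; an empty list means the gate passes."""
    failures = []
    if evidence.independently_deployable_services < 3:
        failures.append("fewer than 3 services can deploy and roll back independently")
    if not evidence.domain_boundaries_stable:
        failures.append("domain boundaries are not stable")
    if not evidence.saga_or_outbox_design_written:
        failures.append("no written saga/outbox design for cross-service transactions")
    return failures


if __name__ == "__main__":
    print(gate_failures(GateEvidence(2, True, False)))
```

Returning the list of failures gives the decomposition workshop a concrete agenda instead of a pass or fail verdict.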

What to do this week

Export baseline p99, connections, and top 5 slow queries — worksheet CSV.
Cap per-task pools; add RDS Proxy if tasks × pool > 40% of max_connections.
Route one high-volume read endpoint to a replica; alarm on lag.
Cache one hot GET with measured hit ratio target >70%.
Move one side effect (webhooks/email) to SQS; verify DLQ depth alarms.
Re-test peak; only then schedule a decomposition workshop.

What this post doesn’t cover

Lift-and-shift to ECS from on-prem — monolith ECS migration.
Aurora Limitless sharding — Limitless post for write-scale escape hatch only.
Lambda Managed Instances — burst offload niche; not default monolith scale.
Full strangler-fig migration program — data center exit program.

Related AWS Services

AWS Architecture Review

AWS Serverless

AWS Migration

Recommended Reading

How to Migrate a Monolith to ECS Fargate Without Downtime

Microservices vs Monolith on AWS: Architecture Decision Guide

Database Deadlocks, Connection Pool Exhaustion, and Prepared Statements on RDS

Amazon ECS Express Mode: Three Inputs, One HTTPS URL, and No Platform Team
