Managed Database Operations on AWS (2026): RDS/Aurora Buyer Guide for In-House vs MSP
Quick summary: For a fintech SaaS (6 prod databases, ~$14k/mo RDS line), splitting scope between an in-house DBA (schema) and an MSP (24/7 paging) cut internal on-call from ~22 to ~9 hrs/mo. Aurora Serverless v2 min ACU tuning was applied to tier-2 first.
Key Takeaways
- Aurora Serverless v2 platform version v3 supports scaling from 0 to 256 ACUs (each ACU ≈ 2 GiB memory plus proportional CPU/network) in 0.5 ACU increments (AWS What's New: Serverless v2 performance)
- During switchover, RDS Blue/Green Deployments block writes on both blue and green until replication catches up, then promote green, a design intended to lose no data (RDS FAQ)
- This post is the managed database operations buyer guide — who runs what on RDS/Aurora, when AWS-managed automation suffices, and when a partner MSP earns its fee
- It is not RDS vs Aurora engine selection, not Postgres cost tuning alone, and not generic MSP scope

Aurora Serverless v2 platform version v3 supports scaling from 0 to 256 ACUs (each ACU ≈ 2 GiB memory plus proportional CPU/network) in 0.5 ACU increments (AWS What’s New: Serverless v2 performance). During switchover, RDS Blue/Green Deployments block writes on both blue and green until replication catches up, then promote green, a design intended to lose no data (RDS FAQ). Aurora PostgreSQL Limitless adds sharded, reference, and standard table types for OLTP scale-out (Limitless architecture docs).
This post is the managed database operations buyer guide — who runs what on RDS/Aurora, when AWS-managed automation suffices, and when a partner MSP earns its fee. It is not RDS vs Aurora engine selection, not Postgres cost tuning alone, and not generic MSP scope.
Artifacts: ops scope RACI, patching window worksheet CSV.
Benchmark pattern (not a cited client) — Fintech SaaS, 6 production databases (3 Aurora PostgreSQL, 2 RDS MySQL, 1 ElastiCache Redis), ~$14k/mo RDS/Aurora line, ~22 hrs/mo internal on-call before scope definition. After RACI split (in-house DBA = schema + Performance Insights; MSP = 24/7 Sev1 paging + quarterly restore drill): ~9 hrs/mo internal on-call; Serverless v2 min ACU tuning on tier-2 saved ~$380/mo on analytics cluster.
Three operating models
| Model | You operate | AWS operates | Best when |
|---|---|---|---|
| AWS-managed RDS/Aurora | Schema, queries, capacity settings | Engine patch, Multi-AZ failover, automated backup | Default for most teams |
| In-house DBA + AWS | Tuning, migrations, on-call (business hours) | Infrastructure layer above | 3–8 instances, strong platform team |
| Partner MSP overlay | App schema ownership (sometimes) | Per SOW: paging, upgrades, DR drills | 24/7 tier-0 you cannot staff |
Opinionated take: AWS already “manages” the database engine — paying an MSP to watch automated patching is weak ROI unless they bring Blue/Green major upgrades, restore testing, and app-aware runbooks.
AWS-managed primitives — know what you already bought
| Feature | Ops value | You still own |
|---|---|---|
| Automated backups + PITR | Snapshot schedule | Restore drill, RPO proof |
| Multi-AZ / Aurora storage | Failover | Connection retry, RDS Proxy |
| Performance Insights | Wait event visibility | SQL/index remediation |
| RDS Proxy | Pooling, faster failover | Auth via Secrets Manager |
| Blue/Green Deployments | Major version staging | Switchover window + app retry |
| Aurora Serverless v2 | ACU autoscale | Min/max ACU policy |
| Aurora Limitless | Shard routing | Shard key + colocation design |
Aurora Serverless v2 — capacity policy
Monitor ServerlessDatabaseCapacity and ACUUtilization in CloudWatch (Serverless v2 scaling guide):
- Set min ACU to hold working set in buffer pool for tier-0
- Set max ACU high enough for spike headroom (256 ACU ceiling on v3)
- Auto-pause to 0 ACU — dev/test only unless cold-start acceptable
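As a rough illustration of the "min ACU holds the working set" rule, the sketch below converts a working-set size into a min ACU value in 0.5 ACU steps. It uses the ~2 GiB per ACU figure and the 256 ACU ceiling quoted above. The `buffer_fraction` default of 0.75 is an assumption (the share of instance memory actually available to the buffer cache), not an AWS-published number; measure your own before relying on it.

```python
import math

GIB_PER_ACU = 2.0   # approximate memory per ACU, per the figure quoted above
ACU_STEP = 0.5      # capacity increment
MAX_ACU = 256.0     # ceiling quoted above


def min_acu_for_working_set(working_set_gib: float,
                            buffer_fraction: float = 0.75) -> float:
    """Smallest ACU value (in 0.5 steps) whose buffer cache could hold the working set.

    buffer_fraction is an ASSUMED share of memory usable as buffer cache.
    """
    if not 0 < buffer_fraction <= 1:
        raise ValueError("buffer_fraction must be in (0, 1]")
    if working_set_gib <= 0:
        return ACU_STEP
    memory_needed_gib = working_set_gib / buffer_fraction
    raw_acu = memory_needed_gib / GIB_PER_ACU
    stepped = math.ceil(raw_acu / ACU_STEP) * ACU_STEP  # round up to the next 0.5
    return min(max(stepped, ACU_STEP), MAX_ACU)


# A 6 GiB working set needs about 8 GiB of memory under the assumed fraction.
print(min_acu_for_working_set(6.0))
```

Treat the result as a starting point and confirm it against ServerlessDatabaseCapacity and ACUUtilization over a representative week.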
What broke: Production analytics cluster with min ACU = 0.5 and auto-pause enabled. Monday morning report job: 47-second cold start; downstream Step Functions timed out. Detection: State machine States.Timeout alarm. Fix: min ACU 2, auto-pause disabled for tier-1; tier-2 dev cluster kept auto-pause. Lesson: Serverless v2 policy is per-cluster, not one global setting.
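Because the policy is per-cluster, one way to catch this class of mistake is a small lint pass over an inventory of cluster settings. The sketch below is hypothetical: the `ClusterPolicy` fields, the integer tier numbering, and the "tier 1 and below is latency-sensitive" threshold are assumptions for illustration, not an AWS API.

```python
from dataclasses import dataclass

MAX_ACU = 256.0  # ceiling quoted earlier in this post


@dataclass
class ClusterPolicy:
    name: str
    tier: int          # 0 = most critical (hypothetical tiering scheme)
    min_acu: float
    max_acu: float
    auto_pause: bool


def lint_policy(p: ClusterPolicy, latency_sensitive_max_tier: int = 1) -> list[str]:
    """Return human-readable findings for one cluster's capacity policy."""
    findings = []
    if p.min_acu > p.max_acu:
        findings.append(f"{p.name}: min ACU exceeds max ACU")
    if p.max_acu > MAX_ACU:
        findings.append(f"{p.name}: max ACU above the {MAX_ACU:g} ACU ceiling")
    if p.auto_pause and p.tier <= latency_sensitive_max_tier:
        findings.append(f"{p.name}: auto-pause enabled on a latency-sensitive tier")
    return findings


inventory = [
    ClusterPolicy("analytics-prod", tier=1, min_acu=0.5, max_acu=16, auto_pause=True),
    ClusterPolicy("analytics-dev", tier=2, min_acu=0.5, max_acu=4, auto_pause=True),
]
for cluster in inventory:
    for finding in lint_policy(cluster):
        print(finding)
```

Only the production cluster is flagged here; the dev cluster keeps auto-pause, matching the fix described above.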
Aurora Limitless — when to escalate
Use Limitless when (architecture docs):
- Table is very large or grows faster than single-instance headroom
- Workload has a natural shard key (tenant_id, region_id)
- Collocated joins on shard key are common
When NOT Limitless: < 1 TB OLTP, ad hoc cross-shard joins, or team without shard-key discipline — stay on standard Aurora.
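The criteria above can be written down as a simple pre-screen, so the "stay on standard Aurora" cases are filtered out before anyone starts shard design. This is a sketch of the checklist in this section only; the `Workload` fields are hypothetical names, and an empty result means "worth evaluating", not "adopt Limitless".

```python
from dataclasses import dataclass


@dataclass
class Workload:
    size_tb: float
    has_natural_shard_key: bool          # e.g. tenant_id or region_id
    ad_hoc_cross_shard_joins: bool
    team_has_shard_key_discipline: bool


def limitless_blockers(w: Workload, min_size_tb: float = 1.0) -> list[str]:
    """List the reasons from this section to stay on standard Aurora."""
    blockers = []
    if w.size_tb < min_size_tb:
        blockers.append(f"OLTP data set under {min_size_tb:g} TB")
    if not w.has_natural_shard_key:
        blockers.append("no natural shard key")
    if w.ad_hoc_cross_shard_joins:
        blockers.append("ad hoc cross-shard joins")
    if not w.team_has_shard_key_discipline:
        blockers.append("team lacks shard-key discipline")
    return blockers


candidate = Workload(size_tb=6.0, has_natural_shard_key=True,
                     ad_hoc_cross_shard_joins=False,
                     team_has_shard_key_discipline=True)
print(limitless_blockers(candidate) or "no blockers: worth evaluating Limitless")
```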
RDS Blue/Green — major upgrade playbook
- Create Blue/Green deployment from production (blue)
- Patch/upgrade green staging
- Load test green with read-only or canary traffic where supported
- Switchover — writes blocked until sync complete
- Keep blue for rollback window per your runbook
Schedule using patching-window-worksheet.csv — score patch_risk_score before picking Sunday 04:00 UTC for tier-0.
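Since writes are blocked during switchover, the application side of the playbook is a retry policy. The sketch below shows one generic pattern, retry with capped exponential backoff. `ConnectionError` stands in for whatever exception your database driver actually raises (an assumption), and the attempt counts and delays are illustrative values, not recommendations.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(op: Callable[[], T],
                       retriable: tuple[type[BaseException], ...] = (ConnectionError,),
                       attempts: int = 5,
                       base_delay: float = 0.5,
                       max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on a retriable error wait base_delay * 2**n (capped) and retry."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise AssertionError("unreachable")


# Usage: a write that fails twice (as during a switchover) and then succeeds.
calls = {"n": 0}

def flaky_write() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("writes blocked")
    return "committed"

waits: list[float] = []
print(retry_with_backoff(flaky_write, sleep=waits.append), waits)
```

Passing `sleep` as a parameter keeps the function testable without real delays. Size the total retry budget against the switchover window in your runbook, and make sure retried writes are idempotent.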
MSP SOW — normalize before signing
| SOW line | Ask |
|---|---|
| Major version upgrades | Blue/Green included? Rollback owner? |
| Aurora Limitless | Shard design in scope? |
| Restore testing | Quarterly proof or ticket-only? |
| Sev1 response | Minutes, not “best effort” |
| ElastiCache / Redis | Same SLA as RDS? Often excluded |
Fill ops-scope-raci.md and attach to procurement.
What to Do This Week
- Inventory every production DB — fill patching-window-worksheet.csv.
- Mark tier-0 instances — define who pages at 3 a.m. (internal vs MSP).
- Review Aurora Serverless v2 min/max ACU — disable auto-pause on latency-sensitive tiers.
- Schedule one PITR restore drill to a scratch instance (not just snapshot existence).
- If MSP quote in flight, map SOW lines to RACI R/A columns.
Reproduce this: Download patching-window-worksheet.csv. Set preferred_operator per row. Count tier-0 rows without msp or customer_dba accountable; that gap is your on-call risk.
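The count can be scripted once the worksheet is filled in. The sketch below assumes the CSV has `db_id` and `tier` columns alongside `preferred_operator`, and that tier-0 rows are labelled `tier-0`. Those column names, the label, and the `aws_managed` value are assumptions; adjust them to the actual worksheet header. Inline sample data is used so the example runs without the file.

```python
import csv
import io

ACCOUNTABLE = {"msp", "customer_dba"}  # operator values named in the text above


def unowned_tier0(rows: list[dict]) -> list[dict]:
    """Tier-0 rows whose preferred_operator is neither msp nor customer_dba."""
    gaps = []
    for row in rows:
        tier = (row.get("tier") or "").strip().lower()
        operator = (row.get("preferred_operator") or "").strip().lower()
        if tier == "tier-0" and operator not in ACCOUNTABLE:
            gaps.append(row)
    return gaps


# Hypothetical sample; replace with open("patching-window-worksheet.csv", newline="").
SAMPLE = """db_id,tier,preferred_operator
orders-aurora,tier-0,msp
ledger-aurora,tier-0,
reports-mysql,tier-2,aws_managed
auth-mysql,tier-0,aws_managed
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
gaps = unowned_tier0(rows)
print(len(gaps), [g["db_id"] for g in gaps])
```

In the sample, two tier-0 databases have nobody accountable for a page, which is the gap to close before signing an SOW.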
What This Post Doesn’t Cover
- Engine selection (RDS vs Aurora vs DynamoDB) — RDS vs Aurora guide
- Database migration cutover (DMS) — migration service engagement
- Full MSP RFP process — evaluate AWS MSP
- NoSQL / DynamoDB ops model — different automation surface
We have not benchmarked Aurora I/O-Optimized vs standard storage for every worksheet row — add your I/O profile before storage mode changes.
Related: RDS consulting · Managed services




