Managed Database Operations on AWS (2026): RDS/Aurora Buyer Guide for In-House vs MSP

Cloud Architecture · Palaniappan P · 4 min read

Quick summary: For a fintech SaaS (6 production databases, ~$14k/mo RDS line), splitting scope between an in-house DBA (schema) and an MSP (24/7 paging) cut internal on-call from ~22 to ~9 hrs/mo. Aurora Serverless v2 min ACU tuning started with tier-2.

Key Takeaways

  • Aurora Serverless v2 platform version v3 supports scaling from 0 to 256 ACUs (each ACU ≈ 2 GiB memory plus proportional CPU/network) in 0.5 ACU increments (AWS What's New: Serverless v2 performance)
  • During switchover, RDS Blue/Green Deployments block writes on both blue and green until replication catches up, then promote green; the design targets zero data loss (RDS FAQ)
  • This post is the managed database operations buyer guide — who runs what on RDS/Aurora, when AWS-managed automation suffices, and when a partner MSP earns its fee
  • It is not RDS vs Aurora engine selection, not Postgres cost tuning alone, and not generic MSP scope

Aurora Serverless v2 platform version v3 supports scaling from 0 to 256 ACUs (each ACU ≈ 2 GiB memory plus proportional CPU/network) in 0.5 ACU increments (AWS What’s New: Serverless v2 performance). During switchover, RDS Blue/Green Deployments block writes on both blue and green until replication catches up, then promote green; the design targets zero data loss (RDS FAQ). Aurora PostgreSQL Limitless adds sharded, reference, and standard table types for OLTP scale-out (Limitless architecture docs).

This post is the managed database operations buyer guide — who runs what on RDS/Aurora, when AWS-managed automation suffices, and when a partner MSP earns its fee. It is not RDS vs Aurora engine selection, not Postgres cost tuning alone, and not generic MSP scope.

Artifacts: ops scope RACI, patching window worksheet CSV.

Benchmark pattern (not a cited client): Fintech SaaS, 6 production databases (3 Aurora PostgreSQL, 2 RDS MySQL, 1 ElastiCache Redis), ~$14k/mo RDS/Aurora line, ~22 hrs/mo internal on-call before scope definition. After RACI split (in-house DBA = schema + Performance Insights; MSP = 24/7 Sev1 paging + quarterly restore drill): ~9 hrs/mo internal on-call; Serverless v2 min ACU tuning on tier-2 saved ~$380/mo on analytics cluster.

Three operating models

| Model | You operate | AWS operates | Best when |
|---|---|---|---|
| AWS-managed RDS/Aurora | Schema, queries, capacity settings | Engine patch, Multi-AZ failover, automated backup | Default for most teams |
| In-house DBA + AWS | Tuning, migrations, on-call (business hours) | Infrastructure layer above | 3–8 instances, strong platform team |
| Partner MSP overlay | App schema ownership (sometimes) | Per SOW: paging, upgrades, DR drills | 24/7 tier-0 you cannot staff |

Opinionated take: AWS already “manages” the database engine — paying an MSP to watch automated patching is weak ROI unless they bring Blue/Green major upgrades, restore testing, and app-aware runbooks.

AWS-managed primitives — know what you already bought

| Feature | Ops value | You still own |
|---|---|---|
| Automated backups + PITR | Snapshot schedule | Restore drill, RPO proof |
| Multi-AZ / Aurora storage | Failover | Connection retry, RDS Proxy |
| Performance Insights | Wait event visibility | SQL/index remediation |
| RDS Proxy | Pooling, faster failover | Auth via Secrets Manager |
| Blue/Green Deployments | Major version staging | Switchover window + app retry |
| Aurora Serverless v2 | ACU autoscale | Min/max ACU policy |
| Aurora Limitless | Shard routing | Shard key + colocation design |

Aurora Serverless v2 — capacity policy

Monitor ServerlessDatabaseCapacity and ACUUtilization in CloudWatch (Serverless v2 scaling guide):

  • Set min ACU to hold working set in buffer pool for tier-0
  • Set max ACU high enough for spike headroom (256 ACU ceiling on v3)
  • Auto-pause to 0 ACU — dev/test only unless cold-start acceptable
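The first bullet can be turned into rough arithmetic. The sketch below is a minimal illustration, not AWS guidance: it uses the ~2 GiB per ACU and 0.5 ACU increment figures quoted earlier, and it assumes a `cache_fraction` (the share of instance memory that actually serves as buffer cache). That fraction, and the function names, are hypothetical; measure the real value for your engine and parameter group.

```python
import math

ACU_GIB = 2.0        # from the article: each ACU is roughly 2 GiB of memory
ACU_STEP = 0.5       # from the article: capacity moves in 0.5 ACU increments
ACU_CEILING = 256.0  # from the article: ceiling on platform version v3


def round_up_to_step(acus: float) -> float:
    """Round up to the next 0.5 ACU boundary."""
    return math.ceil(acus / ACU_STEP) * ACU_STEP


def min_acu_for_working_set(working_set_gib: float, cache_fraction: float = 0.75) -> float:
    """Estimate the smallest min ACU whose cache share holds the working set.

    cache_fraction is an ASSUMPTION of this sketch, not an AWS constant.
    The result is capped at the 256 ACU ceiling; a capped result means the
    working set does not fit and the estimate should not be trusted.
    """
    if working_set_gib <= 0 or not 0 < cache_fraction <= 1:
        raise ValueError("working_set_gib must be > 0 and cache_fraction in (0, 1]")
    needed = working_set_gib / (ACU_GIB * cache_fraction)
    return min(round_up_to_step(needed), ACU_CEILING)
```

Treat the output as a starting point for the min ACU setting and confirm it against ServerlessDatabaseCapacity and ACUUtilization under real load.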

What broke — Production analytics cluster with min ACU = 0.5 and auto-pause enabled. Monday morning report job: 47-second cold start; downstream Step Functions timed out. Detection: State machine States.Timeout alarm. Fix: min ACU 2, auto-pause disabled for tier-1; tier-2 dev cluster kept auto-pause. Lesson: Serverless v2 policy is per-cluster — not one global setting.
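Because the policy is per-cluster, a small lint over each cluster's settings can catch this class of mistake before Monday morning does. The following is a hedged sketch: `ClusterPolicy`, the tier numbering, and the rule that tiers 0 and 1 count as latency-sensitive are all assumptions made for illustration, and the 16 ACU maximum in the usage lines is an invented value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPolicy:
    name: str
    tier: int          # 0 = most critical (numbering is this sketch's assumption)
    min_acu: float
    max_acu: float
    auto_pause: bool


def policy_findings(policy: ClusterPolicy, latency_sensitive_max_tier: int = 1) -> list[str]:
    """Return human-readable problems for one cluster's Serverless v2 policy."""
    findings = []
    if policy.min_acu > policy.max_acu:
        findings.append("min_acu exceeds max_acu")
    if policy.max_acu > 256:
        findings.append("max_acu above the 256 ACU ceiling")
    # Auto-pause implies a cold start, which latency-sensitive tiers cannot absorb.
    if policy.auto_pause and policy.tier <= latency_sensitive_max_tier:
        findings.append("auto-pause enabled on a latency-sensitive tier")
    return findings


# Usage: the incident above, before and after the fix (max_acu is illustrative).
before = ClusterPolicy("analytics", tier=1, min_acu=0.5, max_acu=16, auto_pause=True)
after = ClusterPolicy("analytics", tier=1, min_acu=2, max_acu=16, auto_pause=False)
dev = ClusterPolicy("dev", tier=2, min_acu=0.5, max_acu=16, auto_pause=True)
```

Running `policy_findings` on each cluster separately mirrors the lesson: the dev cluster keeps auto-pause without a finding, while the same setting on the tier-1 cluster is flagged.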

Aurora Limitless — when to escalate

Use Limitless when (architecture docs):

  • Table is very large or grows faster than single-instance headroom
  • Workload has a natural shard key (tenant_id, region_id)
  • Colocated joins on shard key are common

When NOT Limitless: < 1 TB OLTP, ad hoc cross-shard joins, or team without shard-key discipline — stay on standard Aurora.
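The criteria above can be written down as a checklist so the escalation decision is explicit and reviewable. This is only a sketch of the article's rules of thumb: the `Workload` fields and their names are hypothetical, and the 1 TB threshold is the one quoted above, not an AWS limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    oltp_size_tb: float
    outgrowing_single_instance: bool   # very large, or growing past instance headroom
    has_natural_shard_key: bool        # e.g. tenant_id or region_id
    joins_mostly_on_shard_key: bool    # False means ad hoc cross-shard joins
    team_has_shard_key_discipline: bool


def limitless_blockers(w: Workload) -> list[str]:
    """List reasons to stay on standard Aurora; an empty list means 'worth evaluating'."""
    blockers = []
    if w.oltp_size_tb < 1:
        blockers.append("under 1 TB OLTP")
    if not w.outgrowing_single_instance:
        blockers.append("single-instance headroom is still sufficient")
    if not w.has_natural_shard_key:
        blockers.append("no natural shard key")
    if not w.joins_mostly_on_shard_key:
        blockers.append("ad hoc cross-shard joins")
    if not w.team_has_shard_key_discipline:
        blockers.append("team lacks shard-key discipline")
    return blockers
```

An empty result does not mean Limitless is the right answer, only that none of the listed disqualifiers applies.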

RDS Blue/Green — major upgrade playbook

  1. Create Blue/Green deployment from production (blue)
  2. Patch/upgrade green staging
  3. Load test green with read-only or canary traffic where supported
  4. Switchover — writes blocked until sync complete
  5. Keep blue for rollback window per your runbook

Schedule using patching-window-worksheet.csv — score patch_risk_score before picking Sunday 04:00 UTC for tier-0.
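Step 4 is where the application side matters: while writes are blocked, clients see errors or dropped connections, and "app retry" is on your side of the line. A minimal sketch of retry with exponential backoff and jitter follows. `TransientDbError` is a stand-in for whatever exception your driver raises during switchover, and the default attempt count and delays are illustrative assumptions, not values from AWS documentation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDbError(Exception):
    """Placeholder for the driver-specific error seen while writes are blocked."""


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op() until it succeeds or the attempt budget is spent."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientDbError:
            if attempt == attempts - 1:
                raise  # budget exhausted: surface the error to the caller
            # Exponential backoff, capped, with full jitter to avoid retry storms.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Size the total retry budget against the switchover duration you observe in rehearsal, and only wrap operations that are safe to repeat.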

MSP SOW — normalize before signing

| SOW line | Ask |
|---|---|
| Major version upgrades | Blue/Green included? Rollback owner? |
| Aurora Limitless | Shard design in scope? |
| Restore testing | Quarterly proof or ticket-only? |
| Sev1 response | Minutes, not “best effort” |
| ElastiCache / Redis | Same SLA as RDS? Often excluded |

Fill ops-scope-raci.md and attach to procurement.

What to Do This Week

  1. Inventory every production DB — fill patching-window-worksheet.csv.
  2. Mark tier-0 instances — define who pages at 3 a.m. (internal vs MSP).
  3. Review Aurora Serverless v2 min/max ACU — disable auto-pause on latency-sensitive tiers.
  4. Schedule one PITR restore drill to a scratch instance (not just snapshot existence).
  5. If MSP quote in flight, map SOW lines to RACI R/A columns.

Reproduce this — Download patching-window-worksheet.csv. Set preferred_operator per row. Count tier-0 rows without msp or customer_dba accountable — that gap is your on-call risk.
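The count described above takes a few lines of Python. This sketch assumes the worksheet has `db_identifier` and `tier` columns next to `preferred_operator`; only `preferred_operator` is named in this post, so adjust the other column names to match the real file. The sample rows are invented for illustration.

```python
import csv
import io

ACCOUNTABLE = {"msp", "customer_dba"}


def tier0_gaps(csv_text: str) -> list[str]:
    """Return tier-0 databases with no msp or customer_dba operator set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["db_identifier"]
        for row in reader
        if row["tier"].strip() == "0"
        and row["preferred_operator"].strip().lower() not in ACCOUNTABLE
    ]


# Invented sample data; column names other than preferred_operator are assumptions.
SAMPLE = """db_identifier,tier,preferred_operator
orders-aurora,0,msp
ledger-aurora,0,
reports-mysql,1,customer_dba
sessions-redis,0,aws_default
"""

gaps = tier0_gaps(SAMPLE)
print(f"{len(gaps)} tier-0 rows without an accountable operator: {gaps}")
```

For the real worksheet, read the file with `open(path, newline="")` and pass its contents to the same function.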

What This Post Doesn’t Cover

  • Engine selection (RDS vs Aurora vs DynamoDB): see the RDS vs Aurora guide
  • Database migration cutover (DMS): see the migration service engagement
  • Full MSP RFP process: see evaluate AWS MSP
  • NoSQL / DynamoDB ops model: different automation surface

We have not benchmarked Aurora I/O-Optimized vs standard storage for every worksheet row — add your I/O profile before storage mode changes.

Related: RDS consulting · Managed services

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.
