How to Build a Cost-Optimized SaaS Stack on AWS (End-to-End Reference)
Quick summary: A B2B SaaS stack that costs $500/month at launch does not need to cost $50,000/month at 100,000 users if the architecture decisions at each stage are deliberate. This is the end-to-end reference architecture with real cost numbers.
Key Takeaways
- A B2B SaaS stack that costs $500/month at launch does not need to cost $50,000/month at 100,000 users if the architecture decisions at each stage are deliberate
Most SaaS cost problems are architecture problems in disguise. The team that spends $50,000/month serving 100,000 users often made the same choices as the team spending $8,000/month at the same scale — they just never revisited those choices as traffic grew. The database is still sized for peak traffic instead of scaled dynamically. The caching layer was never added because “we’ll do it later.” Every job runs on dedicated always-on workers instead of consuming from a queue.
This post is a reference architecture for a B2B SaaS on AWS that is designed to be cost-efficient at launch, at growth, and at scale. It includes real pricing numbers at each stage, the Terraform patterns to implement it, and the specific inflection points where you need to revisit each component.
The Reference Architecture
The stack is deliberately conventional. Every component is a managed AWS service with predictable pricing. The architecture is:
Internet
└── Route 53 (DNS)
└── CloudFront (CDN + edge caching)
└── ALB (Application Load Balancer)
├── ECS Fargate (API service, capacity provider: 80% Spot / 20% On-Demand)
│ └── SQS (async job queue)
│ └── ECS Fargate (worker service, capacity provider: 100% Spot)
└── S3 (static assets, user uploads)
Data tier:
├── Aurora PostgreSQL (primary data store)
├── ElastiCache Valkey (caching, sessions, rate limiting)
└── S3 (object storage, exports, attachments)
Supporting:
├── Secrets Manager (credentials)
├── CloudWatch (logs, metrics, alarms)
└── AWS WAF (attached to CloudFront)
The API service runs on Fargate with a capacity provider strategy that mixes Spot and On-Demand. The worker service runs 100% Spot — background jobs are interruptible by design. This single decision reduces compute costs by 50–70% compared to pure On-Demand Fargate.
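As a rough sanity check on that 50–70% range, assuming Fargate Spot runs about 70% cheaper than On-Demand (the typical discount, which varies by region and capacity), the blended savings for a given Spot fraction is:

```python
def blended_savings(spot_fraction: float, spot_discount: float = 0.70) -> float:
    """Fraction saved vs. pure On-Demand Fargate for a given Spot mix."""
    return spot_fraction * spot_discount

# 80% Spot API service -> 56% savings; 100% Spot workers -> 70% savings
print(f"{blended_savings(0.80):.0%}")
print(f"{blended_savings(1.00):.0%}")
```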
Stage 1: Launch ($0 → 100 users, ~$500/month)
At launch, you do not have enough traffic to justify anything more than the minimum viable configuration. The goal is a stack you can actually run on a startup budget.
| Component | Configuration | Monthly Cost |
|---|---|---|
| ALB | 1 ALB, minimal traffic | $22 |
| ECS Fargate API | 1 task (0.5 vCPU, 1 GB), On-Demand | $18 |
| ECS Fargate Worker | 1 task (0.25 vCPU, 0.5 GB), Spot | $4 |
| Aurora Serverless v2 | 0.5–2 ACU min/max, single AZ | $22–65 |
| ElastiCache Valkey | cache.t4g.micro | $12 |
| SQS | Standard queue, minimal usage | <$1 |
| CloudFront | Free tier (1 TB/month free first year) | $0 |
| S3 | 10 GB storage | $0.23 |
| Route 53 | Hosted zone + 1M queries | $0.90 |
| CloudWatch | Basic metrics + logs | $5 |
| Secrets Manager | 5 secrets | $2.50 |
| Total | | ~$90–160/month |
Wait: that is not $500/month. The gap between $90–160 and $500 comes from two things: the free tier expiring and Multi-AZ. Once the 12-month free tier ends for services like RDS and ElastiCache, and once you add Multi-AZ for Aurora (required before you charge real customers), the bill climbs. Add a staging environment at roughly 50% of production cost: +$45–80/month. Total realistic launch cost: $250–350/month with Multi-AZ and staging.
The $500/month budget is deliberate. It gives headroom for CloudWatch alarms, WAF basic ruleset ($5/month for the managed rules), and the email service (SES: $0.10/1,000 emails + $0.12/GB attachments).
Aurora Serverless v2 at this stage: the 0.5 ACU minimum ($0.06/hour = $43/month) scales up automatically for batch imports, migrations, and report generation, then scales back down during idle periods. At 100 users with intermittent activity, Serverless v2 is the correct choice — you pay for what you use.
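The ACU arithmetic above can be sketched in a few lines (the per-ACU rate is implied by the $0.06/hour figure at 0.5 ACU and is an assumption about region pricing; the post uses 720-hour months throughout):

```python
ACU_HOURLY_RATE = 0.12  # $/ACU-hour, implied by $0.06/hour at 0.5 ACU (assumed)
HOURS_PER_MONTH = 720   # the 30-day month used in this post's figures

def serverless_monthly_cost(sustained_acus: float) -> float:
    """Monthly Aurora Serverless v2 cost at a sustained ACU level."""
    return sustained_acus * ACU_HOURLY_RATE * HOURS_PER_MONTH

print(round(serverless_monthly_cost(0.5), 2))  # 43.2 -> the ~$43/month floor
```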
Stage 2: Growth (100 → 10,000 users, ~$2,000–5,000/month)
The growth stage is where architecture decisions either compound in your favor or against you. Teams that did not build queue-based job processing, did not add caching, or over-provisioned everything “to be safe” find their costs growing faster than their user count.
| Component | Configuration | Monthly Cost |
|---|---|---|
| ALB | 1 ALB, ~10M requests/month | $35 |
| ECS Fargate API | 3–5 tasks (1 vCPU, 2 GB), 80/20 Spot mix | $95–120 |
| ECS Fargate Workers | 2–4 tasks (0.5 vCPU, 1 GB), 100% Spot | $20–35 |
| Aurora PostgreSQL | db.r6g.large, Multi-AZ, provisioned | $390 |
| ElastiCache Valkey | cache.r6g.large, 2 nodes | $366 |
| SQS | Standard queues, ~100M messages/month | $4 |
| CloudFront | ~5 TB/month transfer | $430 |
| S3 | 500 GB storage | $11.50 |
| NAT Gateway | ~200 GB/month | $56 |
| CloudWatch + X-Ray | Detailed monitoring | $30 |
| WAF | Managed ruleset + rate limiting | $25 |
| Total | | ~$1,460–1,500/month |
The Aurora crossover at this stage: by the time you have 10,000 users with regular usage, your database is likely running 2+ ACU sustained during business hours. A provisioned db.r6g.large ($0.26/hour = $187/month per node, $390/month for Multi-AZ) becomes cheaper than Aurora Serverless v2 at the same sustained load (4 ACU = $0.48/hour = $346/month). Migrate to provisioned at this stage.
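The crossover reduces to a one-line break-even: the sustained ACU level at which Serverless v2 costs as much as a provisioned instance's hourly rate (rates as assumed in the paragraph above):

```python
ACU_HOURLY_RATE = 0.12  # $/ACU-hour, consistent with the 4 ACU = $0.48/hour figure

def breakeven_acus(provisioned_hourly_rate: float) -> float:
    """Sustained ACU level above which a provisioned instance is cheaper."""
    return provisioned_hourly_rate / ACU_HOURLY_RATE

print(round(breakeven_acus(0.26), 2))  # db.r6g.large: ~2.17 ACU sustained
```

In other words, once your database sits above roughly 2.2 ACU through the working day, the provisioned instance wins even before Reserved Instance discounts.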
ElastiCache Valkey: at the growth stage, cache sizing matters more than cache technology. Valkey (the open-source Redis fork that replaced Redis OSS in ElastiCache) is available in the same instance classes. A cache.r6g.large (13 GB) with 2 nodes ($183/month each) provides enough cache space for session data, API response caching, and rate limiting keys for 10,000 users.
The CloudFront cost surprise: serving 5 TB/month through CloudFront costs more than the entire compute tier at the growth stage. This is correct — it should be. CloudFront is offloading that traffic from your ALB and origin. Without CloudFront, you would pay the same $430 in data transfer costs from the ALB directly, plus origin compute costs for cache misses.
Stage 3: Scale (10,000 → 100,000+ users, ~$15,000–25,000/month)
At scale, the cost structure shifts. Compute is no longer the dominant cost — data transfer, database, and caching are. Teams that have not implemented read replicas, aggressive caching, and connection pooling find their database costs dominating the bill.
| Component | Configuration | Monthly Cost |
|---|---|---|
| ALB | 2 ALBs (API + internal), high traffic | $120 |
| ECS Fargate API | 15–25 tasks (2 vCPU, 4 GB), 80/20 Spot mix | $550–750 |
| ECS Fargate Workers | 10–15 tasks, 100% Spot | $120–180 |
| Aurora PostgreSQL | db.r7g.xlarge writer + 2 readers, Multi-AZ | $1,350 |
| RDS Proxy | 2 proxy endpoints | $85 |
| ElastiCache Valkey | cache.r7g.xlarge, 3-node cluster | $850 |
| SQS + SNS | High volume messaging | $25 |
| CloudFront | ~50 TB/month | $3,500 |
| S3 | 5 TB storage + high request volume | $175 |
| NAT Gateway | ~2 TB/month | $135 |
| CloudWatch + observability | Full observability stack | $150 |
| WAF + Shield Standard | Enterprise protection | $100 |
| Compute Savings Plans (1-year) | 30% discount on Fargate | -$200 |
| Total | | ~$7,000–8,000/month |
The table totals roughly $7,000–8,000/month for 100,000 users, well below the $15,000–25,000 that is typical at this scale; landing at $50,000/month instead is the result of not making architectural investments at the growth stage. The key decisions that keep costs in this range:

- RDS Proxy ($85/month) prevents connection pool exhaustion as task count grows. Without it, Fargate tasks at this scale open hundreds of direct database connections, degrading Aurora performance and requiring a larger instance class.
- Two Aurora read replicas shift the read/write ratio. Most SaaS applications are 80% reads. Directing reads to replicas lets the writer handle only writes, sustaining a smaller writer instance class.
- Compute Savings Plans with a 1-year commitment cover predictable Fargate baseline usage at a 17–30% discount. Apply Savings Plans at the management account level to cover all member accounts.
- Aggressive CloudFront caching is the highest-leverage cost optimization at this stage. Increasing the cache TTL from 60 seconds to 5 minutes for API responses that tolerate slight staleness reduces origin traffic by 60–80%.
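A sketch of what that TTL change looks like as a CloudFront cache policy in Terraform (the resource name and the query-string behavior are assumptions; adapt the cache key settings to your API):

```hcl
resource "aws_cloudfront_cache_policy" "api_cacheable" {
  name        = "${var.app_name}-api-cacheable"
  min_ttl     = 0
  default_ttl = 300 # 5 minutes instead of 60 seconds
  max_ttl     = 600

  parameters_in_cache_key_and_forwarded_to_origin {
    enable_accept_encoding_gzip   = true
    enable_accept_encoding_brotli = true
    cookies_config {
      cookie_behavior = "none"
    }
    headers_config {
      header_behavior = "none"
    }
    query_strings_config {
      query_string_behavior = "all"
    }
  }
}
```

Attach the policy only to the cache behaviors serving staleness-tolerant endpoints; authenticated, per-user responses keep a zero TTL.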
ECS Capacity Provider Strategy (Terraform)
The 80/20 Spot/On-Demand split for the API service:
resource "aws_ecs_cluster" "main" {
name = var.cluster_name
setting {
name = "containerInsights"
value = "enabled"
}
tags = var.tags
}
resource "aws_ecs_cluster_capacity_providers" "main" {
cluster_name = aws_ecs_cluster.main.name
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
default_capacity_provider_strategy {
base = 1 # Always keep 1 On-Demand task as baseline
weight = 20 # 20% of additional tasks are On-Demand
capacity_provider = "FARGATE"
}
default_capacity_provider_strategy {
base = 0
weight = 80 # 80% of additional tasks are Spot
capacity_provider = "FARGATE_SPOT"
}
}
resource "aws_ecs_service" "api" {
name = "${var.app_name}-api"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.api.arn
desired_count = var.api_task_count
capacity_provider_strategy {
capacity_provider = "FARGATE"
base = 1
weight = 20
}
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
base = 0
weight = 80
}
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.api.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.api.arn
container_name = "api"
container_port = var.container_port
}
# Critical for graceful Spot interruption handling
# (top-level arguments in the AWS provider, not a nested block)
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 100
deployment_circuit_breaker {
enable = true
rollback = true
}
lifecycle {
ignore_changes = [desired_count] # Allow autoscaling to manage count
}
tags = var.tags
}
# Worker service: 100% Spot — background jobs are interruptible
resource "aws_ecs_service" "worker" {
name = "${var.app_name}-worker"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.worker.arn
desired_count = var.worker_task_count
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
base = 0
weight = 100
}
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.worker.id]
assign_public_ip = false
}
tags = var.tags
}
The base = 1 on the On-Demand provider ensures there is always one non-Spot API task running. This prevents the situation where all tasks are Spot and a capacity event briefly takes the service offline. The remaining tasks fill in at 80% Spot.
Terraform Module Structure
A well-organized module structure keeps the reference architecture maintainable as it grows:
terraform/
├── main.tf # Root module: calls all child modules
├── variables.tf # Root variables
├── outputs.tf # Root outputs (ALB DNS, DB endpoint, etc.)
├── providers.tf # AWS provider configuration
├── backend.tf # S3 + DynamoDB backend
│
└── modules/
├── networking/ # VPC, subnets, NAT, security groups
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── database/ # Aurora cluster, parameter groups, proxy
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── cache/ # ElastiCache Valkey cluster
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── ecs/ # Cluster, services, task definitions, autoscaling
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── alb/ # Load balancer, target groups, listeners
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── storage/ # S3 buckets, CloudFront distribution
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── messaging/ # SQS queues, SNS topics, DLQs
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
└── observability/ # CloudWatch dashboards, alarms, X-Ray
├── main.tf
├── variables.tf
└── outputs.tf
Each module is independently versioned. The networking module is the most stable (rarely changes). The ECS module changes most frequently (task definitions, autoscaling policies). Separating them prevents a change to task definitions from requiring a networking plan.
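A sketch of how the root main.tf wires these modules together (module input and output names here are illustrative, not prescriptive):

```hcl
module "networking" {
  source      = "./modules/networking"
  vpc_cidr    = var.vpc_cidr
  environment = var.environment
}

module "database" {
  source             = "./modules/database"
  vpc_id             = module.networking.vpc_id
  private_subnet_ids = module.networking.private_subnet_ids
}

module "ecs" {
  source             = "./modules/ecs"
  private_subnet_ids = module.networking.private_subnet_ids
  db_endpoint        = module.database.cluster_endpoint
}
```

Because the dependency graph flows through module outputs, Terraform orders creation correctly without explicit `depends_on` in most cases.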
Cost Allocation Tags with Provider default_tags
The single most impactful cost visibility decision: configure default tags at the provider level so every resource is tagged consistently:
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = var.project_name
Environment = var.environment # production, staging, development
ManagedBy = "terraform"
CostCenter = var.cost_center # engineering, data, infrastructure
Team = var.team_name
Repository = var.github_repository
}
}
}
With default_tags, every resource created by Terraform in that provider block inherits these tags automatically. You do not need to add tags = var.tags to every resource (though you can override specific resources).
Enable these tags as cost allocation tags in the AWS Billing console immediately. Cost allocation tags can only report on cost data going forward from when they are activated — you cannot retroactively tag historical cost data. Activate on day one.
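Activation can also live in Terraform via the aws_ce_cost_allocation_tag resource, so it is not a forgotten console step (note: a tag key only becomes activatable after it has appeared in billing data, which can take up to 24 hours):

```hcl
resource "aws_ce_cost_allocation_tag" "environment" {
  tag_key = "Environment"
  status  = "Active"
}

resource "aws_ce_cost_allocation_tag" "cost_center" {
  tag_key = "CostCenter"
  status  = "Active"
}
```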
Per-Tenant Cost Isolation in Multi-Tenant SaaS
A shared infrastructure multi-tenant SaaS (all tenants share the same RDS cluster, ECS services, and ElastiCache) cannot achieve per-tenant cost isolation through AWS tags alone. AWS tags apply to resources, not to the requests those resources serve. You need application-level attribution.
Database attribution: Enable RDS Performance Insights. Structure application database connections to include the tenant identifier in the application_name connection parameter:
# SQLAlchemy example
engine = create_engine(
database_url,
connect_args={
"application_name": f"saas-app-tenant-{tenant_id}"
}
)
Query pg_stat_statements grouped by application_name to see query count and total execution time per tenant. Divide each tenant's execution time by the total execution time across all tenants to get an attribution percentage, then multiply by the monthly Aurora cost.
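A minimal sketch of that attribution math, assuming you have already aggregated per-tenant execution time out of pg_stat_statements:

```python
def attribute_db_cost(exec_seconds_by_tenant: dict[str, float],
                      monthly_db_cost: float) -> dict[str, float]:
    """Split a fixed monthly Aurora bill by share of total query execution time."""
    total = sum(exec_seconds_by_tenant.values())
    return {
        tenant: round(monthly_db_cost * seconds / total, 2)
        for tenant, seconds in exec_seconds_by_tenant.items()
    }

usage = {"saas-app-tenant-a": 7200.0, "saas-app-tenant-b": 1800.0, "saas-app-tenant-c": 900.0}
print(attribute_db_cost(usage, 390.0))  # tenant-a carries ~73% of the $390 Multi-AZ bill
```

This attributes the whole bill proportionally; idle capacity is smeared across tenants, which is usually acceptable for plan-pricing decisions.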
Compute attribution: Add tenant ID to ECS task metadata when using task-per-tenant isolation (silo model). For shared-pool models, log tenant ID with every request, then join CloudWatch Logs with ALB access logs to attribute request processing time per tenant.
Noisy tenant detection: A single large tenant running bulk imports, generating large exports, or running inefficient queries can consume 40–60% of database resources while paying for a plan priced at the median. Detect this via RDS Performance Insights: set an alert when a single application_name consumes more than 30% of total query time over a 30-minute window.
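The 30% threshold check is then a one-liner over the same per-application_name shares (the threshold and window are the assumptions stated above):

```python
def noisy_tenants(query_time_share: dict[str, float], threshold: float = 0.30) -> list[str]:
    """Return application_names whose share of total query time exceeds the threshold."""
    return sorted(t for t, share in query_time_share.items() if share > threshold)

shares = {"saas-app-tenant-a": 0.45, "saas-app-tenant-b": 0.25, "saas-app-tenant-c": 0.30}
print(noisy_tenants(shares))  # ['saas-app-tenant-a']
```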
Enforcement options: application-level rate limiting (limit DB query rate per tenant), job queue priority throttling (large tenants’ background jobs queue behind other tenants’ jobs), or a direct conversation with the tenant about their usage pattern.
Scaling from 0 to 1M: The Decision Points
| Scale | Key Decision | Cost Impact |
|---|---|---|
| 0 → 100 users | Aurora Serverless v2 | Pay for actual compute, not provisioned |
| 100 → 1K users | Add ElastiCache, use SQS for jobs | Cache hit rate 80%+ reduces DB load |
| 1K → 10K users | Move to provisioned Aurora, enable read replicas | Serverless v2 crossover point |
| 10K → 100K users | Add RDS Proxy, Compute Savings Plans | Connection pooling + 17–30% compute savings |
| 100K → 1M users | Consider Aurora Global Database for reads, evaluate DynamoDB for high-throughput data | Aurora write throughput limits |
The 1M-user inflection: Aurora MySQL/PostgreSQL on a single writer handles approximately 100,000–200,000 transactions per minute on db.r7g.4xlarge. At 1M active users with real-time behavior, you may need to partition hot data into DynamoDB (session state, real-time presence, counters) while keeping relational data in Aurora.
Edge Cases
Free Tier Exhaustion
The AWS free tier expires after 12 months. For SaaS startups, this creates a billing cliff: month 13’s bill is typically 30–40% higher than month 12. This surprises founders who have been watching a static bill. Track free tier usage in Cost Explorer’s “Free Tier” tab and plan the transition before it happens.
Uneven AZ Load
ALB distributes requests across registered targets in all configured AZs. With cross-zone load balancing enabled (the ALB default), traffic is spread per target, so uneven task placement is absorbed. If cross-zone load balancing is disabled at the target group level, each AZ receives a roughly equal share of traffic regardless of how many targets it contains, so when Spot capacity leaves fewer Fargate tasks in us-east-1b than in us-east-1a, each task in the less-populated AZ receives more requests. Monitor target response time by AZ in CloudWatch to detect this pattern.
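Cross-zone distribution, which spreads requests per registered target rather than per AZ, can be pinned explicitly at the target group level rather than relying on defaults (a sketch; note the attribute takes a string in the AWS provider, and the health check path is an assumption):

```hcl
resource "aws_lb_target_group" "api" {
  name     = "${var.app_name}-api"
  port     = var.container_port
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # "true" spreads traffic across all registered targets in every AZ,
  # so uneven Spot placement does not overload the smaller AZ
  load_balancing_cross_zone_enabled = "true"

  health_check {
    path    = "/health" # assumed health endpoint
    matcher = "200"
  }
}
```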
Single Large Tenant Cost Spike
A tenant that exports their entire database, runs a bulk import of 1M records, or integrates a third-party tool that polls your API every second can spike your costs by 200–300% in a single day. SQS with visibility timeout and max receive count provides natural throttling for background job workloads. For API traffic, WAF rate limiting per IP or per API key ($1/month/rule in WAF) prevents runaway integrations.
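The SQS side of that throttling is just a visibility timeout plus a redrive policy to a dead-letter queue (queue names and the receive count are illustrative):

```hcl
resource "aws_sqs_queue" "jobs_dlq" {
  name = "${var.app_name}-jobs-dlq"
}

resource "aws_sqs_queue" "jobs" {
  name                       = "${var.app_name}-jobs"
  visibility_timeout_seconds = 300 # one worker owns a message at a time

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.jobs_dlq.arn
    maxReceiveCount     = 5 # poison messages land in the DLQ, not a retry loop
  })
}
```

Because workers pull at their own pace, a bulk import from one tenant lengthens the queue rather than spiking compute; pair this with per-tenant queue priorities if large tenants routinely dominate.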
For container orchestration decision-making between ECS and EKS, see our guide on AWS ECS vs EKS. For the caching strategies that maximize your ElastiCache investment, see AWS ElastiCache Redis caching strategies. For the database selection decisions in this stack, see AWS RDS vs Aurora. For a comprehensive FinOps framework to govern costs as you scale, see FinOps on AWS: Complete Guide to Cloud Cost Governance.