How to Build a Cost-Optimized SaaS Stack on AWS (End-to-End Reference)
Quick summary: A B2B SaaS stack that costs $500/month at launch does not need to cost $50,000/month at 100,000 users if the architecture decisions at each stage are deliberate. This is the end-to-end reference architecture with real cost numbers.
Key Takeaways
- A B2B SaaS stack that costs $500/month at launch does not need to cost $50,000/month at 100,000 users if the architecture decisions at each stage are deliberate
Most SaaS cost problems are architecture problems in disguise. The team that spends $50,000/month serving 100,000 users often made the same choices as the team spending $8,000/month at the same scale — they just never revisited those choices as traffic grew. The database is still sized for peak traffic instead of scaled dynamically. The caching layer was never added because “we’ll do it later.” Every job runs on dedicated always-on workers instead of consuming from a queue.
This post is a reference architecture for a B2B SaaS on AWS that is designed to be cost-efficient at launch, at growth, and at scale. It includes real pricing numbers at each stage, the Terraform patterns to implement it, and the specific inflection points where you need to revisit each component.
The Reference Architecture
The stack is deliberately conventional. Every component is a managed AWS service with predictable pricing. The architecture is:
Internet
└── Route 53 (DNS)
└── CloudFront (CDN + edge caching)
└── ALB (Application Load Balancer)
├── ECS Fargate (API service, capacity provider: 80% Spot / 20% On-Demand)
│ └── SQS (async job queue)
│ └── ECS Fargate (worker service, capacity provider: 100% Spot)
└── S3 (static assets, user uploads)
Data tier:
├── Aurora PostgreSQL (primary data store)
├── ElastiCache Valkey (caching, sessions, rate limiting)
└── S3 (object storage, exports, attachments)
Supporting:
├── Secrets Manager (credentials)
├── CloudWatch (logs, metrics, alarms)
└── AWS WAF (attached to CloudFront)
The API service runs on Fargate with a capacity provider strategy that mixes Spot and On-Demand. The worker service runs 100% Spot — background jobs are interruptible by design. This single decision reduces compute costs by 50–70% compared to pure On-Demand Fargate.
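As a rough sanity check on that 50–70% range, assuming Fargate Spot runs about 70% cheaper than On-Demand (the typical discount, which varies by region and capacity), the blended savings for a given Spot fraction is:

```python
def blended_savings(spot_fraction: float, spot_discount: float = 0.70) -> float:
    """Fraction saved vs. pure On-Demand Fargate for a given Spot mix."""
    return spot_fraction * spot_discount

# 80% Spot API service -> 56% savings; 100% Spot workers -> 70% savings
print(f"{blended_savings(0.80):.0%}")
print(f"{blended_savings(1.00):.0%}")
```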
Stage 1: Launch ($0 → 100 users, ~$500/month)
At launch, you do not have enough traffic to justify anything more than the minimum viable configuration. The goal is a stack you can actually run on a startup budget.
| Component | Configuration | Monthly Cost |
|---|---|---|
| ALB | 1 ALB, minimal traffic | $22 |
| ECS Fargate API | 1 task (0.5 vCPU, 1 GB), On-Demand | $18 |
| ECS Fargate Worker | 1 task (0.25 vCPU, 0.5 GB), Spot | $4 |
| Aurora Serverless v2 | 0.5–2 ACU min/max, single AZ | $22–65 |
| ElastiCache Valkey | cache.t4g.micro | $12 |
| SQS | Standard queue, minimal usage | <$1 |
| CloudFront | Free tier (1 TB/month free first year) | $0 |
| S3 | 10 GB storage | $0.23 |
| Route 53 | Hosted zone + 1M queries | $0.90 |
| CloudWatch | Basic metrics + logs | $5 |
| Secrets Manager | 5 secrets | $2.50 |
| Total | | ~$90–160/month |
Wait: that is not $500/month. The gap between $90–160 and $500 comes from two things: the free tier expiring and Multi-AZ. Once the 12-month free tier ends for services like RDS and ElastiCache, and once you add Multi-AZ for Aurora (required before you charge real customers), the bill climbs. Add a staging environment at roughly 50% of production cost: +$45–80/month. Total realistic launch cost: $250–350/month with Multi-AZ and staging.
The $500/month budget is deliberate. It gives headroom for CloudWatch alarms, WAF basic ruleset ($5/month for the managed rules), and the email service (SES: $0.10/1,000 emails + $0.12/GB attachments).
Aurora Serverless v2 at this stage: the 0.5 ACU minimum ($0.06/hour = $43/month) scales up automatically for batch imports, migrations, and report generation, then scales back down during idle periods. At 100 users with intermittent activity, Serverless v2 is the correct choice — you pay for what you use.
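The ACU arithmetic above can be sketched in a few lines (the per-ACU rate is implied by the $0.06/hour figure at 0.5 ACU and is an assumption about region pricing; the post uses 720-hour months throughout):

```python
ACU_HOURLY_RATE = 0.12  # $/ACU-hour, implied by $0.06/hour at 0.5 ACU (assumed)
HOURS_PER_MONTH = 720   # the 30-day month used in this post's figures

def serverless_monthly_cost(sustained_acus: float) -> float:
    """Monthly Aurora Serverless v2 cost at a sustained ACU level."""
    return sustained_acus * ACU_HOURLY_RATE * HOURS_PER_MONTH

print(round(serverless_monthly_cost(0.5), 2))  # 43.2 -> the ~$43/month floor
```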
Stage 2: Growth (100 → 10,000 users, ~$2,000–5,000/month)
The growth stage is where architecture decisions either compound in your favor or against you. Teams that did not build queue-based job processing, did not add caching, or over-provisioned everything “to be safe” find their costs growing faster than their user count.
| Component | Configuration | Monthly Cost |
|---|---|---|
| ALB | 1 ALB, ~10M requests/month | $35 |
| ECS Fargate API | 3–5 tasks (1 vCPU, 2 GB), 80/20 Spot mix | $95–120 |
| ECS Fargate Workers | 2–4 tasks (0.5 vCPU, 1 GB), 100% Spot | $20–35 |
| Aurora PostgreSQL | db.r6g.large, Multi-AZ, provisioned | $390 |
| ElastiCache Valkey | cache.r6g.large, 2 nodes | $366 |
| SQS | Standard queues, ~100M messages/month | $4 |
| CloudFront | ~5 TB/month transfer | $430 |
| S3 | 500 GB storage | $11.50 |
| NAT Gateway | ~200 GB/month | $56 |
| CloudWatch + X-Ray | Detailed monitoring | $30 |
| WAF | Managed ruleset + rate limiting | $25 |
| Total | | ~$1,460–1,500/month |
The Aurora crossover at this stage: by the time you have 10,000 users with regular usage, your database is likely running 2+ ACU sustained during business hours. A provisioned db.r6g.large ($0.26/hour = $187/month per node, $390/month for Multi-AZ) becomes cheaper than Aurora Serverless v2 at the same sustained load (4 ACU = $0.48/hour = $346/month). Migrate to provisioned at this stage.
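The crossover reduces to a one-line break-even: the sustained ACU level at which Serverless v2 costs as much as a provisioned instance's hourly rate (rates as assumed in the paragraph above):

```python
ACU_HOURLY_RATE = 0.12  # $/ACU-hour, consistent with the 4 ACU = $0.48/hour figure

def breakeven_acus(provisioned_hourly_rate: float) -> float:
    """Sustained ACU level above which a provisioned instance is cheaper."""
    return provisioned_hourly_rate / ACU_HOURLY_RATE

print(round(breakeven_acus(0.26), 2))  # db.r6g.large: ~2.17 ACU sustained
```

In other words, once your database sits above roughly 2.2 ACU through the working day, the provisioned instance wins even before Reserved Instance discounts.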
ElastiCache Valkey: at the growth stage, cache sizing matters more than cache technology. Valkey (the open-source Redis fork that replaced Redis OSS in ElastiCache) is available in the same instance classes. A cache.r6g.large (13 GB) with 2 nodes ($183/month each) provides enough cache space for session data, API response caching, and rate limiting keys for 10,000 users.
The CloudFront cost surprise: serving 5 TB/month through CloudFront costs more than the entire compute tier at the growth stage. This is correct — it should be. CloudFront is offloading that traffic from your ALB and origin. Without CloudFront, you would pay the same $430 in data transfer costs from the ALB directly, plus origin compute costs for cache misses.
Stage 3: Scale (10,000 → 100,000+ users, ~$15,000–25,000/month)
At scale, the cost structure shifts. Compute is no longer the dominant cost — data transfer, database, and caching are. Teams that have not implemented read replicas, aggressive caching, and connection pooling find their database costs dominating the bill.
| Component | Configuration | Monthly Cost |
|---|---|---|
| ALB | 2 ALBs (API + internal), high traffic | $120 |
| ECS Fargate API | 15–25 tasks (2 vCPU, 4 GB), 80/20 Spot mix | $550–750 |
| ECS Fargate Workers | 10–15 tasks, 100% Spot | $120–180 |
| Aurora PostgreSQL | db.r7g.xlarge writer + 2 readers, Multi-AZ | $1,350 |
| RDS Proxy | 2 proxy endpoints | $85 |
| ElastiCache Valkey | cache.r7g.xlarge, 3-node cluster | $850 |
| SQS + SNS | High volume messaging | $25 |
| CloudFront | ~50 TB/month | $3,500 |
| S3 | 5 TB storage + high request volume | $175 |
| NAT Gateway | ~2 TB/month | $135 |
| CloudWatch + observability | Full observability stack | $150 |
| WAF + Shield Standard | Enterprise protection | $100 |
| Compute Savings Plans (1-year) | 30% discount on Fargate | -$200 |
| Total | | ~$7,000–8,000/month |
The table totals roughly $7,000–8,000/month for 100,000 users, well below the $15,000–25,000 that is typical at this scale; landing at $50,000/month instead is the result of not making architectural investments at the growth stage. The key decisions that keep costs in this range:

- RDS Proxy ($85/month) prevents connection pool exhaustion as task count grows. Without it, Fargate tasks at this scale open hundreds of direct database connections, degrading Aurora performance and requiring a larger instance class.
- Two Aurora read replicas shift the read/write ratio. Most SaaS applications are 80% reads. Directing reads to replicas lets the writer handle only writes, sustaining a smaller writer instance class.
- Compute Savings Plans with a 1-year commitment cover predictable Fargate baseline usage at a 17–30% discount. Apply Savings Plans at the management account level to cover all member accounts.
- Aggressive CloudFront caching is the highest-leverage cost optimization at this stage. Increasing the cache TTL from 60 seconds to 5 minutes for API responses that tolerate slight staleness reduces origin traffic by 60–80%.
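A sketch of what that TTL change looks like as a CloudFront cache policy in Terraform (the resource name and the query-string behavior are assumptions; adapt the cache key settings to your API):

```hcl
resource "aws_cloudfront_cache_policy" "api_cacheable" {
  name        = "${var.app_name}-api-cacheable"
  min_ttl     = 0
  default_ttl = 300 # 5 minutes instead of 60 seconds
  max_ttl     = 600

  parameters_in_cache_key_and_forwarded_to_origin {
    enable_accept_encoding_gzip   = true
    enable_accept_encoding_brotli = true
    cookies_config {
      cookie_behavior = "none"
    }
    headers_config {
      header_behavior = "none"
    }
    query_strings_config {
      query_string_behavior = "all"
    }
  }
}
```

Attach the policy only to the cache behaviors serving staleness-tolerant endpoints; authenticated, per-user responses keep a zero TTL.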
ECS Capacity Provider Strategy (Terraform)
The 80/20 Spot/On-Demand split for the API service:
resource "aws_ecs_cluster" "main" {
name = var.cluster_name
setting {
name = "containerInsights"
value = "enabled"
}
tags = var.tags
}
resource "aws_ecs_cluster_capacity_providers" "main" {
cluster_name = aws_ecs_cluster.main.name
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
default_capacity_provider_strategy {
base = 1 # Always keep 1 On-Demand task as baseline
weight = 20 # 20% of additional tasks are On-Demand
capacity_provider = "FARGATE"
}
default_capacity_provider_strategy {
base = 0
weight = 80 # 80% of additional tasks are Spot
capacity_provider = "FARGATE_SPOT"
}
}
resource "aws_ecs_service" "api" {
name = "${var.app_name}-api"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.api.arn
desired_count = var.api_task_count
capacity_provider_strategy {
capacity_provider = "FARGATE"
base = 1
weight = 20
}
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
base = 0
weight = 80
}
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.api.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.api.arn
container_name = "api"
container_port = var.container_port
}
# Critical for graceful Spot interruption handling
# (top-level arguments in the AWS provider, not a nested block)
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 100
deployment_circuit_breaker {
enable = true
rollback = true
}
lifecycle {
ignore_changes = [desired_count] # Allow autoscaling to manage count
}
tags = var.tags
}
# Worker service: 100% Spot — background jobs are interruptible
resource "aws_ecs_service" "worker" {
name = "${var.app_name}-worker"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.worker.arn
desired_count = var.worker_task_count
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
base = 0
weight = 100
}
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.worker.id]
assign_public_ip = false
}
tags = var.tags
}
The base = 1 on the On-Demand provider ensures there is always one non-Spot API task running. This prevents the situation where all tasks are Spot and a capacity event briefly takes the service offline. The remaining tasks fill in at 80% Spot.
Terraform Module Structure
A well-organized module structure keeps the reference architecture maintainable as it grows:
terraform/
├── main.tf # Root module: calls all child modules
├── variables.tf # Root variables
├── outputs.tf # Root outputs (ALB DNS, DB endpoint, etc.)
├── providers.tf # AWS provider configuration
├── backend.tf # S3 + DynamoDB backend
│
└── modules/
├── networking/ # VPC, subnets, NAT, security groups
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── database/ # Aurora cluster, parameter groups, proxy
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── cache/ # ElastiCache Valkey cluster
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── ecs/ # Cluster, services, task definitions, autoscaling
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── alb/ # Load balancer, target groups, listeners
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── storage/ # S3 buckets, CloudFront distribution
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── messaging/ # SQS queues, SNS topics, DLQs
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
└── observability/ # CloudWatch dashboards, alarms, X-Ray
├── main.tf
├── variables.tf
└── outputs.tf
Each module is independently versioned. The networking module is the most stable (rarely changes). The ECS module changes most frequently (task definitions, autoscaling policies). Separating them prevents a change to task definitions from requiring a networking plan.
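A sketch of how the root main.tf wires these modules together (module input and output names here are illustrative, not prescriptive):

```hcl
module "networking" {
  source      = "./modules/networking"
  vpc_cidr    = var.vpc_cidr
  environment = var.environment
}

module "database" {
  source             = "./modules/database"
  vpc_id             = module.networking.vpc_id
  private_subnet_ids = module.networking.private_subnet_ids
}

module "ecs" {
  source             = "./modules/ecs"
  private_subnet_ids = module.networking.private_subnet_ids
  db_endpoint        = module.database.cluster_endpoint
}
```

Because the dependency graph flows through module outputs, Terraform orders creation correctly without explicit `depends_on` in most cases.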
Cost Allocation Tags with Provider default_tags
The single most impactful cost visibility decision: configure default tags at the provider level so every resource is tagged consistently:
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = var.project_name
Environment = var.environment # production, staging, development
ManagedBy = "terraform"
CostCenter = var.cost_center # engineering, data, infrastructure
Team = var.team_name
Repository = var.github_repository
}
}
}
With default_tags, every resource created by Terraform in that provider block inherits these tags automatically. You do not need to add tags = var.tags to every resource (though you can override specific resources).
Enable these tags as cost allocation tags in the AWS Billing console immediately. Cost allocation tags can only report on cost data going forward from when they are activated — you cannot retroactively tag historical cost data. Activate on day one.
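Activation can also live in Terraform via the aws_ce_cost_allocation_tag resource, so it is not a forgotten console step (note: a tag key only becomes activatable after it has appeared in billing data, which can take up to 24 hours):

```hcl
resource "aws_ce_cost_allocation_tag" "environment" {
  tag_key = "Environment"
  status  = "Active"
}

resource "aws_ce_cost_allocation_tag" "cost_center" {
  tag_key = "CostCenter"
  status  = "Active"
}
```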
Per-Tenant Cost Isolation in Multi-Tenant SaaS
A shared infrastructure multi-tenant SaaS (all tenants share the same RDS cluster, ECS services, and ElastiCache) cannot achieve per-tenant cost isolation through AWS tags alone. AWS tags apply to resources, not to the requests those resources serve. You need application-level attribution.
Database attribution: Enable RDS Performance Insights. Structure application database connections to include the tenant identifier in the application_name connection parameter:
# SQLAlchemy example
engine = create_engine(
database_url,
connect_args={
"application_name": f"saas-app-tenant-{tenant_id}"
}
)
Query pg_stat_statements grouped by application_name to see query count and total execution time per tenant. Divide each tenant's execution time by the total execution time across all tenants to get an attribution percentage, then multiply by the monthly Aurora cost.
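A minimal sketch of that attribution math, assuming you have already aggregated per-tenant execution time out of pg_stat_statements:

```python
def attribute_db_cost(exec_seconds_by_tenant: dict[str, float],
                      monthly_db_cost: float) -> dict[str, float]:
    """Split a fixed monthly Aurora bill by share of total query execution time."""
    total = sum(exec_seconds_by_tenant.values())
    return {
        tenant: round(monthly_db_cost * seconds / total, 2)
        for tenant, seconds in exec_seconds_by_tenant.items()
    }

usage = {"saas-app-tenant-a": 7200.0, "saas-app-tenant-b": 1800.0, "saas-app-tenant-c": 900.0}
print(attribute_db_cost(usage, 390.0))  # tenant-a carries ~73% of the $390 Multi-AZ bill
```

This attributes the whole bill proportionally; idle capacity is smeared across tenants, which is usually acceptable for plan-pricing decisions.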
Compute attribution: Add tenant ID to ECS task metadata when using task-per-tenant isolation (silo model). For shared-pool models, log tenant ID with every request, then join CloudWatch Logs with ALB access logs to attribute request processing time per tenant.
Noisy tenant detection: A single large tenant running bulk imports, generating large exports, or running inefficient queries can consume 40–60% of database resources while paying for a plan priced at the median. Detect this via RDS Performance Insights: set an alert when a single application_name consumes more than 30% of total query time over a 30-minute window.
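The 30% threshold check is then a one-liner over the same per-application_name shares (the threshold and window are the assumptions stated above):

```python
def noisy_tenants(query_time_share: dict[str, float], threshold: float = 0.30) -> list[str]:
    """Return application_names whose share of total query time exceeds the threshold."""
    return sorted(t for t, share in query_time_share.items() if share > threshold)

shares = {"saas-app-tenant-a": 0.45, "saas-app-tenant-b": 0.25, "saas-app-tenant-c": 0.30}
print(noisy_tenants(shares))  # ['saas-app-tenant-a']
```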
Enforcement options: application-level rate limiting (limit DB query rate per tenant), job queue priority throttling (large tenants’ background jobs queue behind other tenants’ jobs), or a direct conversation with the tenant about their usage pattern.
Scaling from 0 to 1M: The Decision Points
| Scale | Key Decision | Cost Impact |
|---|---|---|
| 0 → 100 users | Aurora Serverless v2 | Pay for actual compute, not provisioned |
| 100 → 1K users | Add ElastiCache, use SQS for jobs | Cache hit rate 80%+ reduces DB load |
| 1K → 10K users | Move to provisioned Aurora, enable read replicas | Serverless v2 crossover point |
| 10K → 100K users | Add RDS Proxy, Compute Savings Plans | Connection pooling + 17–30% compute savings |
| 100K → 1M users | Consider Aurora Global Database for reads, evaluate DynamoDB for high-throughput data | Aurora write throughput limits |
The 1M-user inflection: Aurora MySQL/PostgreSQL on a single writer handles approximately 100,000–200,000 transactions per minute on db.r7g.4xlarge. At 1M active users with real-time behavior, you may need to partition hot data into DynamoDB (session state, real-time presence, counters) while keeping relational data in Aurora.
Edge Cases
Free Tier Exhaustion
The AWS free tier expires after 12 months. For SaaS startups, this creates a billing cliff: month 13’s bill is typically 30–40% higher than month 12. This surprises founders who have been watching a static bill. Track free tier usage in Cost Explorer’s “Free Tier” tab and plan the transition before it happens.
Uneven AZ Load
ALB distributes requests across registered targets in all configured AZs. With cross-zone load balancing enabled (the ALB default), traffic is spread per target, so uneven task placement is absorbed. If cross-zone load balancing is disabled at the target group level, each AZ receives a roughly equal share of traffic regardless of how many targets it contains, so when Spot capacity leaves fewer Fargate tasks in us-east-1b than in us-east-1a, each task in the less-populated AZ receives more requests. Monitor target response time by AZ in CloudWatch to detect this pattern.
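Cross-zone distribution, which spreads requests per registered target rather than per AZ, can be pinned explicitly at the target group level rather than relying on defaults (a sketch; note the attribute takes a string in the AWS provider, and the health check path is an assumption):

```hcl
resource "aws_lb_target_group" "api" {
  name     = "${var.app_name}-api"
  port     = var.container_port
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # "true" spreads traffic across all registered targets in every AZ,
  # so uneven Spot placement does not overload the smaller AZ
  load_balancing_cross_zone_enabled = "true"

  health_check {
    path    = "/health" # assumed health endpoint
    matcher = "200"
  }
}
```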
Single Large Tenant Cost Spike
A tenant that exports their entire database, runs a bulk import of 1M records, or integrates a third-party tool that polls your API every second can spike your costs by 200–300% in a single day. SQS with visibility timeout and max receive count provides natural throttling for background job workloads. For API traffic, WAF rate limiting per IP or per API key ($1/month/rule in WAF) prevents runaway integrations.
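The SQS side of that throttling is just a visibility timeout plus a redrive policy to a dead-letter queue (queue names and the receive count are illustrative):

```hcl
resource "aws_sqs_queue" "jobs_dlq" {
  name = "${var.app_name}-jobs-dlq"
}

resource "aws_sqs_queue" "jobs" {
  name                       = "${var.app_name}-jobs"
  visibility_timeout_seconds = 300 # one worker owns a message at a time

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.jobs_dlq.arn
    maxReceiveCount     = 5 # poison messages land in the DLQ, not a retry loop
  })
}
```

Because workers pull at their own pace, a bulk import from one tenant lengthens the queue rather than spiking compute; pair this with per-tenant queue priorities if large tenants routinely dominate.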
For container orchestration decision-making between ECS and EKS, see our guide on AWS ECS vs EKS. For the caching strategies that maximize your ElastiCache investment, see AWS ElastiCache Redis caching strategies. For the database selection decisions in this stack, see AWS RDS vs Aurora. For a comprehensive FinOps framework to govern costs as you scale, see FinOps on AWS: Complete Guide to Cloud Cost Governance.