AWS Bill Teardown #2: How a Healthcare Company Spent $200k/Year on NAT Gateways
A healthcare company was spending $200k/year on NAT Gateways due to VPC architecture decisions made in 2019. Here is how we rewired their network and cut costs by 78%.
A healthcare company was spending $200k/year on NAT Gateways due to VPC architecture decisions made in 2019. Here is how we rewired their network and cut costs by 78%.
Autoscaling was supposed to make costs predictable by matching capacity to demand. Instead, it introduced feedback loops, burst amplification, and — with AI workloads — a new class of non-deterministic spend that no scaling policy anticipates.
Observability is not free, and the industry has collectively underpriced it. CloudWatch log ingestion, metrics explosion, and X-Ray trace volume can together exceed your compute bill — especially once AI workloads introduce high-cardinality telemetry at scale.
Savings Plans and Reserved Instances reduce the rate you pay. Architecture determines the volume you pay at. The most durable cost reductions in AWS come from designing systems that structurally generate less spend — not from negotiating a lower price for the same behavior.
Most AWS cost forecasts miss by 30–50% not because engineers are careless, but because the forecasting model does not match how AWS actually charges. This is the playbook for getting forecasts right: which metrics to measure, which models to use, and where the structural gaps are.
The most expensive AWS architectures are not the ones that use the most resources — they are the ones whose costs respond unpredictably to inputs. This is the design discipline for building systems where costs are structurally bounded and forecasting is accurate.
Data transfer is the most consistently underestimated cost in AWS architectures. It does not appear in compute estimates, it does not scale linearly, and it punishes microservices designs at exactly the moment growth feels like success.
AWS surprise bills from autoscaling follow a small set of repeatable failure patterns: feedback loops, scale-out without scale-in, burst amplification from misconfigured metrics, and commitment mismatches after scaling events. Each pattern has a specific fix.
The reason AWS cost problems grow undetected is not technical — it is organizational. Engineers make architectural decisions with no cost feedback. Finance sees bills 30 days late. No one owns the gap between the two.
AWS migration cost estimates are consistently wrong — not because the tools are bad, but because they miss the parallel run period, data transfer during migration, and the operational tax of learning a new environment. Here is what to actually model.
AWS publishes every price on a public page, yet bills still arrive as surprises. The problem is not opacity — it is that real costs emerge from interactions between services, not from any single line item.
S3 storage pricing is genuinely low. S3 request pricing, replication costs, and the compounding effects of versioning and lifecycle misconfiguration are not. Most expensive S3 bills have nothing to do with how much data you store.