AWS Lambda Cost Optimization: Pay-Per-Request vs Provisioned
Quick summary: A practical guide to Lambda pricing models, memory tuning, Graviton savings, and when Provisioned Concurrency pays for itself versus standard on-demand invocations.
Key Takeaways
- More memory can mean lower cost: Lambda CPU scales with memory, so CPU-bound functions are often cheaper at 512 MB than at 128 MB
- Graviton (arm64) is 20% cheaper per GB-second than x86, usually for a one-line configuration change
- Provisioned Concurrency pays for itself only when the pre-warmed environments are kept busy by steady, latency-sensitive traffic
- Architecture choices such as direct service integrations and SQS batching often save more than per-function tuning

Lambda’s pay-per-request pricing is one of its biggest selling points — but “pay only for what you use” does not automatically mean “pay the least possible.” Without optimization, Lambda costs can grow faster than expected, especially as workloads scale.
This guide covers the practical cost optimization strategies we implement for clients running serverless workloads on AWS.
Understanding Lambda Pricing
Lambda charges for two things:
- Requests — $0.20 per million invocations
- Duration — $0.0000166667 per GB-second (charged per millisecond)
Duration cost depends on two factors you control: memory allocation (which also determines CPU) and execution time.
Example: A function with 512 MB memory running for 200ms:
- Duration cost: 0.5 GB × 0.2 seconds × $0.0000166667 = $0.00000167
- Request cost: $0.0000002
- Total per invocation: ~$0.0000019
- At 10 million invocations/month: ~$19
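The arithmetic above is easy to fold into a small helper. This is a sketch that reproduces the example, assuming the published x86_64 on-demand rates at the time of writing (prices change; check the Lambda pricing page):

```python
# Assumed published x86_64 on-demand rates (may change over time).
REQUEST_PRICE = 0.20 / 1_000_000   # $ per invocation
DURATION_PRICE = 0.0000166667      # $ per GB-second

def invocation_cost(memory_mb: float, duration_ms: float) -> float:
    """Cost of a single on-demand invocation in dollars."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * DURATION_PRICE + REQUEST_PRICE

# 512 MB for 200 ms, 10 million invocations per month:
monthly = invocation_cost(512, 200) * 10_000_000
print(f"${monthly:.2f}/month")  # ≈ $18.67, i.e. roughly $19
```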
The free tier provides 1 million requests and 400,000 GB-seconds per month — enough for many development and low-traffic production workloads.
Memory Tuning: The Biggest Lever
Lambda CPU scales linearly with memory. At 1,769 MB, you get one full vCPU. At 3,538 MB, you get two. This creates a counterintuitive optimization opportunity: more memory can be cheaper.
How It Works
A CPU-bound function at 128 MB might take 3,000ms to execute. At 512 MB (4x memory, 4x CPU), the same function might complete in 800ms. At 1,024 MB, it might take 400ms.
| Memory | Duration | GB-seconds | Cost per invocation |
|---|---|---|---|
| 128 MB | 3,000ms | 0.375 | $0.00000625 |
| 256 MB | 1,500ms | 0.375 | $0.00000625 |
| 512 MB | 800ms | 0.400 | $0.00000667 |
| 1,024 MB | 400ms | 0.400 | $0.00000667 |
| 1,769 MB | 250ms | 0.432 | $0.00000720 |
In this example, 128 MB and 256 MB cost the same despite the memory difference — because the function completes proportionally faster with more CPU. The cost-optimal point depends on whether your function is CPU-bound, I/O-bound, or memory-bound.
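Given measured durations at each memory size, picking the cost-optimal setting is a one-liner. The sketch below uses the illustrative durations from the table (not real benchmarks) and considers duration cost only:

```python
# Illustrative durations from the table above, not real benchmark data.
DURATION_PRICE = 0.0000166667  # $ per GB-second (x86_64 on-demand)

measured = {  # memory_mb: duration_ms
    128: 3000, 256: 1500, 512: 800, 1024: 400, 1769: 250,
}

def duration_cost(memory_mb: int, duration_ms: int) -> float:
    """Duration cost of one invocation at a given memory setting."""
    return (memory_mb / 1024) * (duration_ms / 1000) * DURATION_PRICE

costs = {mb: duration_cost(mb, ms) for mb, ms in measured.items()}
best = min(costs, key=costs.get)  # 128 and 256 tie; min() returns 128
```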
AWS Lambda Power Tuning
Use the open-source AWS Lambda Power Tuning tool to find the optimal memory setting automatically. It runs your function at multiple memory configurations and reports:
- Execution time at each memory level
- Cost per invocation at each memory level
- The cost-optimal and speed-optimal configurations
We run Power Tuning on every Lambda function in production. It typically reveals 20-40% cost savings on functions that were left at default memory settings.
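Power Tuning runs as a Step Functions state machine that you deploy separately (for example from the Serverless Application Repository). A minimal sketch of an execution input, with placeholder ARNs; the field names follow the tool's documented input format:

```python
import json

# Input document for the Power Tuning state machine; ARNs are placeholders.
execution_input = {
    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "powerValues": [128, 256, 512, 1024, 1769],  # memory sizes to test
    "num": 50,                  # invocations per memory size
    "payload": {"test": True},  # event passed to each test invocation
    "strategy": "cost",         # optimize for cost (vs "speed" or "balanced")
}

def start_tuning(state_machine_arn: str) -> None:
    """Kick off the run (requires boto3 and AWS credentials)."""
    import boto3
    sfn = boto3.client("stepfunctions")
    sfn.start_execution(stateMachineArn=state_machine_arn,
                        input=json.dumps(execution_input))
```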
Graviton (ARM) — 20% Cheaper
Lambda on ARM-based Graviton2 processors is 20% cheaper per GB-second than x86, with equivalent or better performance for most workloads.
| Architecture | Price per GB-second |
|---|---|
| x86_64 | $0.0000166667 |
| arm64 (Graviton2) | $0.0000133334 |
Switching to ARM is usually a one-line change in your function configuration. Most Node.js, Python, and Go functions work without modification. Java and .NET functions may need testing for native dependency compatibility.
Our recommendation: Default to arm64 for all new functions. Migrate existing functions to arm64 unless they have specific x86 dependencies.
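A back-of-envelope check of the saving, using the two per-GB-second rates quoted above for the same hypothetical function from the earlier pricing example:

```python
# Rates from the table above; duration cost only, requests cost the same.
X86_RATE = 0.0000166667   # $ per GB-second
ARM_RATE = 0.0000133334

def monthly_duration_cost(rate: float, memory_mb: int,
                          duration_ms: int, invocations: int) -> float:
    """Monthly duration cost, assuming identical duration on both chips."""
    return rate * (memory_mb / 1024) * (duration_ms / 1000) * invocations

x86 = monthly_duration_cost(X86_RATE, 512, 200, 10_000_000)
arm = monthly_duration_cost(ARM_RATE, 512, 200, 10_000_000)
print(f"x86: ${x86:.2f}  arm64: ${arm:.2f}  saving: {1 - arm / x86:.0%}")
```

In practice arm64 often also runs faster, so the real saving can exceed the 20% price difference.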
Pay-Per-Request vs. Provisioned Concurrency
This is the decision that trips up most teams: when does Provisioned Concurrency — which eliminates cold starts but adds always-on cost — actually save money?
On-Demand (Pay-Per-Request)
- Pay per invocation and per millisecond of execution
- Cold starts on first invocation and after idle periods
- Scales automatically from zero to thousands of concurrent executions
- Best for: variable traffic, background processing, non-latency-sensitive workloads
Provisioned Concurrency
- Pre-warms a specified number of execution environments
- Eliminates cold starts for those environments
- Charges for the provisioned capacity ($0.0000041667 per GB-second of configured concurrency, billed whether or not it is used), plus request charges and duration at a reduced rate
- Best for: latency-sensitive APIs, predictable traffic patterns, compliance with response time SLAs
Break-Even Analysis
Provisioned Concurrency makes financial sense when:
- You need consistently low latency — Sub-100ms p99 response times that cold starts would violate
- You have predictable, steady traffic — The provisioned environments are utilized consistently
- Cold start cost exceeds provisioning cost — If cold starts cause retries, timeouts, or user drop-off, the indirect cost justifies provisioning
Example calculation: 10 Provisioned Concurrency units at 512 MB, running 24/7:
- Hourly cost: 10 × 0.5 GB × 3,600 seconds × $0.0000041667 = $0.075/hour
- Monthly cost: $0.075 × 720 hours = $54/month
If those 10 units handle 5 million invocations per month (an average of roughly 2 per second), the allocation cost works out to $0.0000108 per invocation, paid on top of duration charges. What makes provisioning competitive is that duration on provisioned environments is billed at a reduced rate: $0.0000097222 per GB-second versus $0.0000166667 on-demand.
The rule of thumb: at these prices, a Provisioned Concurrency environment becomes cheaper than the equivalent on-demand capacity once it is busy roughly 60% of the time. Below that utilization you are paying a premium for latency; weigh it against the indirect cost of cold starts (retries, timeouts, user drop-off).
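The break-even arithmetic can be sketched directly from the published us-east-1 x86_64 rates (allocation $0.0000041667/GB-s, provisioned-environment duration $0.0000097222/GB-s, on-demand duration $0.0000166667/GB-s; all subject to change):

```python
# Assumed published us-east-1 x86_64 rates at time of writing.
ON_DEMAND = 0.0000166667      # $ per GB-second, on-demand duration
PC_ALLOCATION = 0.0000041667  # $ per GB-second, charged busy or idle
PC_DURATION = 0.0000097222    # $ per GB-second of actual execution

def hourly_cost_per_gb(utilization: float, provisioned: bool) -> float:
    """Cost of one GB of capacity for one hour at a given busy fraction."""
    busy_seconds = 3600 * utilization
    if provisioned:
        return 3600 * PC_ALLOCATION + busy_seconds * PC_DURATION
    return busy_seconds * ON_DEMAND

# Break-even utilization: allocation + cheaper duration == on-demand.
breakeven = PC_ALLOCATION / (ON_DEMAND - PC_DURATION)
print(f"break-even utilization ≈ {breakeven:.0%}")  # ≈ 60%
```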
Architecture-Level Cost Optimization
Use Direct Service Integrations
API Gateway can integrate directly with DynamoDB, SQS, Step Functions, and other services without a Lambda function in between. This eliminates Lambda invocation costs for simple operations.
Before (Lambda proxy):
API Gateway → Lambda (parse request, call DynamoDB, format response) → DynamoDB
After (direct integration):
API Gateway → DynamoDB (VTL mapping template)
Savings: 100% of Lambda cost for that route.
Batch Processing with SQS
When processing messages from SQS, Lambda can receive up to 10 messages per invocation (or up to 10,000 with batching windows). Processing 10 messages in one invocation costs the same as processing 1.
Before: 1 million messages = 1 million invocations
After (batch size 10): 1 million messages = 100,000 invocations
Savings: 90% reduction in invocation costs plus proportional duration savings from amortized initialization.
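A batch-aware handler should also report partial failures so that one bad message does not force the whole batch to be redelivered. A minimal sketch, assuming the `ReportBatchItemFailures` setting is enabled on the event source mapping and with `process()` standing in for your business logic:

```python
import json

def handler(event, context):
    """Process an SQS batch; return only the failed messages for retry."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message is redelivered; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def process(message: dict) -> None:
    ...  # placeholder business logic
```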
Avoid Synchronous Chains
Synchronous function-to-function calls (Lambda invoking Lambda) double your costs and create cascading cold start risks. Use asynchronous patterns instead:
Avoid: API Gateway → Lambda A → Lambda B → Lambda C (serial, synchronous)
Prefer: API Gateway → Lambda A → SQS/EventBridge → Lambda B (async, decoupled)
Right-Size Connection Handling
Lambda functions that connect to RDS databases create connection overhead on every cold start. Use RDS Proxy to pool connections, reducing both database load and Lambda execution time.
Without RDS Proxy: ~200ms of connection establishment on each cold start (or on every invocation, if the connection is not reused)
With RDS Proxy: ~5ms to check a connection out of the pool
At scale, this connection overhead difference reduces both latency and cost significantly.
Monitoring Lambda Costs
CloudWatch Metrics to Track
- Invocations — Total function calls per period
- Duration — Average, p50, p95, p99 execution times
- ConcurrentExecutions — Peak concurrent executions (indicates scaling behavior)
- Throttles — Invocations rejected due to concurrency limits
- Errors — Failed invocations (retried invocations increase cost)
Cost Explorer Tags
Tag Lambda functions with:
- Project — Which product or feature the function supports
- Environment — Production, staging, development
- Team — Which team owns the function
This enables per-project and per-team cost attribution in Cost Explorer.
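Applying the tags is a single API call. A hedged sketch using boto3's `tag_resource` (the tag values and function ARN are placeholders; executing it requires AWS credentials):

```python
# Placeholder tag values; substitute your own project/team names.
TAGS = {
    "Project": "checkout",
    "Environment": "production",
    "Team": "payments",
}

def tag_function(function_arn: str) -> None:
    """Attach cost-attribution tags to a Lambda function."""
    import boto3
    boto3.client("lambda").tag_resource(Resource=function_arn, Tags=TAGS)
```

Remember to activate the tag keys as cost allocation tags in the Billing console before they appear in Cost Explorer.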
Cost Anomaly Detection
Enable AWS Cost Anomaly Detection for Lambda to get alerts when spending deviates from historical patterns — catching runaway functions, infinite loops, or unexpected traffic spikes before they generate large bills.
Common Lambda Cost Mistakes
Mistake 1: Default Memory Settings
Lambda defaults to 128 MB, which is almost never optimal. Functions at 128 MB have minimal CPU and execute slowly, often costing more than the same function at 256 MB or 512 MB.
Mistake 2: Over-Provisioned Concurrency
Provisioning 100 concurrent environments “just in case” when your peak traffic only uses 20 wastes 80% of your provisioning spend. Use Application Auto Scaling to adjust Provisioned Concurrency based on actual demand.
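Registering a function alias with Application Auto Scaling looks roughly like the following sketch. The function name, alias, capacity bounds, and 70% target are placeholders/assumptions; executing it requires boto3 and AWS credentials:

```python
def resource_id(function_name: str, alias: str) -> str:
    """Application Auto Scaling resource ID for a Lambda function alias."""
    return f"function:{function_name}:{alias}"

def enable_pc_autoscaling(function_name: str, alias: str,
                          min_capacity: int, max_capacity: int) -> None:
    import boto3
    aas = boto3.client("application-autoscaling")
    aas.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=resource_id(function_name, alias),
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    aas.put_scaling_policy(
        PolicyName=f"{function_name}-pc-utilization",
        ServiceNamespace="lambda",
        ResourceId=resource_id(function_name, alias),
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Scale out when ~70% of provisioned capacity is in use.
            "TargetValue": 0.7,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    )
```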
Mistake 3: Logging Everything
console.log in every function with detailed request/response payloads generates massive CloudWatch Logs volumes. At $0.50 per GB ingested, verbose logging can cost more than the Lambda invocations themselves. Log strategically — errors always, debug only when needed.
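One way to log strategically is to gate verbosity behind an environment variable, so production defaults to warnings and errors while debug detail stays available on demand. A Python sketch (`do_work` is a placeholder, and the `LOG_LEVEL` variable name is our convention, not a Lambda built-in):

```python
import logging
import os

logger = logging.getLogger()
# Default to WARNING; set LOG_LEVEL=DEBUG on the function to investigate.
logger.setLevel(os.environ.get("LOG_LEVEL", "WARNING"))

def handler(event, context):
    # Lazy %s formatting: the payload string is never built unless
    # DEBUG is actually enabled.
    logger.debug("full event: %s", event)
    try:
        result = do_work(event)
    except Exception:
        logger.exception("handler failed")  # errors are always logged
        raise
    return result

def do_work(event):
    return {"ok": True}  # placeholder business logic
```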
Mistake 4: Not Using the Free Tier
The Lambda free tier (1M requests + 400,000 GB-seconds/month) applies every month, forever. For low-traffic functions, this means Lambda is genuinely free. Ensure your cost analysis accounts for the free tier.
Getting Started
Lambda cost optimization is not a one-time exercise. Workloads change, traffic patterns evolve, and AWS introduces new features and pricing options. We help organizations implement ongoing cost governance for serverless workloads as part of our broader AWS cost optimization services.
For end-to-end serverless architecture design and implementation, see our AWS Serverless Architecture Services.


