AWS Lambda Cost Optimization: Pay-Per-Request vs Provisioned
Quick summary: A practical guide to Lambda pricing models, memory tuning, Graviton savings, and when Provisioned Concurrency pays for itself versus standard on-demand invocations.
Key Takeaways
- More memory can mean lower cost: Lambda CPU scales with memory, so CPU-bound functions are often cheaper at 512 MB than at 128 MB
- Graviton (arm64) is 20% cheaper per GB-second than x86, usually for a one-line configuration change
- Provisioned Concurrency pays for itself only when the pre-warmed environments are kept busy by steady, latency-sensitive traffic
- Architecture choices such as direct service integrations and SQS batching often save more than per-function tuning

Lambda’s pay-per-request pricing is one of its biggest selling points — but “pay only for what you use” does not automatically mean “pay the least possible.” Without optimization, Lambda costs can grow faster than expected, especially as workloads scale.
This guide covers the practical cost optimization strategies we implement for clients running serverless workloads on AWS.
Understanding Lambda Pricing
Lambda charges for two things:
- Requests — $0.20 per million invocations
- Duration — $0.0000166667 per GB-second (charged per millisecond)
Duration cost depends on two factors you control: memory allocation (which also determines CPU) and execution time.
Example: A function with 512 MB memory running for 200ms:
- Duration cost: 0.5 GB × 0.2 seconds × $0.0000166667 = $0.00000167
- Request cost: $0.0000002
- Total per invocation: ~$0.0000019
- At 10 million invocations/month: ~$19
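The arithmetic above is easy to fold into a small helper. This is a sketch that reproduces the example, assuming the published x86_64 on-demand rates at the time of writing (prices change; check the Lambda pricing page):

```python
# Assumed published x86_64 on-demand rates (may change over time).
REQUEST_PRICE = 0.20 / 1_000_000   # $ per invocation
DURATION_PRICE = 0.0000166667      # $ per GB-second

def invocation_cost(memory_mb: float, duration_ms: float) -> float:
    """Cost of a single on-demand invocation in dollars."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * DURATION_PRICE + REQUEST_PRICE

# 512 MB for 200 ms, 10 million invocations per month:
monthly = invocation_cost(512, 200) * 10_000_000
print(f"${monthly:.2f}/month")  # ≈ $18.67, i.e. roughly $19
```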
The free tier provides 1 million requests and 400,000 GB-seconds per month — enough for many development and low-traffic production workloads.
Memory Tuning: The Biggest Lever
Lambda CPU scales linearly with memory. At 1,769 MB, you get one full vCPU. At 3,538 MB, you get two. This creates a counterintuitive optimization opportunity: more memory can be cheaper.
How It Works
A CPU-bound function at 128 MB might take 3,000ms to execute. At 512 MB (4x memory, 4x CPU), the same function might complete in 800ms. At 1,024 MB, it might take 400ms.
| Memory | Duration | GB-seconds | Cost per invocation |
|---|---|---|---|
| 128 MB | 3,000ms | 0.375 | $0.00000625 |
| 256 MB | 1,500ms | 0.375 | $0.00000625 |
| 512 MB | 800ms | 0.400 | $0.00000667 |
| 1,024 MB | 400ms | 0.400 | $0.00000667 |
| 1,769 MB | 250ms | 0.432 | $0.00000720 |
In this example, 128 MB and 256 MB cost the same despite the memory difference — because the function completes proportionally faster with more CPU. The cost-optimal point depends on whether your function is CPU-bound, I/O-bound, or memory-bound.
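Given measured durations at each memory size, picking the cost-optimal setting is a one-liner. The sketch below uses the illustrative durations from the table (not real benchmarks) and considers duration cost only:

```python
# Illustrative durations from the table above, not real benchmark data.
DURATION_PRICE = 0.0000166667  # $ per GB-second (x86_64 on-demand)

measured = {  # memory_mb: duration_ms
    128: 3000, 256: 1500, 512: 800, 1024: 400, 1769: 250,
}

def duration_cost(memory_mb: int, duration_ms: int) -> float:
    """Duration cost of one invocation at a given memory setting."""
    return (memory_mb / 1024) * (duration_ms / 1000) * DURATION_PRICE

costs = {mb: duration_cost(mb, ms) for mb, ms in measured.items()}
best = min(costs, key=costs.get)  # 128 and 256 tie; min() returns 128
```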
AWS Lambda Power Tuning
Use the open-source AWS Lambda Power Tuning tool to find the optimal memory setting automatically. It runs your function at multiple memory configurations and reports:
- Execution time at each memory level
- Cost per invocation at each memory level
- The cost-optimal and speed-optimal configurations
We run Power Tuning on every Lambda function in production. It typically reveals 20-40% cost savings on functions that were left at default memory settings.
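Power Tuning runs as a Step Functions state machine that you deploy separately (for example from the Serverless Application Repository). A minimal sketch of an execution input, with placeholder ARNs; the field names follow the tool's documented input format:

```python
import json

# Input document for the Power Tuning state machine; ARNs are placeholders.
execution_input = {
    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "powerValues": [128, 256, 512, 1024, 1769],  # memory sizes to test
    "num": 50,                  # invocations per memory size
    "payload": {"test": True},  # event passed to each test invocation
    "strategy": "cost",         # optimize for cost (vs "speed" or "balanced")
}

def start_tuning(state_machine_arn: str) -> None:
    """Kick off the run (requires boto3 and AWS credentials)."""
    import boto3
    sfn = boto3.client("stepfunctions")
    sfn.start_execution(stateMachineArn=state_machine_arn,
                        input=json.dumps(execution_input))
```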
Graviton (ARM) — 20% Cheaper
Lambda on ARM-based Graviton2 processors is 20% cheaper per GB-second than x86, with equivalent or better performance for most workloads.
| Architecture | Price per GB-second |
|---|---|
| x86_64 | $0.0000166667 |
| arm64 (Graviton2) | $0.0000133334 |
Switching to ARM is usually a one-line change in your function configuration. Most Node.js, Python, and Go functions work without modification. Java and .NET functions may need testing for native dependency compatibility.
Our recommendation: Default to arm64 for all new functions. Migrate existing functions to arm64 unless they have specific x86 dependencies.
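A back-of-envelope check of the saving, using the two per-GB-second rates quoted above for the same hypothetical function from the earlier pricing example:

```python
# Rates from the table above; duration cost only, requests cost the same.
X86_RATE = 0.0000166667   # $ per GB-second
ARM_RATE = 0.0000133334

def monthly_duration_cost(rate: float, memory_mb: int,
                          duration_ms: int, invocations: int) -> float:
    """Monthly duration cost, assuming identical duration on both chips."""
    return rate * (memory_mb / 1024) * (duration_ms / 1000) * invocations

x86 = monthly_duration_cost(X86_RATE, 512, 200, 10_000_000)
arm = monthly_duration_cost(ARM_RATE, 512, 200, 10_000_000)
print(f"x86: ${x86:.2f}  arm64: ${arm:.2f}  saving: {1 - arm / x86:.0%}")
```

In practice arm64 often also runs faster, so the real saving can exceed the 20% price difference.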
Pay-Per-Request vs. Provisioned Concurrency
This is the decision that trips up most teams: when does Provisioned Concurrency — which eliminates cold starts but adds always-on cost — actually save money?
On-Demand (Pay-Per-Request)
- Pay per invocation and per millisecond of execution
- Cold starts on first invocation and after idle periods
- Scales automatically from zero to thousands of concurrent executions
- Best for: variable traffic, background processing, non-latency-sensitive workloads
Provisioned Concurrency
- Pre-warms a specified number of execution environments
- Eliminates cold starts for those environments
- Charges for the provisioned capacity ($0.0000041667 per GB-second of configured concurrency, billed whether or not it is used), plus request charges and duration at a reduced rate
- Best for: latency-sensitive APIs, predictable traffic patterns, compliance with response time SLAs
Break-Even Analysis
Provisioned Concurrency makes financial sense when:
- You need consistently low latency — Sub-100ms p99 response times that cold starts would violate
- You have predictable, steady traffic — The provisioned environments are utilized consistently
- Cold start cost exceeds provisioning cost — If cold starts cause retries, timeouts, or user drop-off, the indirect cost justifies provisioning
Example calculation: 10 Provisioned Concurrency units at 512 MB, running 24/7:
- Hourly cost: 10 × 0.5 GB × 3,600 seconds × $0.0000041667 = $0.075/hour
- Monthly cost: $0.075 × 720 hours = $54/month
If those 10 units handle 5 million invocations per month (an average of roughly 2 per second), the allocation cost works out to $0.0000108 per invocation, paid on top of duration charges. What makes provisioning competitive is that duration on provisioned environments is billed at a reduced rate: $0.0000097222 per GB-second versus $0.0000166667 on-demand.
The rule of thumb: at these prices, a Provisioned Concurrency environment becomes cheaper than the equivalent on-demand capacity once it is busy roughly 60% of the time. Below that utilization you are paying a premium for latency; weigh it against the indirect cost of cold starts (retries, timeouts, user drop-off).
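The break-even arithmetic can be sketched directly from the published us-east-1 x86_64 rates (allocation $0.0000041667/GB-s, provisioned-environment duration $0.0000097222/GB-s, on-demand duration $0.0000166667/GB-s; all subject to change):

```python
# Assumed published us-east-1 x86_64 rates at time of writing.
ON_DEMAND = 0.0000166667      # $ per GB-second, on-demand duration
PC_ALLOCATION = 0.0000041667  # $ per GB-second, charged busy or idle
PC_DURATION = 0.0000097222    # $ per GB-second of actual execution

def hourly_cost_per_gb(utilization: float, provisioned: bool) -> float:
    """Cost of one GB of capacity for one hour at a given busy fraction."""
    busy_seconds = 3600 * utilization
    if provisioned:
        return 3600 * PC_ALLOCATION + busy_seconds * PC_DURATION
    return busy_seconds * ON_DEMAND

# Break-even utilization: allocation + cheaper duration == on-demand.
breakeven = PC_ALLOCATION / (ON_DEMAND - PC_DURATION)
print(f"break-even utilization ≈ {breakeven:.0%}")  # ≈ 60%
```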
Architecture-Level Cost Optimization
Use Direct Service Integrations
API Gateway can integrate directly with DynamoDB, SQS, Step Functions, and other services without a Lambda function in between. This eliminates Lambda invocation costs for simple operations.
Before (Lambda proxy):
API Gateway → Lambda (parse request, call DynamoDB, format response) → DynamoDB
After (direct integration):
API Gateway → DynamoDB (VTL mapping template)
Savings: 100% of Lambda cost for that route.
Batch Processing with SQS
When processing messages from SQS, Lambda can receive up to 10 messages per invocation (or up to 10,000 with batching windows). Processing 10 messages in one invocation costs the same as processing 1.
Before: 1 million messages = 1 million invocations
After (batch size 10): 1 million messages = 100,000 invocations
Savings: 90% reduction in invocation costs plus proportional duration savings from amortized initialization.
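A batch-aware handler should also report partial failures so that one bad message does not force the whole batch to be redelivered. A minimal sketch, assuming the `ReportBatchItemFailures` setting is enabled on the event source mapping and with `process()` standing in for your business logic:

```python
import json

def handler(event, context):
    """Process an SQS batch; return only the failed messages for retry."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message is redelivered; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def process(message: dict) -> None:
    ...  # placeholder business logic
```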
Avoid Synchronous Chains
Synchronous function-to-function calls (Lambda invoking Lambda) double your costs and create cascading cold start risks. Use asynchronous patterns instead:
Avoid: API Gateway → Lambda A → Lambda B → Lambda C (serial, synchronous)
Prefer: API Gateway → Lambda A → SQS/EventBridge → Lambda B (async, decoupled)
Right-Size Connection Handling
Lambda functions that connect to RDS databases create connection overhead on every cold start. Use RDS Proxy to pool connections, reducing both database load and Lambda execution time.
Without RDS Proxy: ~200ms of connection establishment on each cold start (or on every invocation, if the connection is not reused)
With RDS Proxy: ~5ms to check a connection out of the pool
At scale, this connection overhead difference reduces both latency and cost significantly.
Monitoring Lambda Costs
CloudWatch Metrics to Track
- Invocations — Total function calls per period
- Duration — Average, p50, p95, p99 execution times
- ConcurrentExecutions — Peak concurrent executions (indicates scaling behavior)
- Throttles — Invocations rejected due to concurrency limits
- Errors — Failed invocations (retried invocations increase cost)
Cost Explorer Tags
Tag Lambda functions with:
- Project — Which product or feature the function supports
- Environment — Production, staging, development
- Team — Which team owns the function
This enables per-project and per-team cost attribution in Cost Explorer.
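Applying the tags is a single API call. A hedged sketch using boto3's `tag_resource` (the tag values and function ARN are placeholders; executing it requires AWS credentials):

```python
# Placeholder tag values; substitute your own project/team names.
TAGS = {
    "Project": "checkout",
    "Environment": "production",
    "Team": "payments",
}

def tag_function(function_arn: str) -> None:
    """Attach cost-attribution tags to a Lambda function."""
    import boto3
    boto3.client("lambda").tag_resource(Resource=function_arn, Tags=TAGS)
```

Remember to activate the tag keys as cost allocation tags in the Billing console before they appear in Cost Explorer.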
Cost Anomaly Detection
Enable AWS Cost Anomaly Detection for Lambda to get alerts when spending deviates from historical patterns — catching runaway functions, infinite loops, or unexpected traffic spikes before they generate large bills.
Common Lambda Cost Mistakes
Mistake 1: Default Memory Settings
Lambda defaults to 128 MB, which is almost never optimal. Functions at 128 MB have minimal CPU and execute slowly, often costing more than the same function at 256 MB or 512 MB.
Mistake 2: Over-Provisioned Concurrency
Provisioning 100 concurrent environments “just in case” when your peak traffic only uses 20 wastes 80% of your provisioning spend. Use Application Auto Scaling to adjust Provisioned Concurrency based on actual demand.
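Registering a function alias with Application Auto Scaling looks roughly like the following sketch. The function name, alias, capacity bounds, and 70% target are placeholders/assumptions; executing it requires boto3 and AWS credentials:

```python
def resource_id(function_name: str, alias: str) -> str:
    """Application Auto Scaling resource ID for a Lambda function alias."""
    return f"function:{function_name}:{alias}"

def enable_pc_autoscaling(function_name: str, alias: str,
                          min_capacity: int, max_capacity: int) -> None:
    import boto3
    aas = boto3.client("application-autoscaling")
    aas.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=resource_id(function_name, alias),
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    aas.put_scaling_policy(
        PolicyName=f"{function_name}-pc-utilization",
        ServiceNamespace="lambda",
        ResourceId=resource_id(function_name, alias),
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Scale out when ~70% of provisioned capacity is in use.
            "TargetValue": 0.7,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    )
```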
Mistake 3: Logging Everything
console.log in every function with detailed request/response payloads generates massive CloudWatch Logs volumes. At $0.50 per GB ingested, verbose logging can cost more than the Lambda invocations themselves. Log strategically — errors always, debug only when needed.
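One way to log strategically is to gate verbosity behind an environment variable, so production defaults to warnings and errors while debug detail stays available on demand. A Python sketch (`do_work` is a placeholder, and the `LOG_LEVEL` variable name is our convention, not a Lambda built-in):

```python
import logging
import os

logger = logging.getLogger()
# Default to WARNING; set LOG_LEVEL=DEBUG on the function to investigate.
logger.setLevel(os.environ.get("LOG_LEVEL", "WARNING"))

def handler(event, context):
    # Lazy %s formatting: the payload string is never built unless
    # DEBUG is actually enabled.
    logger.debug("full event: %s", event)
    try:
        result = do_work(event)
    except Exception:
        logger.exception("handler failed")  # errors are always logged
        raise
    return result

def do_work(event):
    return {"ok": True}  # placeholder business logic
```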
Mistake 4: Not Using the Free Tier
The Lambda free tier (1M requests + 400,000 GB-seconds/month) applies every month, forever. For low-traffic functions, this means Lambda is genuinely free. Ensure your cost analysis accounts for the free tier.
Getting Started
Lambda cost optimization is not a one-time exercise. Workloads change, traffic patterns evolve, and AWS introduces new features and pricing options. We help organizations implement ongoing cost governance for serverless workloads as part of our broader AWS cost optimization services.
For end-to-end serverless architecture design and implementation, see our AWS Serverless Architecture Services.


