AWS Lambda Cost Optimization: Pay-Per-Request vs Provisioned


Quick summary: A practical guide to Lambda pricing models, memory tuning, Graviton savings, and when Provisioned Concurrency pays for itself versus standard on-demand invocations.


Lambda’s pay-per-request pricing is one of its biggest selling points — but “pay only for what you use” does not automatically mean “pay the least possible.” Without optimization, Lambda costs can grow faster than expected, especially as workloads scale.

This guide covers the practical cost optimization strategies we implement for clients running serverless workloads on AWS.

Understanding Lambda Pricing

Lambda charges for two things:

  1. Requests — $0.20 per million invocations
  2. Duration — $0.0000166667 per GB-second on x86 (billed per millisecond)

Duration cost depends on two factors you control: memory allocation (which also determines CPU) and execution time.

Example: A function with 512 MB memory running for 200ms:

  • Duration cost: 0.5 GB × 0.2 seconds × $0.0000166667 = $0.00000167
  • Request cost: $0.0000002
  • Total per invocation: ~$0.0000019
  • At 10 million invocations/month: ~$19
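The arithmetic above generalizes into a small calculator. A sketch in Python, using the x86 rates quoted above (verify against current AWS pricing for your region):

```python
REQUEST_PRICE = 0.20 / 1_000_000   # $ per invocation
GB_SECOND_PRICE = 0.0000166667     # $ per GB-second (x86)

def lambda_monthly_cost(memory_mb: float, duration_ms: float, invocations: int) -> float:
    """Monthly on-demand cost in dollars, ignoring the free tier."""
    gb_seconds_per_call = (memory_mb / 1024) * (duration_ms / 1000)
    per_call = gb_seconds_per_call * GB_SECOND_PRICE + REQUEST_PRICE
    return per_call * invocations

# 512 MB, 200 ms, 10M invocations/month: roughly the $19 figure above
print(round(lambda_monthly_cost(512, 200, 10_000_000), 2))
```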

The free tier provides 1 million requests and 400,000 GB-seconds per month — enough for many development and low-traffic production workloads.

Memory Tuning: The Biggest Lever

Lambda CPU scales linearly with memory. At 1,769 MB, you get one full vCPU. At 3,538 MB, you get two. This creates a counterintuitive optimization opportunity: more memory can be cheaper.

How It Works

A CPU-bound function at 128 MB might take 3,000ms to execute. At 512 MB (4x memory, 4x CPU), the same function might complete in 800ms. At 1,024 MB, it might take 400ms.

Memory   | Duration | GB-seconds | Cost per invocation
---------|----------|------------|--------------------
128 MB   | 3,000 ms | 0.375      | $0.00000625
256 MB   | 1,500 ms | 0.375      | $0.00000625
512 MB   | 800 ms   | 0.400      | $0.00000667
1,024 MB | 400 ms   | 0.400      | $0.00000667
1,769 MB | 250 ms   | 0.432      | $0.00000720

In this example, 128 MB and 256 MB cost the same despite the memory difference — because the function completes proportionally faster with more CPU. The cost-optimal point depends on whether your function is CPU-bound, I/O-bound, or memory-bound.
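Given measured durations at several memory levels, picking the cost-optimal and speed-optimal settings is a short comparison. A sketch using the illustrative figures from the table above (not real benchmarks):

```python
GB_SECOND_PRICE = 0.0000166667  # $ per GB-second, x86 on-demand

# (memory_mb, measured_duration_ms) pairs; durations are the illustrative
# figures from the table above, not real benchmarks
profile = [(128, 3000), (256, 1500), (512, 800), (1024, 400), (1769, 250)]

def cost_per_invocation(memory_mb: int, duration_ms: int) -> float:
    """Duration cost only; billing converts MB to GB as MB / 1024."""
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_PRICE

costs = {m: cost_per_invocation(m, d) for m, d in profile}
cheapest = min(costs, key=costs.get)           # 128 and 256 MB tie; min() keeps the first
fastest = min(profile, key=lambda p: p[1])[0]  # lowest measured duration
print(cheapest, fastest)
```

This is exactly the comparison Power Tuning automates against real invocations instead of assumed durations.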

AWS Lambda Power Tuning

Use the open-source AWS Lambda Power Tuning tool to find the optimal memory setting automatically. It runs your function at multiple memory configurations and reports:

  • Execution time at each memory level
  • Cost per invocation at each memory level
  • The cost-optimal and speed-optimal configurations

We run Power Tuning on every Lambda function in production. It typically reveals 20-40% cost savings on functions that were left at default memory settings.

Graviton (ARM) — 20% Cheaper

Lambda on ARM-based Graviton2 processors is 20% cheaper per GB-second than x86, with equivalent or better performance for most workloads.

Architecture      | Price per GB-second
------------------|--------------------
x86_64            | $0.0000166667
arm64 (Graviton2) | $0.0000133334

Switching to ARM is usually a one-line change in your function configuration. Most Node.js, Python, and Go functions work without modification. Java and .NET functions may need testing for native dependency compatibility.

Our recommendation: Default to arm64 for all new functions. Migrate existing functions to arm64 unless they have specific x86 dependencies.
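The 20% rate difference compounds directly with volume. A quick comparison using the rates from the table above and a hypothetical 1 GB function:

```python
X86_RATE = 0.0000166667   # $ per GB-second
ARM_RATE = 0.0000133334   # $ per GB-second (Graviton2)

def duration_cost(rate: float, memory_gb: float, seconds: float, invocations: int) -> float:
    """Monthly duration cost in dollars (requests charged the same either way)."""
    return rate * memory_gb * seconds * invocations

# Hypothetical: 1 GB function, 300 ms average, 50M invocations/month
x86 = duration_cost(X86_RATE, 1.0, 0.3, 50_000_000)
arm = duration_cost(ARM_RATE, 1.0, 0.3, 50_000_000)
print(f"x86 ${x86:.2f}  arm ${arm:.2f}  savings {100 * (1 - arm / x86):.0f}%")
```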

Pay-Per-Request vs. Provisioned Concurrency

This is the decision that trips up most teams: when does Provisioned Concurrency — which eliminates cold starts but adds always-on cost — actually save money?

On-Demand (Pay-Per-Request)

  • Pay per invocation and per millisecond of execution
  • Cold starts on first invocation and after idle periods
  • Scales automatically from zero to thousands of concurrent executions
  • Best for: variable traffic, background processing, non-latency-sensitive workloads

Provisioned Concurrency

  • Pre-warms a specified number of execution environments
  • Eliminates cold starts for those environments
  • Charges $0.0000041667 per GB-second for each provisioned environment while it is configured, plus standard request charges and a reduced duration rate
  • Best for: latency-sensitive APIs, predictable traffic patterns, compliance with response time SLAs

Break-Even Analysis

Provisioned Concurrency makes financial sense when:

  1. You need consistently low latency — Sub-100ms p99 response times that cold starts would violate
  2. You have predictable, steady traffic — The provisioned environments are utilized consistently
  3. Cold start cost exceeds provisioning cost — If cold starts cause retries, timeouts, or user drop-off, the indirect cost justifies provisioning

Example calculation: 10 Provisioned Concurrency units at 512 MB, running 24/7:

  • Hourly cost: 10 × 0.5 GB × 3,600 seconds × $0.0000041667 = $0.075/hour
  • Monthly cost: $0.075 × 720 hours = $54/month

If those 10 units handle 5 million invocations per month (roughly 2 per second on average), the provisioning charge works out to about $0.0000108 per invocation. Provisioned invocations are also billed for duration, at a reduced rate ($0.0000097222 per GB-second on x86), so on pure dollars the provisioned fleet breaks even once its environments are busy roughly 60% of the time.

The rule of thumb: If a Provisioned Concurrency unit would handle at least 5 invocations per minute on average, provisioning is usually cheaper than the equivalent on-demand invocations plus the cold start overhead.
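The trade-off above can be sketched numerically. This simplified model uses the published x86 rates, including the reduced duration rate that provisioned invocations are billed at, and ignores indirect cold-start costs:

```python
PC_RATE = 0.0000041667           # $ per provisioned GB-second (always-on charge)
PC_DURATION_RATE = 0.0000097222  # $ per GB-second executed under Provisioned Concurrency
OD_DURATION_RATE = 0.0000166667  # $ per GB-second executed on-demand

def provisioned_monthly(units: int, memory_gb: float, hours: int = 720) -> float:
    """Always-on charge for keeping `units` environments warm for a month."""
    return units * memory_gb * hours * 3600 * PC_RATE

def breakeven_utilization() -> float:
    """Busy fraction above which provisioning is cheaper on pure dollars."""
    return PC_RATE / (OD_DURATION_RATE - PC_DURATION_RATE)

print(round(provisioned_monthly(10, 0.5), 2))   # the $54/month example above
print(round(breakeven_utilization(), 2))        # roughly 0.6
```

Below that utilization, the indirect value of eliminated cold starts has to justify the spend.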

Architecture-Level Cost Optimization

Use Direct Service Integrations

API Gateway can integrate directly with DynamoDB, SQS, Step Functions, and other services without a Lambda function in between. This eliminates Lambda invocation costs for simple operations.

Before (Lambda proxy):

API Gateway → Lambda (parse request, call DynamoDB, format response) → DynamoDB

After (direct integration):

API Gateway → DynamoDB (VTL mapping template)

Savings: 100% of Lambda cost for that route.

Batch Processing with SQS

When processing messages from SQS, Lambda can receive up to 10 messages per invocation (or up to 10,000 with batching windows). Processing 10 messages in one invocation costs the same as processing 1.

Before: 1 million messages = 1 million invocations
After (batch size 10): 1 million messages = 100,000 invocations

Savings: 90% reduction in invocation costs plus proportional duration savings from amortized initialization.
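A batched handler processes every record it receives in a single invocation. A minimal Python sketch of the handler shape, following the standard SQS-to-Lambda event format (the per-message work is a placeholder):

```python
import json

def handler(event, context):
    """Process a batch of SQS messages in one invocation."""
    processed = 0
    for record in event["Records"]:       # one entry per SQS message
        body = json.loads(record["body"])
        # ... do the actual per-message work with `body` here ...
        processed += 1
    return {"processed": processed}

# Local smoke test with a hypothetical 3-message batch
fake_event = {"Records": [{"body": json.dumps({"id": i})} for i in range(3)]}
print(handler(fake_event, None))
```

Batch size and batching window are configured on the event source mapping, not in the handler code.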

Avoid Synchronous Chains

Synchronous function-to-function calls (Lambda invoking Lambda) double your costs and create cascading cold start risks. Use asynchronous patterns instead:

Avoid: API Gateway → Lambda A → Lambda B → Lambda C (serial, synchronous)
Prefer: API Gateway → Lambda A → SQS/EventBridge → Lambda B (async, decoupled)

Right-Size Connection Handling

Lambda functions that connect to RDS databases create connection overhead on every cold start. Use RDS Proxy to pool connections, reducing both database load and Lambda execution time.

Without RDS Proxy: ~200 ms per invocation spent establishing a connection
With RDS Proxy: ~5 ms per invocation to borrow a connection from the pool

At scale, this connection overhead difference reduces both latency and cost significantly.
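With the figures above (roughly 195 ms of connection time saved per invocation), the duration savings can be estimated directly; the memory size and traffic volume below are hypothetical:

```python
GB_SECOND_PRICE = 0.0000166667  # $ per GB-second, x86 on-demand

def connection_overhead_cost(memory_gb: float, overhead_ms: float, invocations: int) -> float:
    """Monthly duration cost attributable to connection setup alone."""
    return memory_gb * (overhead_ms / 1000) * invocations * GB_SECOND_PRICE

# Hypothetical: 1 GB function, 10M invocations/month
without_proxy = connection_overhead_cost(1.0, 200, 10_000_000)
with_proxy = connection_overhead_cost(1.0, 5, 10_000_000)
print(round(without_proxy - with_proxy, 2))  # monthly duration savings in dollars
```

RDS Proxy has its own per-vCPU-hour charge on the target database, so compare both sides at your actual scale.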

Monitoring Lambda Costs

CloudWatch Metrics to Track

  • Invocations — Total function calls per period
  • Duration — Average, p50, p95, p99 execution times
  • ConcurrentExecutions — Peak concurrent executions (indicates scaling behavior)
  • Throttles — Invocations rejected due to concurrency limits
  • Errors — Failed invocations (retried invocations increase cost)

Cost Explorer Tags

Tag Lambda functions with:

  • Project — Which product or feature the function supports
  • Environment — Production, staging, development
  • Team — Which team owns the function

This enables per-project and per-team cost attribution in Cost Explorer.

Cost Anomaly Detection

Enable AWS Cost Anomaly Detection for Lambda to get alerts when spending deviates from historical patterns — catching runaway functions, infinite loops, or unexpected traffic spikes before they generate large bills.

Common Lambda Cost Mistakes

Mistake 1: Default Memory Settings

Lambda defaults to 128 MB, which is almost never optimal. Functions at 128 MB have minimal CPU and execute slowly, often costing more than the same function at 256 MB or 512 MB.

Mistake 2: Over-Provisioned Concurrency

Provisioning 100 concurrent environments “just in case” when your peak traffic only uses 20 wastes 80% of your provisioning spend. Use Application Auto Scaling to adjust Provisioned Concurrency based on actual demand.

Mistake 3: Logging Everything

console.log in every function with detailed request/response payloads generates massive CloudWatch Logs volumes. At $0.50 per GB ingested, verbose logging can cost more than the Lambda invocations themselves. Log strategically — errors always, debug only when needed.
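A quick sanity check of logging cost against invocation cost; the per-call payload size and traffic volume below are hypothetical:

```python
LOG_INGEST_PRICE = 0.50           # $ per GB ingested into CloudWatch Logs
REQUEST_PRICE = 0.20 / 1_000_000  # $ per Lambda invocation

def monthly_log_cost(bytes_per_invocation: int, invocations: int) -> float:
    """Monthly CloudWatch Logs ingestion cost in dollars."""
    return bytes_per_invocation * invocations / 1e9 * LOG_INGEST_PRICE

# Hypothetical: 4 KB of request/response logging per call, 10M calls/month
log_cost = monthly_log_cost(4_000, 10_000_000)
request_cost = REQUEST_PRICE * 10_000_000
print(round(log_cost, 2), round(request_cost, 2))  # logging dwarfs the request charge
```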

Mistake 4: Not Using the Free Tier

The Lambda free tier (1M requests + 400,000 GB-seconds/month) applies every month, forever. For low-traffic functions, this means Lambda is genuinely free. Ensure your cost analysis accounts for the free tier.
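Whether a workload fits the free tier is a two-line check against the allowances quoted above:

```python
FREE_REQUESTS = 1_000_000   # free invocations per month
FREE_GB_SECONDS = 400_000   # free GB-seconds per month

def fits_free_tier(invocations: int, memory_gb: float, avg_duration_s: float) -> bool:
    """True if both monthly allowances cover the workload."""
    gb_seconds = invocations * memory_gb * avg_duration_s
    return invocations <= FREE_REQUESTS and gb_seconds <= FREE_GB_SECONDS

print(fits_free_tier(800_000, 0.5, 0.2))  # 80,000 GB-seconds: within the allowance
print(fits_free_tier(800_000, 1.0, 0.6))  # 480,000 GB-seconds: exceeds the allowance
```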

Getting Started

Lambda cost optimization is not a one-time exercise. Workloads change, traffic patterns evolve, and AWS introduces new features and pricing options. We help organizations implement ongoing cost governance for serverless workloads as part of our broader AWS cost optimization services.

For end-to-end serverless architecture design and implementation, see our AWS Serverless Architecture Services.
