---
title: How to Build Hybrid Compute (EC2 + Serverless) for Cost Efficiency
description: A technical guide to hybrid compute architectures that combine EC2, Lambda, Fargate, and Step Functions — with worked cost calculations, SQS buffering patterns, and decision frameworks based on invocation pattern rather than unit cost.
url: https://www.factualminds.com/blog/hybrid-compute-ec2-serverless-cost-efficiency/
datePublished: 2026-03-29T00:00:00.000Z
dateModified: 2026-04-16T00:00:00.000Z
author: palaniappan-p
category: Cloud Architecture
tags: how-to-guide, aws, ec2, lambda, serverless, hybrid-compute, hybrid-cloud-integration, event-driven, cost-optimization, sqs, step-functions
---

# How to Build Hybrid Compute (EC2 + Serverless) for Cost Efficiency

> A technical guide to hybrid compute architectures that combine EC2, Lambda, Fargate, and Step Functions — with worked cost calculations, SQS buffering patterns, and decision frameworks based on invocation pattern rather than unit cost.

The default framing of "EC2 vs. serverless" treats compute as an either/or decision. Production systems rarely fit neatly into one model. A SaaS API with predictable daytime traffic and unpredictable batch processing overnight, or an application with a steady web tier and spiky webhook processing, benefits from combining multiple compute models within a single architecture.

**Hybrid compute** is the deliberate use of multiple compute primitives — EC2, Lambda, Fargate, Step Functions, Batch — with each handling the workload shape it is economically and technically suited for. The goal is not minimizing lines on the bill but minimizing total cost while meeting latency, throughput, and reliability requirements.

This guide provides the decision framework, architecture patterns, and worked cost calculations to make hybrid compute decisions grounded in actual numbers rather than intuition.

## The Hybrid Compute Thesis

### Invocation Pattern, Not Unit Cost

The most common mistake in compute selection is comparing unit costs without accounting for utilization patterns. Lambda is not cheap per unit — it is cheap when the alternative is paying for EC2 capacity that would sit idle between requests.

**The utilization math:**

An EC2 `t3.small` at $0.0208/hr runs whether it is handling 1,000 requests per second or zero. A Lambda function at 1GB memory costs $0.000001667 per 100ms invocation. At 1,000 invocations per hour (very low traffic), Lambda costs $0.0017/hr. The `t3.small` costs $0.0208/hr regardless. Lambda is 12x cheaper.

At 50,000 invocations per hour with 100ms average duration, Lambda costs $0.083/hr, about four times the hourly price of the `t3.small`. That load is 5,000 seconds of compute per hour, more than one vCPU running continuously and well beyond what a burstable `t3.small` can sustain, so a fair comparison requires a larger instance or multiple instances.
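The utilization math above can be sketched as a small break-even calculation. This is a minimal sketch that uses only the prices quoted in this section and ignores the per-request charge and the free tier; the function names are illustrative, not part of any AWS SDK.

```python
# Prices quoted in this section; treat them as assumptions and check current AWS pricing.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667  # USD, duration charge only
T3_SMALL_PRICE_PER_HOUR = 0.0208           # USD, On-Demand


def lambda_cost_per_hour(invocations_per_hour, duration_s=0.1, memory_gb=1.0):
    """Hourly Lambda duration cost for a given invocation rate."""
    return invocations_per_hour * duration_s * memory_gb * LAMBDA_PRICE_PER_GB_SECOND


def break_even_invocations_per_hour(instance_price_per_hour, duration_s=0.1, memory_gb=1.0):
    """Invocation rate at which Lambda costs the same as an always-on instance."""
    cost_per_invocation = duration_s * memory_gb * LAMBDA_PRICE_PER_GB_SECOND
    return instance_price_per_hour / cost_per_invocation


if __name__ == "__main__":
    for rate in (1_000, 50_000):
        print(f"{rate:>6} invocations/hr: ${lambda_cost_per_hour(rate):.4f}/hr on Lambda")
    rate = break_even_invocations_per_hour(T3_SMALL_PRICE_PER_HOUR)
    print(f"break-even vs t3.small: {rate:,.0f} invocations/hr")
```

Under these prices the break-even for a 1GB, 100ms function against a single `t3.small` falls between the two traffic levels discussed above, at roughly 12,500 invocations per hour. Whether one `t3.small` could actually serve that load is a separate capacity question.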

The framework:

| Invocation pattern                 | Recommended compute                        |
| ---------------------------------- | ------------------------------------------ |
| <10% utilization, irregular spikes | Lambda (on-demand)                         |
| 10–40% utilization, bursty         | EC2 T-series + Lambda burst overflow       |
| >40% utilization, predictable      | EC2 compute-optimized + Reserved Instances |
| Variable, long-running processing  | Fargate Spot                               |
| Batch processing, fault-tolerant   | AWS Batch on Spot EC2                      |
| Orchestrated multi-step workflows  | Step Functions                             |

### What Lambda Cannot Do

Understanding Lambda's constraints is as important as understanding its cost model:

**No guaranteed persistent connections:** Lambda execution environments are ephemeral. A database connection opened during initialization can be reused by later invocations in the same warm environment, but every concurrent environment opens its own connection and any environment can be recycled without notice. Use RDS Proxy (for PostgreSQL/MySQL) to maintain a connection pool outside Lambda. Without connection pooling, Lambda functions that query RDS directly will exhaust the database's `max_connections` under moderate concurrency.

**512MB local ephemeral storage** (expandable to 10GB `/tmp`, with cost): Not suitable for processing large files inline. Stream from S3 for large file processing.

**15-minute maximum execution time:** Any workload requiring more than 15 minutes of processing must be split into smaller units or moved to ECS/EC2.

**No warm state across cold starts:** In-memory caches, connection pools, and warm model weights are lost between cold starts. Design Lambda functions to rebuild state on cold start (with acceptable latency cost) or use provisioned concurrency for latency-sensitive paths.
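One common way to limit the cost of rebuilding state is to initialize expensive objects once per execution environment and reuse them while that environment stays warm. The sketch below illustrates the idea; `load_lookup_table` is a hypothetical stand-in for any slow initialization (a model, a cache, a client), and the counter exists only to make the reuse visible.

```python
import time

# Module-level state survives across invocations in the same warm execution
# environment, but is rebuilt from scratch on every cold start.
_lookup_table = None
_load_count = 0


def load_lookup_table():
    """Hypothetical slow initializer (stands in for a model, cache, or client)."""
    global _load_count
    _load_count += 1
    time.sleep(0.01)  # simulate expensive work
    return {"sku-1": 9.99, "sku-2": 24.50}


def get_lookup_table():
    """Load lazily on first use, then reuse for later invocations."""
    global _lookup_table
    if _lookup_table is None:
        _lookup_table = load_lookup_table()
    return _lookup_table


def handler(event, context):
    table = get_lookup_table()
    return {"price": table.get(event["sku"]), "loads": _load_count}
```

Two back-to-back invocations in the same environment load the table once. A cold start, or a second concurrent environment, pays the load again.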

## Offloading Traffic Spikes from EC2 to Lambda

### SQS-Buffered Lambda Fan-Out

The pattern: incoming requests are written to SQS from your EC2 API tier. Lambda consumes the queue and performs the actual processing. This decouples your API response time from processing latency and creates a durable buffer that absorbs traffic spikes.

```
Client → API (EC2/ECS) → SQS Queue → Lambda (fan-out) → Processing
                                    ↓
                               Dead Letter Queue
```

**When this pattern is correct:**

- Processing is asynchronous (user does not wait for completion)
- Processing failures should not affect the API response
- Peak processing rate is significantly higher than average (5x or more)
- Processing cost scales with the number of messages handled, not with hours of provisioned capacity (Lambda economics fit)

**Terraform for Lambda + SQS trigger + dead-letter queue:**

```hcl
resource "aws_sqs_queue" "processing_dlq" {
  name                      = "processing-dlq"
  message_retention_seconds = 1209600  # 14 days

  tags = {
    Environment = var.environment
    Purpose     = "dead-letter-queue"
  }
}

resource "aws_sqs_queue" "processing_queue" {
  name                       = "processing-queue"
  visibility_timeout_seconds = 1800  # AWS recommends >= 6x the Lambda timeout (300s)
  message_retention_seconds  = 86400
  receive_wait_time_seconds  = 20  # Long polling — reduces empty receives

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.processing_dlq.arn
    maxReceiveCount     = 3
  })

  tags = {
    Environment = var.environment
  }
}

resource "aws_lambda_function" "processor" {
  filename         = data.archive_file.processor.output_path
  function_name    = "event-processor"
  role             = aws_iam_role.lambda_processor.arn
  handler          = "index.handler"
  runtime          = "nodejs22.x"
  timeout          = 300
  memory_size      = 1024

  reserved_concurrent_executions = 100  # Protect downstream services

  environment {
    variables = {
      ENVIRONMENT    = var.environment
      DB_PROXY_HOST  = aws_db_proxy.main.endpoint
    }
  }

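  # Used only for asynchronous invocations; messages that fail via the SQS
  # trigger are moved to the DLQ by the queue's redrive_policy above.
  dead_letter_config {
    target_arn = aws_sqs_queue.processing_dlq.arn
  }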

  tracing_config {
    mode = "Active"  # X-Ray tracing
  }

  tags = {
    Environment = var.environment
  }
}

resource "aws_lambda_event_source_mapping" "sqs_trigger" {
  event_source_arn                   = aws_sqs_queue.processing_queue.arn
  function_name                      = aws_lambda_function.processor.arn
  batch_size                         = 10
  maximum_batching_window_in_seconds = 5  # Wait up to 5s to batch messages

  scaling_config {
    maximum_concurrency = 100  # Match reserved_concurrent_executions
  }

  function_response_types = ["ReportBatchItemFailures"]  # Partial batch failure handling
}

resource "aws_iam_role_policy_attachment" "lambda_sqs" {
  role       = aws_iam_role.lambda_processor.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaSQSQueueExecutionRole"
}
```

**`ReportBatchItemFailures`** is critical for production SQS triggers. Without it, if one message in a batch of 10 fails processing, Lambda returns the entire batch to the queue and all 10 messages are retried. With `ReportBatchItemFailures`, you return only the specific message IDs that failed, and the rest are deleted from the queue successfully.
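On the function side, partial batch handling means returning a `batchItemFailures` list that names the failed message IDs. The following is a minimal sketch in Python (the Terraform above deploys a Node.js handler, so treat this as an illustration of the response shape rather than the deployed code); `process_message` is a hypothetical placeholder for the real business logic.

```python
import json


def process_message(body):
    """Hypothetical business logic; raises on messages it cannot handle."""
    if "payload" not in body:
        raise ValueError("missing payload")


def handler(event, context):
    """SQS-triggered handler that reports only the failed message IDs."""
    failures = []
    for record in event["Records"]:
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Only the messages listed here are returned to the queue
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

An empty `batchItemFailures` list tells Lambda the whole batch succeeded. An unhandled exception still fails the entire batch, which is why each record is processed inside its own `try` block.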

### API Gateway Lambda for Burst Absorption

For synchronous APIs that must respond to users in real time, the pattern shifts: **API Gateway with Lambda** handles burst traffic while your EC2 tier handles baseline traffic. API Gateway can route requests to either EC2 (via ALB integration) or Lambda based on a routing rule.

The simpler approach: use **ALB weighted target groups** to distribute traffic between EC2 and Lambda. This requires identical request/response semantics between the two backends.

```
Client → ALB
          ├── Target Group A (EC2, weight: 100) — baseline traffic
          └── Target Group B (Lambda, weight: 0) — burst only
```

During normal operation, all traffic goes to EC2. When EC2 Auto Scaling is warming up new instances during a burst, temporarily increase Lambda's weight to absorb overflow.

This can be automated with a CloudWatch alarm on ALB `TargetResponseTime` or `RequestCount` that triggers a Lambda function to adjust the ALB target group weights via the AWS SDK.
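The weight adjustment itself could look like the sketch below. It assumes the listener's default action forwards to exactly two target groups and that the caller passes in a `boto3.client("elbv2")` instance; the function names and the 0 to 100 weight convention are illustrative choices, not an AWS requirement.

```python
def build_weighted_forward_action(ec2_target_group_arn, lambda_target_group_arn, lambda_weight):
    """Build a forward action that splits traffic between two target groups.

    Weights are relative; lambda_weight=0 sends everything to EC2.
    """
    if not 0 <= lambda_weight <= 100:
        raise ValueError("lambda_weight must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": ec2_target_group_arn, "Weight": 100 - lambda_weight},
                {"TargetGroupArn": lambda_target_group_arn, "Weight": lambda_weight},
            ]
        },
    }


def shift_burst_traffic(elbv2_client, listener_arn, ec2_tg_arn, lambda_tg_arn, lambda_weight):
    """Apply the weights to a listener's default action.

    elbv2_client is expected to be boto3.client("elbv2").
    """
    action = build_weighted_forward_action(ec2_tg_arn, lambda_tg_arn, lambda_weight)
    return elbv2_client.modify_listener(ListenerArn=listener_arn, DefaultActions=[action])
```

An alarm-triggered function would call `shift_burst_traffic` with a non-zero weight when the alarm fires and with `0` when it clears. Keeping the action builder separate from the API call makes the weight logic easy to unit test.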

## Event-Driven Processing: When Lambda Wins vs When EC2 Wins

### Lambda Excels

**S3 event processing:** An object is uploaded to S3 and triggers a Lambda function to process it (resize image, parse CSV, index content, generate thumbnail). The invocation pattern is event-driven with natural fan-out. Processing one file triggers one Lambda invocation. 10,000 files trigger 10,000 invocations, which Lambda runs in parallel up to the account's concurrency limit (1,000 per Region by default). EC2 would require a queue, polling loop, and Auto Scaling policy to achieve equivalent throughput.

**DynamoDB Streams:** Lambda processes DynamoDB change events for real-time propagation (replicating changes to Elasticsearch, sending webhooks, updating derived data). The stream delivers changes in order per partition key. Lambda's Event Source Mapping handles checkpointing automatically.

**SNS fan-out:** An SNS topic fans messages to multiple Lambda functions simultaneously. Each function handles a different processing concern (send email, update CRM, log to analytics). Lambda's isolation means one handler's failure does not affect others.

**Scheduled short tasks:** EventBridge cron rules trigger Lambda functions for periodic tasks (generate reports, send daily digest emails, purge expired records). Lambda is cheaper than an EC2 instance running a cron scheduler, particularly for tasks that run for seconds rather than minutes.

### EC2 Wins

**Stateful processing:** A job that processes 50,000 records by loading a 2GB lookup table into memory and making 50,000 lookups against it. Lambda's 15-minute timeout and 10GB memory limit technically support this, but reloading the 2GB lookup table on every cold start, and separately in every concurrent execution environment, makes EC2 or ECS far more efficient.

**Long-running streaming:** Processing a continuous data stream from Kinesis where records arrive at 50,000/second, state must be maintained across records (windowed aggregations, deduplication), and processing must continue without interruption. Kinesis Data Streams with a long-running ECS consumer processes this efficiently. Lambda functions processing Kinesis shards work for simpler use cases but introduce complexity around parallelism and ordering.

**High-concurrency database workloads:** 500 concurrent Lambda invocations, each opening its own PostgreSQL connection, can exhaust a standard RDS instance's connection limit. With RDS Proxy this is manageable, but the connection overhead and Proxy latency affect p99 latency. An EC2 application server with a PgBouncer connection pool handles 500 concurrent connections efficiently against a 100-connection RDS instance.

## Batch Jobs vs Real-Time Workers

### AWS Batch on Spot EC2

**AWS Batch** manages job queues, compute environments, and job scheduling on EC2 (including Spot). For batch workloads that are fault-tolerant, run for minutes to hours, and require significant compute:

```
Batch Job Queue → Compute Environment (Spot EC2, c7g family)
     ↓
Job Definition: Docker container + resource requirements + retry policy
```

Batch handles:

- Spot interruption recovery (requeues jobs on spot reclamation)
- Automatic scaling of EC2 instances to queue depth
- Job dependency graphs (run job B after job A completes)
- Resource packing (fit as many jobs as possible on each instance)

Batch is the correct choice over Lambda for:

- Jobs running longer than 15 minutes
- Jobs requiring more than 10GB memory or 6 vCPUs
- Jobs with GPU requirements
- Workloads where Spot savings (60–80% vs On-Demand) are significant

### Lambda for Real-Time Triggers

Lambda handles the trigger layer efficiently. An S3 upload triggers Lambda to submit a Batch job rather than processing inline. Lambda handles the <100ms response path; Batch handles the multi-minute processing path. This separation keeps your API responsive while processing happens asynchronously.

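```python
# Lambda function: triggered by S3, submits one Batch job per uploaded object
import json
import re
from urllib.parse import unquote_plus

import boto3

batch = boto3.client('batch')

def handler(event, context):
    job_ids = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record['s3']['object']['key'])

        # Batch job names allow only letters, numbers, hyphens, and underscores (max 128 chars)
        job_name = re.sub(r'[^A-Za-z0-9_-]', '-', f"process-{key}")[:128]

        response = batch.submit_job(
            jobName=job_name,
            jobQueue='processing-queue',
            # A name without a revision resolves to the latest ACTIVE revision
            jobDefinition='data-processor',
            containerOverrides={
                'environment': [
                    {'name': 'S3_BUCKET', 'value': bucket},
                    {'name': 'S3_KEY', 'value': key},
                ]
            },
            retryStrategy={
                'attempts': 2,
                'evaluateOnExit': [
                    # Retry when the Spot host is reclaimed; '*' is only valid at the end of a pattern
                    {'onStatusReason': 'Host EC2*', 'action': 'RETRY'},
                    # Any other failure exits without a retry
                    {'onReason': '*', 'action': 'EXIT'},
                ]
            }
        )
        job_ids.append(response['jobId'])

    return {
        'statusCode': 200,
        'body': json.dumps({'jobIds': job_ids})
    }
```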

## Cold Start Mitigation

### Lambda SnapStart (Java, Python, .NET)

**SnapStart** initializes the Lambda execution environment when a function version is published, takes a snapshot of the memory and disk state after initialization, and restores from that snapshot whenever a new execution environment is needed. For a Java Spring Boot Lambda function that takes 8 seconds to initialize, SnapStart reduces effective cold start to under 1 second.

```yaml
# CloudFormation / SAM template — enable SnapStart for Java
Resources:
  ProcessorFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.Handler::handleRequest
      Runtime: java21
      SnapStart:
        ApplyOn: PublishedVersions
      AutoPublishAlias: live
      # SnapStart requires a published version alias
```

SnapStart limitations:

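- Only applies to published versions (and aliases that point to them), not $LATEST
- Uniqueness and staleness: whatever initialization produced is captured in the snapshot, so credentials, connection handles, random seeds, and timestamps must be refreshed after snapshot restore
- Use runtime hooks to refresh these values after restore: CRaC `beforeCheckpoint`/`afterRestore` in Java, `snapshot_restore_py` in Python, `Amazon.Lambda.Core.SnapshotRestore` in .NET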

### Provisioned Concurrency Economics

**Provisioned concurrency** keeps Lambda execution environments warm, eliminating cold starts entirely. It is charged at $0.000004646 per GB-second regardless of invocations.

A 1GB function with 10 provisioned concurrency units costs:

```
10 units × 1GB × 3600 seconds × $0.000004646 = $0.167/hour = $120/month
```

Compare: an EC2 `t3.small` (which could replace those 10 Lambda units for low-RPS APIs) costs $0.0208/hr = $15/month.

**When provisioned concurrency is justified:**

- p99 cold start latency is unacceptable (payment flows, authentication endpoints)
- Traffic pattern is predictable — you can use Application Auto Scaling to scale provisioned concurrency down during known off-peak hours
- The alternative (EC2) has higher operational overhead that justifies the cost premium

**Provisioned concurrency with scheduled scaling (cost-optimized):**

```hcl
resource "aws_appautoscaling_target" "lambda_pc" {
  service_namespace  = "lambda"
  resource_id        = "function:${aws_lambda_function.api.function_name}:${aws_lambda_alias.api.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  min_capacity       = 0
  max_capacity       = 20
}

resource "aws_appautoscaling_scheduled_action" "scale_up" {
  name               = "scale-up-business-hours"
  service_namespace  = aws_appautoscaling_target.lambda_pc.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_pc.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_pc.scalable_dimension
  schedule           = "cron(0 7 ? * MON-FRI *)"  # 7 AM UTC weekdays (day-of-month must be ? when day-of-week is set)

  scalable_target_action {
    min_capacity = 10
    max_capacity = 20
  }
}

resource "aws_appautoscaling_scheduled_action" "scale_down" {
  name               = "scale-down-off-peak"
  service_namespace  = aws_appautoscaling_target.lambda_pc.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_pc.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_pc.scalable_dimension
  schedule           = "cron(0 20 ? * MON-FRI *)"  # 8 PM UTC weekdays (day-of-month must be ? when day-of-week is set)

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}
```

This reduces provisioned concurrency costs by about 61% compared to 24/7 provisioned concurrency: the schedule keeps it on for 13 hours on each of 5 weekdays (65 of 168 hours per week) and off overnight and on weekends.
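The effect of a schedule on the provisioned concurrency bill can be estimated from the share of the week it is switched on. A minimal sketch, assuming the per-GB-second rate quoted in this section and a 720-hour month; the function names are illustrative.

```python
PC_PRICE_PER_GB_SECOND = 0.000004646  # USD, rate quoted in this section (assumption)


def provisioned_cost(units, memory_gb, hours):
    """Cost of keeping `units` environments provisioned for `hours` hours."""
    return units * memory_gb * hours * 3600 * PC_PRICE_PER_GB_SECOND


def weekly_on_fraction(hours_per_day, days_per_week):
    """Share of a 168-hour week during which provisioned concurrency is on."""
    return (hours_per_day * days_per_week) / 168


if __name__ == "__main__":
    always_on = provisioned_cost(units=10, memory_gb=1, hours=720)
    on_fraction = weekly_on_fraction(hours_per_day=13, days_per_week=5)
    print(f"24/7: ${always_on:.0f}/month")
    print(f"scheduled: ${always_on * on_fraction:.0f}/month ({1 - on_fraction:.0%} saved)")
```

Changing `hours_per_day` or `days_per_week` shows how sensitive the saving is to the length of the business-hours window.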

### Container Image Reuse

Lambda functions deployed as container images (up to 10GB) reuse execution environments across invocations when possible. The image is not pulled from ECR on every invocation: Lambda caches image layers at the service level, so the image is loaded only when a new execution environment starts, and warm invocations skip that step entirely.

For Python or Node.js Lambda functions that load large ML models or data files at initialization, storing those files in the container image (rather than downloading from S3 on cold start) can reduce cold start time by 3–8 seconds.

## Cost Calculation: Hybrid vs Pure EC2 vs Pure Serverless

### The Scenario

A SaaS API with the following traffic profile:

- Weekday peak: 1,000 requests/second (8 hours/day)
- Weekday off-peak: 100 requests/second (6 hours/day)
- Overnight: 20 requests/second (10 hours/day)
- Weekend: 200 requests/second average (24 hours/day)
- Average request processing: 50ms at 1 vCPU equivalent
- Occasional burst: 3,000 requests/second for 15-minute periods, 3x per week
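The monthly request volume implied by a profile like this can be derived mechanically. The sketch below is one way to do it; it assumes a 4-week month and, as a labeled assumption, treats the overnight window as weekday-only because the weekend figure already covers all 24 hours of Saturday and Sunday. Burst events are left out.

```python
SECONDS_PER_HOUR = 3600

# (label, requests per second, hours per day, days per week)
# Assumption: "overnight" applies to weekdays only, since the weekend row
# already covers all 24 hours of Saturday and Sunday.
TRAFFIC_PROFILE = [
    ("weekday peak", 1000, 8, 5),
    ("weekday off-peak", 100, 6, 5),
    ("weekday overnight", 20, 10, 5),
    ("weekend", 200, 24, 2),
]


def monthly_requests(profile, weeks_per_month=4):
    """Return per-segment and total request counts for a month."""
    per_segment = {
        label: rps * hours * SECONDS_PER_HOUR * days * weeks_per_month
        for label, rps, hours, days in profile
    }
    return per_segment, sum(per_segment.values())


if __name__ == "__main__":
    segments, total = monthly_requests(TRAFFIC_PROFILE)
    for label, count in segments.items():
        print(f"{label:>18}: {count / 1e6:7.1f}M")
    print(f"{'total':>18}: {total / 1e6:7.1f}M")
```

Keeping the profile as data makes it easy to rerun the numbers when a traffic assumption changes.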

### Architecture Options

**Option A: Pure EC2 (sized for peak)**

Size for 1,000 req/sec sustained + 3,000 req/sec burst headroom:

```
4x c7g.xlarge (4 vCPU, 8GB) = 16 vCPU total
Target: 750 req/sec/instance at 75% CPU utilization
(assumes ~4ms of CPU per request; the rest of the 50ms is I/O wait)
```

Cost:

- 4 × `c7g.xlarge` × $0.1448/hr × 720 hours = $417/month
- ELB (ALB): $0.0225/hr + $0.008 per LCU-hour = ~$25/month
- **Total: ~$442/month**

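Burst handling: four instances at 750 req/sec each cover the 3,000 req/sec burst at the 75% CPU target, so bursts are absorbed without waiting for Auto Scaling. The price is capacity that sits mostly idle outside peak hours. Traffic above 3,000 req/sec would see elevated latency for the 3–5 minutes it takes new instances to become healthy.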

**Option B: Pure Lambda (1GB, 50ms average)**

```
Invocations:
  Weekday peak: 1,000/s × 8hr × 5 days × 4 weeks = 576M invocations
  Weekday off-peak: 100/s × 6hr × 5 days × 4 weeks = 43.2M invocations
  Overnight (weekdays): 20/s × 10hr × 5 days × 4 weeks = 14.4M invocations
  Weekend: 200/s × 24hr × 8 days = 138.2M invocations
  Total: ~772M invocations/month
```

Lambda cost:

- Free tier: first 1M invocations free
- Invocations: 771M × $0.0000002 = $154.2
- Duration: 771M × 0.05s × 1GB × $0.0000166667/GB-sec = $643
- API Gateway HTTP API: 772M × $0.000001 = $772
- Provisioned concurrency (for peak, 10 units, 8hr weekdays = 160 hours): 10 × 1GB × 160hr × 3600s × $0.000004646 = ~$27/month
- **Total: ~$1,596/month**

Pure Lambda is significantly more expensive for this sustained-traffic workload, and the API Gateway request charge alone exceeds the cost of the EC2 fleet in Option A.

**Option C: Hybrid — EC2 baseline + Lambda burst**

```
Baseline: 2x c7g.xlarge On-Demand for weekday peak coverage
Burst: Lambda handles overflow during 3x weekly burst events
Off-peak: Scale in to 1x c7g.xlarge overnight via Auto Scaling
```

Cost:

- 2× `c7g.xlarge` On-Demand: $0.2896/hr × 720hr = $208.5/month
- Lambda burst: 3 bursts/week × 4 weeks × 900s × 2,000 excess req/sec = 21.6M invocations at 50ms and 1GB = ~$22 (requests $4.32 + duration $18.00)
- ELB: $25/month
- Auto Scaling scale-in savings: ~$30/month (subtracted)
- **Total: ~$226/month**

The hybrid approach achieves a 49% cost reduction vs pure EC2 (burst handled without over-provisioning) and an 86% cost reduction vs pure Lambda.

### Cost Comparison Table

| Compute model                  | Monthly cost | Burst handling            | Cold start risk   | Operational overhead    |
| ------------------------------ | ------------ | ------------------------- | ----------------- | ----------------------- |
| EC2 sized for peak             | $442         | No degradation            | None              | AMI updates, patching   |
| Pure Lambda (1GB)              | $1,596       | Automatic                 | Yes (150–500ms)   | Minimal                 |
| Hybrid (EC2 + Lambda burst)    | $226         | Automatic via Lambda      | Lambda burst only | AMI updates + Lambda    |
| Hybrid + Spot workers (batch)  | $160         | Spot interruption managed | None for web tier | Higher complexity       |
| EC2 + Reserved Instances (1yr) | $290         | Auto Scaling (3-5min lag) | None              | AMI updates, commitment |

## Edge Cases

### Unpredictable Invocation Spikes

SQS buffering protects downstream EC2/Lambda from sudden spikes, but the queue can grow unboundedly if the spike exceeds processing capacity for an extended period. Set a **maximum message retention** on your SQS queue appropriate to your SLA:

- For real-time event processing: 5 minutes retention maximum (stale events are worthless)
- For email/notification dispatch: 24 hours (user can wait, but not forever)
- For financial transactions: 14 days (maximum SQS retention) with monitoring

Set **CloudWatch alarms on `ApproximateAgeOfOldestMessage`** — this metric tells you how far behind your processing has fallen. If the oldest message is 10 minutes old and your SLA is 30-second processing, you need to scale processing capacity.
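A queue-lag alarm of this kind could be defined as follows. This is a sketch under stated assumptions: the helper only builds the keyword arguments for CloudWatch's `put_metric_alarm` call, and the alarm name, the three one-minute evaluation periods, and the SNS topic ARN are illustrative choices rather than recommendations from the source.

```python
def oldest_message_alarm(queue_name, max_age_seconds, sns_topic_arn):
    """Build put_metric_alarm arguments for a queue-lag alarm.

    Pass the result to boto3.client("cloudwatch").put_metric_alarm(**kwargs).
    """
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,                # seconds per datapoint
        "EvaluationPeriods": 3,      # alarm after 3 consecutive breaches
        "Threshold": max_age_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
        "TreatMissingData": "notBreaching",  # an empty queue is not an incident
    }
```

For a 30-second processing SLA, `oldest_message_alarm("processing-queue", 30, topic_arn)` would alert once the oldest message has been waiting longer than the SLA for three minutes in a row.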

### Lambda Timeout Cascades

A Lambda function timing out does not fail gracefully if it holds a resource lock (database row lock, distributed mutex) at timeout. The lock is abandoned but may not be released cleanly, depending on the resource type.

Design patterns to prevent timeout cascades:

- Keep the Lambda timeout well below the SQS `VisibilityTimeout` (AWS recommends a visibility timeout of at least six times the function timeout) so that Lambda finishes or fails before SQS makes the message visible to other consumers
- Use database transactions with explicit timeouts shorter than Lambda timeout (`SET statement_timeout = '25s'` for PostgreSQL)
- Implement idempotency — if a Lambda function is retried after a timeout, reprocessing the same message should produce the same result
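The idempotency point can be illustrated with a small sketch. It assumes each message carries a stable ID and uses an in-memory dictionary as a stand-in for a durable store; a production version would need an atomic conditional write so that two concurrent consumers cannot both run the work. All names here are hypothetical.

```python
class InMemoryIdempotencyStore:
    """Stand-in for a durable store (for example, a table with conditional writes)."""

    def __init__(self):
        self._results = {}

    def get(self, key):
        return self._results.get(key)

    def put(self, key, result):
        self._results[key] = result


def process_once(store, message_id, work):
    """Skip `work` for message IDs already recorded and return the stored result."""
    cached = store.get(message_id)
    if cached is not None:
        return cached
    result = work()
    store.put(message_id, result)
    return result
```

When a timed-out message is redelivered, the second call to `process_once` returns the recorded result instead of repeating the side effect, provided the first attempt got as far as storing it.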

### State Management Across the Hybrid Boundary

When a request starts on EC2 and hands off to Lambda via SQS, any state in the EC2 process memory (session data, request context, correlation IDs) must be explicitly serialized into the SQS message. Lambda cannot access EC2 instance memory.

Use **X-Ray trace IDs** to correlate the EC2 request handling with Lambda processing:

```javascript
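// EC2: publish SQS message with trace context
// _X_AMZN_TRACE_ID is only set inside Lambda; on EC2, read the header the ALB adds to each request
const traceHeader = req.headers['x-amzn-trace-id']; // `req` is the incoming HTTP request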

await sqs
  .sendMessage({
    QueueUrl: process.env.QUEUE_URL,
    MessageBody: JSON.stringify({
      payload: requestPayload,
      traceId: traceHeader, // Propagate X-Ray trace
      correlationId: requestId, // Application-level correlation
      userId: user.id,
      tenantId: tenant.id,
    }),
  })
  .promise();
```

```python
# Lambda: restore correlation context from the SQS message
import json
import logging

def handler(event, context):
    for record in event['Records']:
        body = json.loads(record['body'])
        correlation_id = body['correlationId']

        # Set correlation ID in logger context
        logger = logging.getLogger()
        logger = logging.LoggerAdapter(logger, {
            'correlation_id': correlation_id,
            'tenant_id': body['tenantId'],
        })

        # process_message is the application-specific handler, defined elsewhere
        process_message(body, logger)
```

For Lambda cost optimization context, see our [Lambda Cost Optimization: Pay-per-Request vs Provisioned Concurrency](/blog/aws-lambda-cost-optimization-pay-per-request-vs-provisioned/) guide.

For broader AWS cost architecture, see the [AWS Cost Control Architecture and Optimization Playbook](/blog/aws-cost-control-architecture-optimization-playbook/).

For ECS Fargate as a middle ground between EC2 and Lambda, see our [ECS vs EKS Container Orchestration Decision Guide](/blog/aws-ecs-vs-eks-container-orchestration-decision-guide/).

## FAQ

### How do you decide what workloads belong on Lambda vs EC2?
The decision framework is based on invocation pattern, not just unit cost. Lambda is cost-optimal when your workload has irregular invocation patterns with significant idle time between requests: Lambda charges only for execution duration, whereas EC2 runs at full cost even when idle. The practical threshold: if your workload runs fewer than 5 million invocations per month and has at least 30–40% idle time, Lambda is almost always cheaper. If your workload runs continuously with predictable throughput, EC2 (especially with Reserved Instances) becomes cheaper above roughly 20–30% utilization. The other Lambda decision factor is statelessness: Lambda cannot guarantee that a connection survives between invocations, cannot use local disk for durable data, and loses in-memory state on every cold start. Workloads requiring database connection pools, warm caches, or long-lived in-process state belong on EC2 or Fargate.

### What is Lambda SnapStart and when does it eliminate cold start costs?
Lambda SnapStart is a feature for Java Lambda functions (and as of 2025, Python and .NET) that takes a snapshot of the initialized execution environment and restores from that snapshot instead of re-initializing. A Java Spring Boot Lambda function that takes 8–12 seconds to initialize reduces to under 1 second with SnapStart. SnapStart is most valuable when cold starts are frequent relative to total invocations: low-traffic APIs with consistent but infrequent requests, event-driven processors that scale from zero regularly, and any function where the initialization time is dominated by framework startup rather than I/O. SnapStart does not help functions with network-bound initialization (connecting to databases, loading secrets) because those operations must re-run after snapshot restore. SnapStart adds no charge for Java functions; for Python and .NET, AWS bills for caching the snapshot and for each restore.

### How does SQS buffering help manage EC2-to-Lambda traffic spikes?
SQS buffering decouples the inbound request rate from your processing capacity. Without buffering, a traffic spike that exceeds EC2 capacity either drops requests or causes autoscaling latency (3–5 minutes for new instances to become healthy). With SQS, incoming requests are written to the queue immediately (a `SendMessage` call typically completes in milliseconds) and processed at whatever rate your downstream compute can handle. Lambda reads from SQS using the Event Source Mapping feature and automatically scales to process the queue, up to the configured concurrency limit. For EC2-backed processing, you add a consumer process that polls SQS and scale the Auto Scaling group based on the SQS `ApproximateNumberOfMessagesVisible` metric. The buffer absorbs spikes that would have caused dropped requests and processes them as capacity allows, trading immediate processing for durable, at-least-once delivery.

### What is the break-even point between Lambda and EC2 pricing?
The break-even calculation depends on function memory size and execution duration, but as a useful rule of thumb: a Lambda function at 1GB memory executing for 100ms costs approximately $0.000001667 per invocation in duration charges. A t3.small EC2 instance ($0.0208/hr on-demand) therefore costs the same as roughly 12,500 such invocations per hour. If your workload generates fewer than about 12,500 invocations per hour with each taking 100ms, Lambda is cheaper than an always-on t3.small. At higher memory and duration (3GB, 200ms), each invocation costs about $0.00001 and the break-even drops to around 2,000 invocations per hour; below that, Lambda wins. For Reserved Instance pricing (1-year, no upfront), divide the break-even by roughly 1.4. The calculation ignores the $0.20 per million request charge, which lowers the break-even slightly, and Lambda provisioned concurrency charges for warm starts, which add $0.000004646 per GB-second of provisioned capacity regardless of invocations.

---

*Source: https://www.factualminds.com/blog/hybrid-compute-ec2-serverless-cost-efficiency/*