---
title: Ingress, Load Balancing, and Elastic Scale on AWS: L4 vs L7, Horizontal vs Vertical, and the Cold-Start Bill
description: As of May 8, 2026, Lambda bills INIT time on cold paths (pricing change live since Aug 1, 2025), API Gateway REST integrations time out at 29 seconds, and picking ALB vs NLB still determines whether TLS termination and routing live on the edge.
url: https://www.factualminds.com/blog/aws-ingress-scale-and-cold-start/
datePublished: 2026-05-08T00:00:00.000Z
dateModified: 2026-06-14T00:00:00.000Z
author: palaniappan-p
category: Cloud Architecture
tags: aws-application-load-balancer, aws-network-load-balancer, aws-lambda, aws-ecs, aws-eks, serverless, scaling, engineering-guide
---

# Ingress, Load Balancing, and Elastic Scale on AWS: L4 vs L7, Horizontal vs Vertical, and the Cold-Start Bill

> As of May 8, 2026, Lambda bills INIT time on cold paths (pricing change live since Aug 1, 2025), API Gateway REST integrations time out at 29 seconds, and picking ALB vs NLB still determines whether TLS termination and routing live on the edge.

On **May 8, 2026**, the operational default for public HTTP APIs on AWS is still **Application Load Balancer (ALB)** terminating TLS and routing by host and path—while high-volume TCP workloads stay on **Network Load Balancer (NLB)**. That split is not “old news”: it decides where HTTP/2 features, sticky sessions, and **AWS WAF** attach. For containers on EKS, our [n8n-on-EKS production guide](/blog/how-to-host-n8n-on-aws-eks-production-guide/) is a concrete L7 ingress story you can contrast with raw NLB patterns.

## Symptom → mechanism → AWS control

| Production symptom          | Mechanism                        | AWS control                                        |
| --------------------------- | -------------------------------- | -------------------------------------------------- |
| p99 latency spikes on scale | Lambda cold start                | Provisioned concurrency, SnapStart (Java)          |
| ALB becomes bottleneck      | L7 parsing at extreme RPS        | NLB for TCP, CloudFront for static/cacheable       |
| Target registration delay   | New pods not in ALB target group | Readiness probe gates endpoint, ALB target-type IP |

**Opinionated take:** Match ingress to protocol: ALB for HTTP APIs (including gRPC when you want L7 routing), NLB for raw TCP/UDP throughput, and budget provisioned concurrency for any Lambda on a latency SLO.

> **Benchmark pattern (hypothetical workload)** — ALB L7 ingress on EKS, 8K RPS, p99 14ms; NLB L4 TCP passthrough 45K RPS, p99 2ms; Lambda cold start 1.2s on 512MB adds 800ms to p99 until provisioned concurrency (50 units, $180/month) flattens tail.

INIT-phase billing for AWS Lambda has been **live since August 1, 2025**: cold paths charge initialization time the same as handler time. Combined with the **29-second default integration timeout** on Amazon API Gateway REST APIs (raisable only for Regional and private REST APIs through a quota request, and a ceiling everyone hits eventually on slow downstreams), scale conversations are now as much about **billing physics** as about EC2 instance families.
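
To make the billing point concrete, here is a minimal arithmetic sketch. The per-GB-second price is an assumption (a commonly published x86 on-demand rate; check the pricing page for your Region and architecture), and treating the whole 1.2 s cold start from the hypothetical benchmark as billed INIT time is a simplification.

```python
# Assumed on-demand price per GB-second; verify for your Region/architecture.
PRICE_PER_GB_SECOND = 0.0000166667


def billed_cost(seconds: float, memory_mb: int,
                price: float = PRICE_PER_GB_SECOND) -> float:
    """Cost of `seconds` of billed time at the configured memory size."""
    return seconds * (memory_mb / 1024) * price


def monthly_init_cost(cold_starts: int, init_seconds: float,
                      memory_mb: int) -> float:
    """Monthly spend attributable to billed INIT time alone."""
    return cold_starts * billed_cost(init_seconds, memory_mb)


if __name__ == "__main__":
    # Hypothetical workload: 1.2 s INIT on 512 MB, 2M cold starts per month.
    per_start = billed_cost(1.2, 512)
    print(f"INIT cost per cold start: ${per_start:.8f}")
    print(f"INIT cost per month:      ${monthly_init_cost(2_000_000, 1.2, 512):.2f}")
```

The per-start number looks negligible; the monthly number is what changes the conversation on bursty functions that cold-start constantly.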

> **Reproduce this** — Download [`check-alb-nlb-attributes.sh`](https://www.factualminds.com/examples/architecture-blog-2026/ingress-and-scale/check-alb-nlb-attributes.sh) from [`examples/architecture-blog-2026/ingress-and-scale/`](https://www.factualminds.com/examples/architecture-blog-2026/ingress-and-scale/check-alb-nlb-attributes.sh). Run it with two load balancer ARNs to compare `idle_timeout`, connection logs, and cross-zone settings side by side—read-only AWS CLI calls, AWS CLI v2.25+.
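
If you would rather diff the results in code, the sketch below compares two attribute payloads. It is not the companion script; it only assumes the JSON shape returned by `aws elbv2 describe-load-balancer-attributes` (an `Attributes` list of `Key`/`Value` pairs), and the sample values are made up.

```python
from typing import Dict, List, Optional, Tuple


def to_map(response: dict) -> Dict[str, str]:
    """Flatten the Attributes list into a key -> value dict."""
    return {a["Key"]: a["Value"] for a in response.get("Attributes", [])}


def diff_attributes(
    left: dict, right: dict
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (key, left_value, right_value) for every attribute that differs.

    A value of None means the attribute is absent on that load balancer,
    which is expected when comparing an ALB against an NLB.
    """
    l_map, r_map = to_map(left), to_map(right)
    return [
        (key, l_map.get(key), r_map.get(key))
        for key in sorted(set(l_map) | set(r_map))
        if l_map.get(key) != r_map.get(key)
    ]


if __name__ == "__main__":
    # Made-up sample payloads for illustration.
    alb = {"Attributes": [
        {"Key": "idle_timeout.timeout_seconds", "Value": "60"},
        {"Key": "load_balancing.cross_zone.enabled", "Value": "true"},
    ]}
    nlb = {"Attributes": [
        {"Key": "load_balancing.cross_zone.enabled", "Value": "false"},
    ]}
    for key, a, b in diff_attributes(alb, nlb):
        print(f"{key}: {a} vs {b}")
```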

## L4 vs L7: what each load balancer optimizes

**NLB (Layer 4)** forwards TCP/UDP with minimal manipulation. Benefits: extreme performance, static IP/prefix options, long-lived connection friendliness. Costs: no HTTP-aware routing, no host-header rules, and no AWS WAF attachment, all of which ALB provides.

**ALB (Layer 7)** understands HTTP. Benefits: path routing, Lambda and IP targets, AWS WAF integration, gRPC on ALB where supported. Costs: slightly higher latency than NLB's raw TCP passthrough, and pricing driven by LCU consumption (new connections, active connections, processed bytes, rule evaluations).

> **Opinionated take** — For customer-facing REST/JSON behind a domain name, **default to ALB** unless you have a measured L4 reason. NLB-in-front-of-ALB stacks are useful when a firewall partner demands fixed IPs or you must terminate non-HTTP protocols; they are not a free “performance hack” for standard JSON APIs.

## Horizontal vs vertical scaling on AWS

**Vertical scaling** (bigger `m7g.4xlarge`, more EBS throughput) reduces coordination overhead when the workload is single-threaded or license-bound.

**Horizontal scaling** (more tasks, more Lambda concurrency, more EC2 ASG capacity) wins for request-parallel workloads—_if_ data and session affinity do not serialize you.
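
One way to put a number on "serialize you" is Amdahl's law, used here as an illustrative model rather than a measurement of any real workload. The sketch assumes you can estimate the fraction of each request that runs in parallel; the rest (a hot database row, a session lock) is treated as strictly serial.

```python
def speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: upper bound on speedup from `workers` parallel units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical service where 5% of the work serializes on shared state.
    for n in (4, 16, 64):
        print(f"{n:>3} workers -> at most {speedup(0.95, n):.1f}x")
```

Under this model, quadrupling from 16 to 64 workers buys well under a 2x gain when 5% of the work is serial, which is why the shared dependency deserves attention before the task count does.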

Failure mode: horizontal scale amplifies **noisy neighbors** on shared databases. Pair compute scale with read replicas, caches, or [DynamoDB partition discipline](/blog/dynamodb-single-table-design-patterns-for-saas/) before you congratulate the ASG graph.

Read our [Karpenter vs Cluster Autoscaler cost guide](/blog/karpenter-vs-cluster-autoscaler-eks-cost-optimization/) when horizontal scale meets Kubernetes—different bin-packing economics than raw ASG.

## Cold starts: Lambda, INIT, and provisioned capacity

Two separate problems get lumped as “cold start”:

1. **INIT** — import graph, SDK clients, dependency injection.
2. **Execution ramp** — the first requests after scale-to-zero, before JVM/CLR JIT warm-up takes effect.

After **Aug 1, 2025**, INIT is billed. That turns “lazy imports” from a latency optimization into a **cost-per-cold-start** optimization on bursty functions.

Mitigations worth sequencing (not all universal):

- **Smaller deployment packages** and lazy `require` / dynamic `import()` in Node.js 22 runtimes.
- **SnapStart** for supported Java runtimes where applicable.
- **Provisioned Concurrency** when latency SLOs fund the spare capacity (break-even math in our [Lambda cost optimization guide](/blog/aws-lambda-cost-optimization-pay-per-request-vs-provisioned/)).
- **RDS Proxy** when connection storms dominate INIT (see [RDS performance practices](/blog/aws-rds-database-performance-best-practices/) for database-side tuning context).
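
The lazy-loading idea in the first bullet carries over to other runtimes. Below is a minimal Python sketch: `sqlite3` stands in for a heavy SDK or database driver, and `INIT_COUNT`, `get_heavy_client`, and the `/health` path are hypothetical names used only for illustration.

```python
import functools
import importlib

# Demo-only counter so the effect of caching is observable.
INIT_COUNT = {"heavy_client": 0}


@functools.lru_cache(maxsize=None)
def get_heavy_client():
    """Build the expensive dependency on first use, then reuse it
    for every later invocation in the same execution environment."""
    INIT_COUNT["heavy_client"] += 1
    module = importlib.import_module("sqlite3")  # stand-in for a heavy import
    return module.connect(":memory:")


def handler(event, context):
    # Cheap paths return before the heavy dependency is ever touched.
    if event.get("path") == "/health":
        return {"statusCode": 200, "body": "ok"}
    conn = get_heavy_client()
    value = conn.execute("SELECT 1").fetchone()[0]
    return {"statusCode": 200, "body": str(value)}
```

Lazy loading does not remove the cost; it moves it out of INIT and onto the first invocation that needs the dependency. That is a win only when a meaningful share of cold environments never take the heavy path.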

> **What broke** — A retail-traffic shaped workload moved Provisioned Concurrency from 50 → 200 without increasing **RDS Proxy** max connections. INIT time improved, but connection acquisition spikes during marketing events exhausted the database `max_connections`—p95 improved for Lambda while error rate climbed on checkout. Fix: cap Lambda reserved concurrency + pool limits + queue absorption (SQS) instead of unconstrained parallel DB opens.
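
The arithmetic behind that failure is short enough to run before the change ships. This sketch assumes the worst case where every concurrent execution environment holds its own database connections with no multiplexing; the `max_connections` value and headroom percentage are hypothetical.

```python
def peak_db_connections(concurrency: int, conns_per_env: int = 1) -> int:
    """Worst-case connections if every environment opens its full pool."""
    return concurrency * conns_per_env


def safe_reserved_concurrency(max_connections: int, conns_per_env: int = 1,
                              headroom_pct: int = 20) -> int:
    """Largest concurrency that leaves `headroom_pct` of max_connections
    free for admin sessions, migrations, and other clients."""
    usable = max_connections * (100 - headroom_pct) // 100
    return usable // conns_per_env


if __name__ == "__main__":
    max_connections = 150  # hypothetical database limit
    for concurrency in (50, 200):
        peak = peak_db_connections(concurrency)
        verdict = "ok" if peak <= max_connections else "EXHAUSTED"
        print(f"concurrency={concurrency}: peak={peak} -> {verdict}")
    print("cap reserved concurrency at", safe_reserved_concurrency(max_connections))
```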

Peaky AI spend adds a related trap: see [autoscaling AI workloads and budget overruns](/blog/aws-autoscaling-ai-workloads-budget-overrun/) before autoscaling pipelines widen blast radius.

## Hybrid compute reminder

Not every service belongs on Lambda. When sustained vCPU is cheaper on Graviton EC2 or Fargate, [hybrid compute guidance](/blog/hybrid-compute-ec2-serverless-cost-efficiency/) keeps finance and engineering aligned.

## More in This Track

Part of the **Engineering Guides** library (June 2026).

- Previous: [Part 5](/blog/how-to-deploy-eks-karpenter-cost-optimized-autoscaling/)
- Next: [Part 7](/blog/multi-region-aws-without-doubling-costs/)
- Browse tracks: [Engineering Guides hub](/resources/engineering-guides/)

## What This Post Doesn’t Cover

- **Gateway Load Balancer (GWLB)** inspection topologies for centralized firewalls—different buyer question than app ingress.
- **CloudFront** as the true edge vs regional ALB—see CDN comparisons separately.
- Per-protocol **HTTP/3** nuances on ALB—verify current Region/feature availability in AWS docs before promising.

## If You Only Do One Thing

Instrument **ALB target health** and **Lambda concurrent executions** on the same dashboard as **database connection counts**. Scale events without those three curves invite theatrical postmortems.
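
A minimal sketch of that dashboard as a CloudWatch dashboard body follows. The metric names and namespaces are standard CloudWatch ones; the resource identifiers, widget layout, and choice of statistics are assumptions to adapt.

```python
import json


def dashboard_body(region: str, load_balancer: str, target_group: str,
                   function_name: str, db_instance: str) -> str:
    """Return a dashboard body with the three curves stacked vertically."""

    def widget(y: int, title: str, metric: list, stat: str) -> dict:
        return {
            "type": "metric", "x": 0, "y": y, "width": 24, "height": 6,
            "properties": {"title": title, "region": region, "stat": stat,
                           "period": 60, "metrics": [metric]},
        }

    widgets = [
        widget(0, "ALB healthy targets",
               ["AWS/ApplicationELB", "HealthyHostCount",
                "TargetGroup", target_group, "LoadBalancer", load_balancer],
               "Minimum"),
        widget(6, "Lambda concurrent executions",
               ["AWS/Lambda", "ConcurrentExecutions",
                "FunctionName", function_name],
               "Maximum"),
        widget(12, "Database connections",
               ["AWS/RDS", "DatabaseConnections",
                "DBInstanceIdentifier", db_instance],
               "Maximum"),
    ]
    return json.dumps({"widgets": widgets}, indent=2)


if __name__ == "__main__":
    # Placeholder identifiers; substitute your own resources.
    print(dashboard_body("us-east-1", "app/my-alb/0123456789abcdef",
                         "targetgroup/my-tg/0123456789abcdef",
                         "checkout-fn", "orders-db"))
```

The resulting JSON can be passed as the dashboard body to CloudWatch's `PutDashboard` operation.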

## What to Do This Week

1. Export ALB/NLB attributes for production ingress with the companion script; file tickets for any `idle_timeout` under your longest safe keep-alive path.
2. Confirm downstream worst-case latency fits inside the API Gateway (or ALB) timeout, with an explicit saga or async handoff for anything that could cross the **29s** default REST limit.
3. Re-run Lambda memory/power tuning after the INIT billing change—stale 2024 baselines mis-price cold paths.
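
For the timeout check in step 2, a rough budget calculation is often enough to decide whether a path needs an async handoff. The sketch assumes downstream calls run sequentially and that each retry can consume a full timeout; the step timeouts and safety margin are hypothetical.

```python
from typing import Sequence

API_GATEWAY_REST_TIMEOUT_S = 29.0  # default REST integration timeout


def worst_case_latency(step_timeouts_s: Sequence[float], retries: int = 0) -> float:
    """Sum of per-step timeouts, each multiplied by (1 + retries)."""
    return sum(t * (retries + 1) for t in step_timeouts_s)


def needs_async_handoff(step_timeouts_s: Sequence[float], retries: int = 0,
                        ceiling_s: float = API_GATEWAY_REST_TIMEOUT_S,
                        safety_margin_s: float = 2.0) -> bool:
    """True when the synchronous path can exceed the ceiling minus a margin."""
    return worst_case_latency(step_timeouts_s, retries) > ceiling_s - safety_margin_s


if __name__ == "__main__":
    steps = [5.0, 10.0, 8.0]  # hypothetical: auth, inventory, payment
    print("no retries:", needs_async_handoff(steps))
    print("one retry: ", needs_async_handoff(steps, retries=1))
```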

For correlated debugging once scale creates cross-service mysteries, continue with [debugging distributed AWS systems](/blog/debug-production-distributed-aws-systems/).

## FAQ

### When should we choose NLB over ALB?
Choose NLB when you need ultra-low-latency TCP/UDP forwarding, client IP preservation at Layer 4, TLS pass-through, or tolerance for extreme connection churn (millions of connections) without HTTP feature requirements. Choose ALB when you need HTTP host/path routing, weighted targets, AWS WAF integration, or a standard HTTP/HTTPS ingress. Mixing them (NLB in front of ALB) is an anti-pattern for most HTTP APIs; reserve that pattern for legacy IP pinning or partner allowlisting constraints.

### When is vertical scaling the wrong first move on AWS?
When the bottleneck is coordination (database hot keys, single-threaded runtimes) or licensing per core. Bigger instances hide the problem for one deploy window but do not fix partition skew. Horizontal scale with proper sharding or queue-based absorption usually wins—after you prove the workload is embarrassingly parallel.

### What goes wrong if we only right-size Lambda memory without touching concurrency?
You can shorten duration per invocation (memory also buys CPU) but still hit regional concurrency caps, subnet IP exhaustion on ENI-heavy functions, or downstream throttling (RDS `max_connections`, partner APIs). Right-sizing memory without planning concurrency and downstream capacity replaces one bottleneck with another.

### Does Provisioned Concurrency eliminate cold starts?
It eliminates init latency for pre-warmed execution environments on the provisioned slice, not for burst above that slice, runtimes without SnapStart benefits, or mis-sized VPC-attached functions. It is a bill line item—evaluate break-even with pay-per-request using load tests, not slides.
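
As a starting point for that break-even evaluation, the sketch below computes the utilization at which provisioned capacity costs the same as on-demand. All three per-GB-second prices are assumptions (commonly published x86 rates); confirm them for your Region before relying on the result. Request charges are ignored because they apply equally to both modes.

```python
# Assumed prices per GB-second; verify against current pricing for your Region.
ON_DEMAND_GB_S = 0.0000166667    # on-demand duration
PC_RESERVED_GB_S = 0.0000041667  # provisioned concurrency, charged while enabled
PC_DURATION_GB_S = 0.0000097222  # duration on provisioned environments


def break_even_utilization(on_demand: float = ON_DEMAND_GB_S,
                           pc_reserved: float = PC_RESERVED_GB_S,
                           pc_duration: float = PC_DURATION_GB_S) -> float:
    """Fraction of time provisioned environments must be busy for
    provisioned concurrency to cost the same as on-demand.

    Solves: pc_reserved + u * pc_duration == u * on_demand
    """
    if on_demand <= pc_duration:
        raise ValueError("provisioned duration rate must be below on-demand")
    return pc_reserved / (on_demand - pc_duration)


def cheaper_with_provisioned(utilization: float) -> bool:
    return utilization > break_even_utilization()


if __name__ == "__main__":
    print(f"break-even utilization: {break_even_utilization():.0%}")
```

Below that utilization, provisioned concurrency is a latency purchase, not a cost saving, and should be justified by the SLO.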

### What is a common ALB + ECS mistake during traffic spikes?
Overly aggressive health check thresholds combined with slow-starting targets. The ALB marks new tasks unhealthy before readiness completes, and the resulting flapping replaces healthy capacity while causing connection resets. Tune health checks to match real startup time and align target group deregistration delay with graceful connection drain.
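
A simplified timing model helps when tuning those values. It assumes a task fails every health check until startup completes and that the ECS health check grace period suppresses failures entirely; real behavior has more states, so treat the output as a first approximation. The settings in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    interval_s: int
    healthy_threshold: int
    unhealthy_threshold: int

    def seconds_to_healthy(self) -> int:
        """Consecutive passes needed before a target receives traffic."""
        return self.interval_s * self.healthy_threshold

    def seconds_to_unhealthy(self) -> int:
        """Consecutive failures tolerated before a target is marked unhealthy."""
        return self.interval_s * self.unhealthy_threshold


def at_risk_of_flapping(startup_s: int, check: HealthCheck,
                        grace_period_s: int = 0) -> bool:
    """True if a task can be marked unhealthy before it finishes starting."""
    return startup_s > grace_period_s + check.seconds_to_unhealthy()


if __name__ == "__main__":
    check = HealthCheck(interval_s=10, healthy_threshold=3, unhealthy_threshold=2)
    startup_s = 45  # hypothetical measured container startup time
    print("no grace period:", at_risk_of_flapping(startup_s, check))
    print("60 s grace:     ", at_risk_of_flapping(startup_s, check, grace_period_s=60))
    print("seconds until traffic after ready:", check.seconds_to_healthy())
```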

### Why does horizontal scale sometimes increase cost faster than vertical scale?
Because per-instance license fees, cross-AZ data transfer fan-out, and log volume all scale with node count. FinOps visibility matters: review autoscaling budget patterns before you celebrate linear CPU charts.

---

*Source: https://www.factualminds.com/blog/aws-ingress-scale-and-cold-start/*
