Ingress, Load Balancing, and Elastic Scale on AWS: L4 vs L7, Horizontal vs Vertical, and the Cold-Start Bill
Quick summary: As of May 8, 2026, Lambda bills INIT time on cold paths (pricing change live since Aug 1, 2025), API Gateway REST integrations time out at 29 seconds, and picking ALB vs NLB still determines whether TLS termination and routing live on the edge.
Key Takeaways
- The ALB vs NLB split is not "old news": it decides where HTTP/2 features, sticky sessions, and AWS WAF attach.
- Lambda INIT-phase billing has been live since August 1, 2025; cold paths charge initialization time the same as handler time.
- API Gateway REST integrations still time out at 29 seconds, a hard ceiling you must design around on slow downstreams.
- The companion script compares `idle_timeout`, connection logs, and cross-zone settings for two load balancer ARNs using read-only AWS CLI calls (AWS CLI v2.25+).

On May 8, 2026, the operational default for public HTTP APIs on AWS is still Application Load Balancer (ALB) terminating TLS and routing by host and path—while high-volume TCP workloads stay on Network Load Balancer (NLB). That split is not “old news”: it decides where HTTP/2 features, sticky sessions, and AWS WAF attach. For containers on EKS, our n8n-on-EKS production guide is a concrete L7 ingress story you can contrast with raw NLB patterns.
INIT-phase billing for AWS Lambda has been live since August 1, 2025: cold paths charge initialization time the same as handler time. Combined with 29-second maximum integration timeout on Amazon API Gateway REST APIs (a hard ceiling everyone hits eventually on slow downstreams), scale conversations are now as much about billing physics as about EC2 instance families.
Reproduce this — Clone the companion scripts in the FactualMinds repo: `examples/architecture-blog-2026/ingress-and-scale/` (Bitbucket `main` after merge). Run `check-alb-nlb-attributes.sh` with two load balancer ARNs to compare `idle_timeout`, connection logs, and cross-zone settings side by side; the script makes read-only AWS CLI calls and requires AWS CLI v2.25+.
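The comparison the script performs can be sketched in Python. The `diff_lb_attributes` helper and the sample values below are illustrative; the input shape mirrors the attribute list returned by `aws elbv2 describe-load-balancer-attributes`.

```python
# Sketch of the side-by-side attribute comparison the companion script does.
# Input mirrors `aws elbv2 describe-load-balancer-attributes`:
# a list of {"Key": ..., "Value": ...} dicts per load balancer.

WATCHED = ("idle_timeout.timeout_seconds",
           "access_logs.s3.enabled",
           "load_balancing.cross_zone.enabled")

def diff_lb_attributes(attrs_a, attrs_b, keys=WATCHED):
    """Return {key: (value_a, value_b)} for watched keys that differ or are missing."""
    a = {d["Key"]: d["Value"] for d in attrs_a}
    b = {d["Key"]: d["Value"] for d in attrs_b}
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Hypothetical responses for one ALB and one NLB:
alb = [{"Key": "idle_timeout.timeout_seconds", "Value": "60"},
       {"Key": "access_logs.s3.enabled", "Value": "true"}]
nlb = [{"Key": "load_balancing.cross_zone.enabled", "Value": "false"}]

print(diff_lb_attributes(alb, nlb))
```

Any key present on one side and absent on the other shows up as a `(value, None)` pair, which is exactly the class of drift that turns into a ticket.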
L4 vs L7: what each load balancer optimizes
NLB (Layer 4) forwards TCP/UDP with minimal manipulation. Benefits: extreme performance, static IP/prefix options, long-lived connection friendliness. Costs: no native HTTP routing, host-header rules, or WAF attachment on the listener the way ALB exposes them.
ALB (Layer 7) understands HTTP. Benefits: path routing, Lambda and IP targets, AWS WAF integration, gRPC on ALB where supported. Costs: slightly higher latency than NLB for raw TCP passthrough workloads and a different pricing model for LCU consumption.
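The L7 decision ALB makes per request, and NLB cannot, can be modeled as a priority-ordered rule table matching host header and path prefix to a target group. The rule table and names below are hypothetical:

```python
# Toy model of ALB listener-rule evaluation: first matching rule wins,
# otherwise the default action fires. Rules and target groups are hypothetical.

RULES = [  # evaluated in priority order, like ALB listener rules
    {"host": "api.example.com", "path_prefix": "/v2/", "target_group": "tg-api-v2"},
    {"host": "api.example.com", "path_prefix": "/",    "target_group": "tg-api-v1"},
]
DEFAULT_TG = "tg-default"

def route(host, path, rules=RULES):
    for r in rules:
        if host == r["host"] and path.startswith(r["path_prefix"]):
            return r["target_group"]
    return DEFAULT_TG  # ALB's default action

print(route("api.example.com", "/v2/orders"))  # tg-api-v2
```

NLB never sees the host header or path at all, which is the whole L4 vs L7 distinction in one function.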
Opinionated take — For customer-facing REST/JSON behind a domain name, default to ALB unless you have a measured L4 reason. NLB-in-front-of-ALB stacks are useful when a firewall partner demands fixed IPs or you must terminate non-HTTP protocols; they are not a free “performance hack” for standard JSON APIs.
Horizontal vs vertical scaling on AWS
Vertical scaling (bigger m7g.4xlarge, more EBS throughput) reduces coordination overhead when the workload is single-threaded or license-bound.
Horizontal scaling (more tasks, more Lambda concurrency, more EC2 ASG capacity) wins for request-parallel workloads—if data and session affinity do not serialize you.
Failure mode: horizontal scale amplifies noisy neighbors on shared databases. Pair compute scale with read replicas, caches, or DynamoDB partition discipline before you congratulate the ASG graph.
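The "data and session affinity serialize you" point is the standard Amdahl bound applied to horizontal scale: if a fraction of each request's work funnels through a shared resource, added capacity stops paying off quickly. The numbers below are illustrative:

```python
# Amdahl-style bound: if fraction `serial` of each request's work is
# serialized on a shared resource (DB lock, sticky-session host), adding
# n-fold horizontal capacity yields at most this throughput speedup.

def max_speedup(n, serial):
    return 1.0 / (serial + (1.0 - serial) / n)

# Doubling from 10 to 20 tasks helps little once 20% of work is serialized:
for n in (1, 10, 20):
    print(n, round(max_speedup(n, 0.20), 2))
```

At 20% serialization the ceiling is 5x no matter how many tasks the ASG adds, which is why the database work has to come first.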
Read our Karpenter vs Cluster Autoscaler cost guide when horizontal scale meets Kubernetes—different bin-packing economics than raw ASG.
Cold starts: Lambda, INIT, and provisioned capacity
Two separate problems get lumped as “cold start”:
- INIT — import graph, SDK clients, dependency injection.
- Execution ramp — first requests after scale-to-zero, before JVM/CLR JIT warmth matters.
After Aug 1, 2025, INIT is billed. That moves “lazy imports” from a latency issue to a cost-per-deploy issue on bursty functions.
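Before mitigating, measure: cold invocations emit an `Init Duration` field in the per-invocation `REPORT` line in CloudWatch Logs, so billed INIT time can be summed from logs. A minimal parser sketch (the sample line is illustrative):

```python
import re

# Cold invocations include "Init Duration" in the REPORT log line; warm ones
# do not. Post-Aug-2025 that time is billed, so it is worth summing per function.
SAMPLE = ("REPORT RequestId: 8f1c-example  Duration: 12.34 ms  "
          "Billed Duration: 13 ms  Memory Size: 512 MB  "
          "Max Memory Used: 98 MB  Init Duration: 245.67 ms")

def init_duration_ms(report_line):
    """Return Init Duration in ms, or 0.0 for warm invocations."""
    m = re.search(r"Init Duration: ([\d.]+) ms", report_line)
    return float(m.group(1)) if m else 0.0

print(init_duration_ms(SAMPLE))  # 245.67
```

Run this over a day of logs and multiply by your GB-second rate to see what cold paths now cost you.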
Mitigations worth sequencing (not all universal):
- Smaller deployment packages and lazy `require`/dynamic `import()` in Node 22 runtimes.
- SnapStart for supported Java runtimes where applicable.
- Provisioned Concurrency when latency SLOs fund the spare capacity (break-even math in our Lambda cost optimization guide).
- RDS Proxy when connection storms dominate INIT (see RDS performance practices for database-side tuning context).
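The Provisioned Concurrency break-even mentioned above reduces to comparing the standing PC charge against what billed INIT costs on cold paths. A sketch, with placeholder prices (plug in your Region's actual GB-second rates):

```python
# Break-even sketch for Provisioned Concurrency (PC). Prices are PLACEHOLDERS,
# not current AWS rates -- substitute your Region's pricing before deciding.

GB = 1.0                             # function memory in GB
PRICE_GBS_ON_DEMAND = 0.0000166667   # $/GB-second, on-demand duration (placeholder)
PRICE_GBS_PC        = 0.0000041667   # $/GB-second, PC standing charge (placeholder)

def monthly_pc_cost(pc_instances, hours=730):
    return pc_instances * GB * PRICE_GBS_PC * hours * 3600

def monthly_cold_init_cost(cold_starts_per_month, init_seconds):
    # Post-Aug-2025: INIT on cold paths is billed like handler time.
    return cold_starts_per_month * init_seconds * GB * PRICE_GBS_ON_DEMAND

pc = monthly_pc_cost(10)
cold = monthly_cold_init_cost(2_000_000, 1.2)
print(f"PC: ${pc:.2f}/mo vs billed INIT: ${cold:.2f}/mo")
```

Note the comparison only covers cost; PC also buys latency, which is why the text says the SLO has to fund the spare capacity.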
What broke — A retail-traffic shaped workload moved Provisioned Concurrency from 50 → 200 without increasing RDS Proxy max connections. INIT time improved, but connection acquisition spikes during marketing events exhausted the database's `max_connections`: p95 improved for Lambda while error rate climbed on checkout. Fix: cap Lambda reserved concurrency + pool limits + queue absorption (SQS) instead of unconstrained parallel DB opens.
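The guardrail that would have caught this incident is simple arithmetic: worst-case Lambda fan-out times connections per execution must fit under the pool's safe share of `max_connections`. All numbers below are illustrative:

```python
# Guardrail from the incident above: total worst-case DB connections opened
# by Lambda must fit under the pool limit with headroom. Numbers illustrative.

def connection_budget_ok(reserved_concurrency, conns_per_execution,
                         pool_max_conns, headroom=0.8):
    """True if worst-case Lambda fan-out stays within the pool's safe share."""
    worst_case = reserved_concurrency * conns_per_execution
    return worst_case <= pool_max_conns * headroom

# 200 concurrent executions x 1 conn each vs a 100-connection proxy pool:
print(connection_budget_ok(200, 1, 100))  # False -> cap concurrency or grow pool
```

Wiring this check into the deploy pipeline turns "unconstrained parallel DB opens" into a failed build instead of a failed checkout.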
Peaky AI spend adds a related trap: see autoscaling AI workloads and budget overruns before autoscaling pipelines widen blast radius.
Hybrid compute reminder
Not every service belongs on Lambda. When sustained vCPU is cheaper on Graviton EC2 or Fargate, hybrid compute guidance keeps finance and engineering aligned.
What This Post Doesn’t Cover
- Gateway Load Balancer (GWLB) inspection topologies for centralized firewalls—different buyer question than app ingress.
- CloudFront as the true edge vs regional ALB—see CDN comparisons separately.
- Per-protocol HTTP/3 nuances on ALB—verify current Region/feature availability in AWS docs before promising.
If You Only Do One Thing
Instrument ALB target health and Lambda concurrent executions on the same dashboard as database connection counts. Scale events without those three curves invite theatrical postmortems.
What to Do This Week
- Export ALB/NLB attributes for production ingress with the companion script; file tickets for any `idle_timeout` under your longest safe keep-alive path.
- Confirm API Gateway (or ALB) timeouts ≤ downstream worst-case, with an explicit saga or async handoff before the 29 s REST ceiling.
- Re-run Lambda memory/power tuning after the INIT billing change—stale 2024 baselines mis-price cold paths.
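The timeout audit above can be sketched as a budget check: the synchronous chain's summed worst case, plus margin, must fit under the 29-second REST ceiling, or the slow step moves to an async handoff. Downstream names and numbers are hypothetical:

```python
# Timeout-budget sketch: a synchronous call chain must fit under API Gateway's
# 29 s REST integration ceiling with margin. Downstream numbers are illustrative.

APIGW_REST_CEILING_S = 29.0

def fits_sync_budget(downstream_worst_cases_s, safety_margin_s=2.0):
    return sum(downstream_worst_cases_s) + safety_margin_s <= APIGW_REST_CEILING_S

chain = [4.0, 8.5, 12.0]                # e.g. auth, inventory, payment worst cases (s)
print(fits_sync_budget(chain))          # True: 26.5 s fits
print(fits_sync_budget(chain + [6.0]))  # False: move the extra step async
```

When the check fails, the fix is architectural (SQS handoff, saga, Step Functions), not a bigger timeout, because 29 s on REST APIs is a hard ceiling.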
For correlated debugging once scale creates cross-service mysteries, continue with debugging distributed AWS systems.
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.




