Ingress, Load Balancing, and Elastic Scale on AWS: L4 vs L7, Horizontal vs Vertical, and the Cold-Start Bill
Quick summary: As of May 8, 2026, Lambda bills INIT time on cold paths (pricing change live since Aug 1, 2025), API Gateway REST integrations time out at 29 seconds, and picking ALB vs NLB still determines whether TLS termination and routing live on the edge.
Key Takeaways
- The ALB vs NLB split is not "old news": it decides where HTTP/2 features, sticky sessions, and AWS WAF attach.
- Lambda INIT-phase billing has been live since August 1, 2025; cold paths charge initialization time the same as handler time.
- API Gateway REST integrations still time out at 29 seconds, a hard ceiling you must design around on slow downstreams.
- The companion script compares `idle_timeout`, connection logs, and cross-zone settings for two load balancer ARNs using read-only AWS CLI calls (AWS CLI v2.25+).

On May 8, 2026, the operational default for public HTTP APIs on AWS is still Application Load Balancer (ALB) terminating TLS and routing by host and path—while high-volume TCP workloads stay on Network Load Balancer (NLB). That split is not “old news”: it decides where HTTP/2 features, sticky sessions, and AWS WAF attach. For containers on EKS, our n8n-on-EKS production guide is a concrete L7 ingress story you can contrast with raw NLB patterns.
INIT-phase billing for AWS Lambda has been live since August 1, 2025: cold paths charge initialization time the same as handler time. Combined with 29-second maximum integration timeout on Amazon API Gateway REST APIs (a hard ceiling everyone hits eventually on slow downstreams), scale conversations are now as much about billing physics as about EC2 instance families.
Reproduce this — Clone the companion scripts in the FactualMinds repo: `examples/architecture-blog-2026/ingress-and-scale/` (Bitbucket `main` after merge). Run `check-alb-nlb-attributes.sh` with two load balancer ARNs to compare `idle_timeout`, connection logs, and cross-zone settings side by side; the script makes read-only AWS CLI calls and requires AWS CLI v2.25+.
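The comparison the script performs can be sketched in Python. The `diff_lb_attributes` helper and the sample values below are illustrative; the input shape mirrors the attribute list returned by `aws elbv2 describe-load-balancer-attributes`.

```python
# Sketch of the side-by-side attribute comparison the companion script does.
# Input mirrors `aws elbv2 describe-load-balancer-attributes`:
# a list of {"Key": ..., "Value": ...} dicts per load balancer.

WATCHED = ("idle_timeout.timeout_seconds",
           "access_logs.s3.enabled",
           "load_balancing.cross_zone.enabled")

def diff_lb_attributes(attrs_a, attrs_b, keys=WATCHED):
    """Return {key: (value_a, value_b)} for watched keys that differ or are missing."""
    a = {d["Key"]: d["Value"] for d in attrs_a}
    b = {d["Key"]: d["Value"] for d in attrs_b}
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Hypothetical responses for one ALB and one NLB:
alb = [{"Key": "idle_timeout.timeout_seconds", "Value": "60"},
       {"Key": "access_logs.s3.enabled", "Value": "true"}]
nlb = [{"Key": "load_balancing.cross_zone.enabled", "Value": "false"}]

print(diff_lb_attributes(alb, nlb))
```

Any key present on one side and absent on the other shows up as a `(value, None)` pair, which is exactly the class of drift that turns into a ticket.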
L4 vs L7: what each load balancer optimizes
NLB (Layer 4) forwards TCP/UDP with minimal manipulation. Benefits: extreme performance, static IP/prefix options, long-lived connection friendliness. Costs: no native HTTP routing, host-header rules, or WAF attachment on the listener the way ALB exposes them.
ALB (Layer 7) understands HTTP. Benefits: path routing, Lambda and IP targets, AWS WAF integration, gRPC on ALB where supported. Costs: slightly higher latency than NLB for raw TCP passthrough workloads and a different pricing model for LCU consumption.
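The L7 decision ALB makes per request, and NLB cannot, can be modeled as a priority-ordered rule table matching host header and path prefix to a target group. The rule table and names below are hypothetical:

```python
# Toy model of ALB listener-rule evaluation: first matching rule wins,
# otherwise the default action fires. Rules and target groups are hypothetical.

RULES = [  # evaluated in priority order, like ALB listener rules
    {"host": "api.example.com", "path_prefix": "/v2/", "target_group": "tg-api-v2"},
    {"host": "api.example.com", "path_prefix": "/",    "target_group": "tg-api-v1"},
]
DEFAULT_TG = "tg-default"

def route(host, path, rules=RULES):
    for r in rules:
        if host == r["host"] and path.startswith(r["path_prefix"]):
            return r["target_group"]
    return DEFAULT_TG  # ALB's default action

print(route("api.example.com", "/v2/orders"))  # tg-api-v2
```

NLB never sees the host header or path at all, which is the whole L4 vs L7 distinction in one function.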
Opinionated take — For customer-facing REST/JSON behind a domain name, default to ALB unless you have a measured L4 reason. NLB-in-front-of-ALB stacks are useful when a firewall partner demands fixed IPs or you must terminate non-HTTP protocols; they are not a free “performance hack” for standard JSON APIs.
Horizontal vs vertical scaling on AWS
Vertical scaling (bigger m7g.4xlarge, more EBS throughput) reduces coordination overhead when the workload is single-threaded or license-bound.
Horizontal scaling (more tasks, more Lambda concurrency, more EC2 ASG capacity) wins for request-parallel workloads—if data and session affinity do not serialize you.
Failure mode: horizontal scale amplifies noisy neighbors on shared databases. Pair compute scale with read replicas, caches, or DynamoDB partition discipline before you congratulate the ASG graph.
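The "data and session affinity serialize you" point is the standard Amdahl bound applied to horizontal scale: if a fraction of each request's work funnels through a shared resource, added capacity stops paying off quickly. The numbers below are illustrative:

```python
# Amdahl-style bound: if fraction `serial` of each request's work is
# serialized on a shared resource (DB lock, sticky-session host), adding
# n-fold horizontal capacity yields at most this throughput speedup.

def max_speedup(n, serial):
    return 1.0 / (serial + (1.0 - serial) / n)

# Doubling from 10 to 20 tasks helps little once 20% of work is serialized:
for n in (1, 10, 20):
    print(n, round(max_speedup(n, 0.20), 2))
```

At 20% serialization the ceiling is 5x no matter how many tasks the ASG adds, which is why the database work has to come first.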
Read our Karpenter vs Cluster Autoscaler cost guide when horizontal scale meets Kubernetes—different bin-packing economics than raw ASG.
Cold starts: Lambda, INIT, and provisioned capacity
Two separate problems get lumped as “cold start”:
- INIT — import graph, SDK clients, dependency injection.
- Execution ramp — first requests after scale-to-zero, before JVM/CLR JIT warmth matters.
After Aug 1, 2025, INIT is billed. That moves “lazy imports” from a latency issue to a cost-per-deploy issue on bursty functions.
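Before mitigating, measure: cold invocations emit an `Init Duration` field in the per-invocation `REPORT` line in CloudWatch Logs, so billed INIT time can be summed from logs. A minimal parser sketch (the sample line is illustrative):

```python
import re

# Cold invocations include "Init Duration" in the REPORT log line; warm ones
# do not. Post-Aug-2025 that time is billed, so it is worth summing per function.
SAMPLE = ("REPORT RequestId: 8f1c-example  Duration: 12.34 ms  "
          "Billed Duration: 13 ms  Memory Size: 512 MB  "
          "Max Memory Used: 98 MB  Init Duration: 245.67 ms")

def init_duration_ms(report_line):
    """Return Init Duration in ms, or 0.0 for warm invocations."""
    m = re.search(r"Init Duration: ([\d.]+) ms", report_line)
    return float(m.group(1)) if m else 0.0

print(init_duration_ms(SAMPLE))  # 245.67
```

Run this over a day of logs and multiply by your GB-second rate to see what cold paths now cost you.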
Mitigations worth sequencing (not all universal):
- Smaller deployment packages and lazy `require`/dynamic `import()` in Node 22 runtimes.
- SnapStart for supported Java runtimes where applicable.
- Provisioned Concurrency when latency SLOs fund the spare capacity (break-even math in our Lambda cost optimization guide).
- RDS Proxy when connection storms dominate INIT (see RDS performance practices for database-side tuning context).
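The Provisioned Concurrency break-even mentioned above reduces to comparing the standing PC charge against what billed INIT costs on cold paths. A sketch, with placeholder prices (plug in your Region's actual GB-second rates):

```python
# Break-even sketch for Provisioned Concurrency (PC). Prices are PLACEHOLDERS,
# not current AWS rates -- substitute your Region's pricing before deciding.

GB = 1.0                             # function memory in GB
PRICE_GBS_ON_DEMAND = 0.0000166667   # $/GB-second, on-demand duration (placeholder)
PRICE_GBS_PC        = 0.0000041667   # $/GB-second, PC standing charge (placeholder)

def monthly_pc_cost(pc_instances, hours=730):
    return pc_instances * GB * PRICE_GBS_PC * hours * 3600

def monthly_cold_init_cost(cold_starts_per_month, init_seconds):
    # Post-Aug-2025: INIT on cold paths is billed like handler time.
    return cold_starts_per_month * init_seconds * GB * PRICE_GBS_ON_DEMAND

pc = monthly_pc_cost(10)
cold = monthly_cold_init_cost(2_000_000, 1.2)
print(f"PC: ${pc:.2f}/mo vs billed INIT: ${cold:.2f}/mo")
```

Note the comparison only covers cost; PC also buys latency, which is why the text says the SLO has to fund the spare capacity.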
What broke — A retail-traffic shaped workload moved Provisioned Concurrency from 50 → 200 without increasing RDS Proxy max connections. INIT time improved, but connection acquisition spikes during marketing events exhausted the database's `max_connections`: p95 improved for Lambda while error rate climbed on checkout. Fix: cap Lambda reserved concurrency + pool limits + queue absorption (SQS) instead of unconstrained parallel DB opens.
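The guardrail that would have caught this incident is simple arithmetic: worst-case Lambda fan-out times connections per execution must fit under the pool's safe share of `max_connections`. All numbers below are illustrative:

```python
# Guardrail from the incident above: total worst-case DB connections opened
# by Lambda must fit under the pool limit with headroom. Numbers illustrative.

def connection_budget_ok(reserved_concurrency, conns_per_execution,
                         pool_max_conns, headroom=0.8):
    """True if worst-case Lambda fan-out stays within the pool's safe share."""
    worst_case = reserved_concurrency * conns_per_execution
    return worst_case <= pool_max_conns * headroom

# 200 concurrent executions x 1 conn each vs a 100-connection proxy pool:
print(connection_budget_ok(200, 1, 100))  # False -> cap concurrency or grow pool
```

Wiring this check into the deploy pipeline turns "unconstrained parallel DB opens" into a failed build instead of a failed checkout.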
Peaky AI spend adds a related trap: see autoscaling AI workloads and budget overruns before autoscaling pipelines widen blast radius.
Hybrid compute reminder
Not every service belongs on Lambda. When sustained vCPU is cheaper on Graviton EC2 or Fargate, hybrid compute guidance keeps finance and engineering aligned.
What This Post Doesn’t Cover
- Gateway Load Balancer (GWLB) inspection topologies for centralized firewalls—different buyer question than app ingress.
- CloudFront as the true edge vs regional ALB—see CDN comparisons separately.
- Per-protocol HTTP/3 nuances on ALB—verify current Region/feature availability in AWS docs before promising.
If You Only Do One Thing
Instrument ALB target health and Lambda concurrent executions on the same dashboard as database connection counts. Scale events without those three curves invite theatrical postmortems.
What to Do This Week
- Export ALB/NLB attributes for production ingress with the companion script; file tickets for any `idle_timeout` under your longest safe keep-alive path.
- Confirm API Gateway (or ALB) timeouts ≤ downstream worst-case, with an explicit saga or async handoff before the 29 s REST ceiling.
- Re-run Lambda memory/power tuning after the INIT billing change—stale 2024 baselines mis-price cold paths.
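The timeout audit above can be sketched as a budget check: the synchronous chain's summed worst case, plus margin, must fit under the 29-second REST ceiling, or the slow step moves to an async handoff. Downstream names and numbers are hypothetical:

```python
# Timeout-budget sketch: a synchronous call chain must fit under API Gateway's
# 29 s REST integration ceiling with margin. Downstream numbers are illustrative.

APIGW_REST_CEILING_S = 29.0

def fits_sync_budget(downstream_worst_cases_s, safety_margin_s=2.0):
    return sum(downstream_worst_cases_s) + safety_margin_s <= APIGW_REST_CEILING_S

chain = [4.0, 8.5, 12.0]                # e.g. auth, inventory, payment worst cases (s)
print(fits_sync_budget(chain))          # True: 26.5 s fits
print(fits_sync_budget(chain + [6.0]))  # False: move the extra step async
```

When the check fails, the fix is architectural (SQS handoff, saga, Step Functions), not a bigger timeout, because 29 s on REST APIs is a hard ceiling.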
For correlated debugging once scale creates cross-service mysteries, continue with debugging distributed AWS systems.
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.




