How to Choose Between Nginx, FrankenPHP, and Modern Web Runtimes (2026)
Quick summary: FrankenPHP, Nginx+PHP-FPM, Node.js, Python Gunicorn+uvicorn, and Go each have different memory profiles, concurrency models, and failure modes. The right choice depends on your workload, not benchmarks.

The web runtime landscape in 2026 looks nothing like it did five years ago. FrankenPHP went stable. Bun disrupted Node.js performance assumptions. Python’s asyncio ecosystem matured enough to be a first-class choice for high-throughput APIs. Go HTTP servers handle millions of requests per day on containers that cost less than a cup of coffee.
Yet the question teams get wrong is not “which runtime is fastest?” Benchmarks are synthetic. The question is “which runtime fits your team’s skills, your workload’s characteristics, and your AWS container budget?” Those are different questions with different answers.
This guide covers each major runtime’s concurrency model, memory profile, deployment patterns on ECS, and the failure modes that cause production incidents.
Runtime Landscape: Numbers That Matter for ECS Decisions
Before examining each runtime in depth, here is the comparison table built on workload-representative numbers — not Hello World benchmarks:
| Runtime | Base Memory | RPS per 256 MB container | p99 latency (I/O-bound) | p99 latency (CPU-bound) | HTTP/3 support | Concurrency model |
|---|---|---|---|---|---|---|
| Go net/http | 10–30 MB | 2,000–5,000 | 5–15ms | 10–30ms | Via quic-go | Goroutines (M:N threads) |
| Node.js (Express) | 80–150 MB | 500–1,500 | 5–20ms | 50–200ms* | Via Node 22+ | Single-threaded event loop |
| Node.js (Fastify) | 60–100 MB | 800–2,000 | 4–15ms | 50–200ms* | Via Node 22+ | Single-threaded event loop |
| Python Gunicorn+uvicorn | 50–100 MB/worker | 200–800 | 10–30ms | 30–100ms | No (external proxy) | Multi-process + async I/O |
| Nginx + PHP-FPM | 30–50 MB/worker | 200–600 | 20–50ms | 30–80ms | Nginx 1.25+ (build flag) | Multi-process |
| FrankenPHP (worker mode) | 50–80 MB/worker | 400–1,200 | 10–30ms | 20–60ms | Yes (Caddy built-in) | Multi-process + persistent |
| FrankenPHP (standard mode) | 30–50 MB/worker | 200–600 | 20–50ms | 30–80ms | Yes (Caddy built-in) | Multi-process |
*Node.js CPU-bound performance degrades sharply due to single-threaded event loop. CPU-intensive work blocks all other requests.
Columns explained:
- Base memory: memory at startup before handling requests. Determines minimum container size.
- RPS per 256 MB container: throughput in a 256 MB ECS Fargate task for typical JSON API workloads. Your numbers will vary.
- p99 latency: 99th percentile latency for I/O-bound requests (database queries, external API calls). CPU-bound numbers are for moderate compute tasks.
Nginx + PHP-FPM: The Proven Standard
Nginx + PHP-FPM remains the most common PHP deployment pattern and for good reason: it is battle-tested, well-documented, and straightforward to debug.
The architecture: Nginx receives HTTP requests and proxies them to PHP-FPM via FastCGI (Unix socket or TCP). PHP-FPM manages a pool of worker processes. Each worker handles one request at a time and boots the PHP application from scratch for each request.
The bootstrap cost is the defining characteristic of PHP-FPM. For a basic PHP script, this is negligible. For a full Laravel application with 50+ service providers, the bootstrap cost is 10–40ms per request — before your controller logic runs.
PHP-FPM Pool Sizing for ECS
The formula is straightforward: pm.max_children = (available_memory - overhead) / memory_per_worker
Measure memory per worker in a staging environment that mirrors production:
# Check PHP-FPM worker memory on a running container
ps aux | grep php-fpm | grep -v grep | awk '{print $6}' | sort -n
# Or enable the status endpoint in www.conf:
# pm.status_path = /status
# Then: curl "http://localhost/status?full" | grep "memory usage"
For a Laravel application with a typical feature set, expect 20–40 MB per worker under load: ORM relationships loaded, session data populated, a few eager-loaded models in memory.
For a 512 MB Fargate task with ~100 MB overhead (OS, Nginx, PHP-FPM master):
- Available: 412 MB
- Per worker: 35 MB (typical)
- max_children: 412 / 35 ≈ 11 workers (leaving only ~27 MB total headroom)
Set it to 10 to leave headroom for spikes.
; PHP-FPM www.conf optimized for ECS Fargate
[www]
user = www-data
group = www-data
listen = /var/run/php-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
; Dynamic PM adjusts active workers to load
; Appropriate for ECS where containers scale out
pm = dynamic
; Set to: (available_memory - OS_overhead) / memory_per_worker
; Measure memory_per_worker with: ps aux | grep php-fpm
pm.max_children = 10
; Start with 25% of max_children
pm.start_servers = 3
; Minimum idle workers to keep alive
pm.min_spare_servers = 2
; Maximum idle workers before killing extras
pm.max_spare_servers = 7
; Restart worker after N requests to clear memory leaks
; Set lower (100-500) if you observe memory growth
pm.max_requests = 500
; Log slow requests for debugging
slowlog = /var/log/php-fpm-slow.log
request_slowlog_timeout = 5s
; Terminate requests exceeding this time
; Must be less than Nginx proxy_read_timeout
request_terminate_timeout = 60s
; Enable status page for monitoring
pm.status_path = /fpm-status
ping.path = /fpm-ping
Nginx Configuration with Keepalive and Health Checks
upstream php-fpm {
    server unix:/var/run/php-fpm.sock;
    # For TCP connections to PHP-FPM (separate container):
    # server php-fpm:9000;
    # keepalive 16; # Maintain 16 persistent connections per worker
}
server {
    listen 80;
    server_name _;
    root /var/www/html/public;
    index index.php;
    # Security: prevent Slowloris
    client_header_timeout 10s;
    client_body_timeout 10s;
    send_timeout 10s;
    keepalive_timeout 65s;
    # Connection limiting per IP. Note: the limit_conn_zone directive is
    # only valid in the http { } context, so declare it there:
    # limit_conn_zone $binary_remote_addr zone=per_ip:10m;
    limit_conn per_ip 100;
    # Health check endpoint — bypass PHP-FPM for ECS health checks
    location = /health {
        access_log off;
        return 200 "healthy\n";
        add_header Content-Type text/plain;
    }
    # PHP-FPM health check
    location = /fpm-ping {
        fastcgi_pass php-fpm;
        include fastcgi_params;
        access_log off;
    }
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }
    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass php-fpm;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        fastcgi_param PATH_INFO $fastcgi_path_info;
        include fastcgi_params;
        # Timeout must exceed PHP max_execution_time
        fastcgi_read_timeout 60s;
        # Buffer responses to free PHP-FPM workers faster
        fastcgi_buffering on;
        fastcgi_buffer_size 16k;
        fastcgi_buffers 16 16k;
    }
    # Serve static assets directly without PHP-FPM
    location ~* \.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }
    # Block access to sensitive files
    location ~ /\. {
        deny all;
    }
}
The fastcgi_buffering on setting matters for ECS cost: Nginx buffers the PHP-FPM response and streams it to the client after the PHP-FPM worker has already finished and been freed. Without buffering, a worker stays occupied until the client has received the full response, so slow clients hold workers longer and reduce effective capacity.
FrankenPHP: PHP in a New Shape
FrankenPHP is a PHP application server built on top of Caddy (written in Go), embedding PHP via CGO. It ships as a single binary and handles HTTP, HTTPS, HTTP/3, and PHP execution in one process.
What FrankenPHP Actually Changes
In standard mode, FrankenPHP behaves similarly to Nginx + PHP-FPM: each request boots the application, runs the handler, tears down. You get Caddy’s HTTP/3 and automatic HTTPS, but the PHP execution model is unchanged.
In worker mode, FrankenPHP boots the application once and keeps it in memory between requests. This is equivalent to Laravel Octane with Swoole or RoadRunner, but built into the server binary. The application bootstrap runs once; subsequent requests skip it entirely.
Worker mode performance impact on a typical Laravel application:
- Standard mode: 20–50ms per request (10–40ms bootstrap + 10ms controller)
- Worker mode: 5–15ms per request (0ms bootstrap + 5–15ms controller)
That is a 2–4× throughput improvement for the same compute, or 50–75% fewer ECS tasks for the same request volume.
FrankenPHP Dockerfile for Production
FROM dunglas/frankenphp:1-php8.3-alpine AS production
# Install PHP extensions
RUN install-php-extensions \
    pdo_pgsql \
    pdo_mysql \
    redis \
    opcache \
    intl \
    zip \
    bcmath
# Copy application
WORKDIR /app
COPY --chown=www-data:www-data . .
# Install Composer dependencies
COPY --from=composer:2 /usr/bin/composer /usr/bin/composer
RUN composer install \
    --no-dev \
    --no-interaction \
    --optimize-autoloader \
    --prefer-dist
# Laravel optimizations
RUN php artisan config:cache && \
    php artisan route:cache && \
    php artisan view:cache && \
    php artisan event:cache
# PHP configuration
COPY docker/php/opcache.ini /usr/local/etc/php/conf.d/opcache.ini
COPY docker/php/production.ini /usr/local/etc/php/conf.d/production.ini
# Caddy configuration with FrankenPHP worker mode
COPY docker/Caddyfile /etc/caddy/Caddyfile
EXPOSE 80 443 443/udp
# Run as non-root
# Note: binding ports 80/443 as a non-root user requires CAP_NET_BIND_SERVICE
# on the frankenphp binary (e.g. via setcap), or listening on an unprivileged port
USER www-data
CMD ["frankenphp", "run", "--config", "/etc/caddy/Caddyfile"]
# /docker/Caddyfile
{
    # Disable admin API in production
    admin off
    # FrankenPHP worker mode
    frankenphp {
        # Number of PHP workers
        # Formula: available_memory / memory_per_worker
        # Leave headroom: set to 80% of max safe value
        num_threads 8
        worker {
            file /app/public/index.php
            num 8
            # Restart worker after N requests to prevent memory leaks
            max_requests 500
        }
    }
}
:80 {
    # Health check — no PHP needed
    respond /health 200
    root * /app/public
    # Enable compression
    encode zstd gzip
    php_server {
        # Worker mode enabled in global config above
    }
    # Serve static files before PHP
    file_server
}
; /docker/php/opcache.ini
opcache.enable=1
opcache.enable_cli=1
opcache.memory_consumption=256
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=20000
; In worker mode, files do not change — disable revalidation
opcache.validate_timestamps=0
opcache.save_comments=1
; opcache.fast_shutdown was removed in PHP 7.2 and is no longer a valid directive
FrankenPHP Production Considerations
Stateless requirement: worker mode keeps the application in memory between requests. Laravel applications typically handle this well because the framework is designed to be stateless — but check your custom service providers. Anything that stores state in static properties or in the service container that accumulates across requests is a memory leak in worker mode.
Memory leak management: max_requests 500 in the Caddyfile restarts workers after 500 requests. This is the safety valve for memory leaks. Monitor worker memory over time with ps aux | grep frankenphp. If memory grows linearly with requests, you have a leak — reduce max_requests or find and fix the leak.
Container image size: FrankenPHP embeds Caddy (Go binary) and links PHP via CGO. The alpine-based image is ~200 MB vs ~100 MB for a minimal Nginx + PHP-FPM alpine setup. This is generally acceptable; ECR pull times are the more relevant consideration.
Node.js: The Event Loop Model
Node.js uses a single-threaded event loop for JavaScript execution. I/O operations (database queries, HTTP calls, file reads) are non-blocking — Node.js hands them off to the OS and continues processing other callbacks while waiting. This is why Node.js handles high-concurrency I/O workloads well on minimal resources.
The limitation: CPU-intensive operations block the event loop. A tight loop, large JSON parsing, or synchronous crypto work prevents Node.js from processing any other requests until it completes. A 200ms CPU-intensive operation means every other request waits 200ms.
For API-heavy applications dominated by database queries and external HTTP calls, Node.js is highly efficient. For mixed-use applications where some endpoints are CPU-intensive, offload those to worker threads.
Node.js Cluster with Graceful Shutdown
Single Node.js processes use one CPU core. The cluster module spawns worker processes, each with their own event loop, sharing a TCP socket. Each worker handles requests independently.
import cluster from 'node:cluster';
import http from 'node:http';
import os from 'node:os';
import process from 'node:process';

const numWorkers = process.env.NODE_WORKERS
  ? parseInt(process.env.NODE_WORKERS, 10)
  : os.availableParallelism();

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid}: spawning ${numWorkers} workers`);
  for (let i = 0; i < numWorkers; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    if (!worker.exitedAfterDisconnect) {
      console.log(`Worker ${worker.process.pid} died (${signal || code}) — replacing`);
      cluster.fork();
    }
  });
  // Graceful shutdown: disconnect all workers on SIGTERM
  process.on('SIGTERM', () => {
    console.log('Primary: SIGTERM received, shutting down workers');
    for (const worker of Object.values(cluster.workers ?? {})) {
      worker.send('shutdown');
      worker.disconnect();
    }
    setTimeout(() => process.exit(0), 10000); // Force exit after 10s
  });
} else {
  // Worker process
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'ok', pid: process.pid }));
  });
  server.listen(3000, () => {
    console.log(`Worker ${process.pid}: listening on :3000`);
  });
  // Graceful shutdown in worker
  process.on('message', (msg) => {
    if (msg === 'shutdown') {
      server.close(() => {
        console.log(`Worker ${process.pid}: closed`);
        process.exit(0);
      });
    }
  });
  // Handle SIGTERM directly if no primary message arrives
  process.on('SIGTERM', () => {
    server.close(() => {
      process.exit(0);
    });
  });
}
For ECS deployments, set NODE_WORKERS to the vCPU count of your Fargate task. A 0.5 vCPU task: 1 worker. A 1 vCPU task: 1–2 workers. A 2 vCPU task: 2 workers. Node.js cluster workers beyond the vCPU count do not improve throughput — they create context-switching overhead.
Memory Sizing for Node.js ECS Tasks
A Node.js Express application's base memory is typically 80–120 MB for the runtime, imported modules, and framework overhead. Under load with active request state, expect 150–250 MB.
For ECS Fargate, the minimum practical task size for Node.js is 512 MB memory with 0.25 vCPU. For production API services, 1 GB memory with 0.5–1 vCPU is a common starting point.
Use the --max-old-space-size V8 flag to cap the V8 heap and trigger garbage collection more aggressively before the process is OOM-killed by ECS:
node --max-old-space-size=768 dist/server.js
Set this to roughly 75% of the container memory limit. For a 1 GB container: --max-old-space-size=768.
Python: Gunicorn + uvicorn for Production APIs
Python’s concurrency story is more complex than the other runtimes because of the GIL (Global Interpreter Lock) — CPython can only execute Python bytecode on one thread at a time, regardless of CPU count.
For I/O-bound async workloads: asyncio (and its faster event loop, uvloop) bypasses the GIL limitation because I/O operations release the GIL while waiting. An async FastAPI or Django Channels application can handle thousands of concurrent I/O operations per worker.
For CPU-bound workloads: multiple processes are required. Each process has its own Python interpreter and GIL. Gunicorn spawns multiple worker processes, giving you true parallelism for CPU-bound work.
For production FastAPI or Django deployments on ECS, the standard configuration is Gunicorn with uvicorn workers:
gunicorn app.main:app \
    --workers $(( 2 * $(nproc) + 1 )) \
    --worker-class uvicorn.workers.UvicornWorker \
    --worker-connections 1000 \
    --max-requests 1000 \
    --max-requests-jitter 50 \
    --timeout 30 \
    --graceful-timeout 30 \
    --bind 0.0.0.0:8000 \
    --access-logfile - \
    --error-logfile -
Worker formula: 2 × CPU_count + 1 is the Gunicorn recommendation for I/O-bound workloads. For a 1 vCPU Fargate task: 3 workers. For 2 vCPU: 5 workers. Note that recent uvicorn releases deprecate the bundled uvicorn.workers.UvicornWorker in favor of the standalone uvicorn-worker package (uvicorn_worker.UvicornWorker); check which your installed version expects.
--max-requests 1000 --max-requests-jitter 50: workers restart after roughly 1,000 requests, which caps the damage from slow memory leaks in long-running workers. The ±50 jitter staggers the restarts so the whole pool does not recycle simultaneously and temporarily halve capacity.
Memory impact: each Gunicorn worker is a forked Python process. For a FastAPI app: 80–150 MB per worker after fork. A 1 GB Fargate task with 3 workers: 240–450 MB for workers + 100 MB overhead = 340–550 MB. A 1 GB task can run 3–5 Gunicorn workers comfortably.
Go: The Efficient Alternative
Go’s net/http standard library handles concurrency via goroutines — lightweight threads (2 KB initial stack, growing as needed) scheduled by the Go runtime’s M:N scheduler. The runtime multiplexes goroutines onto OS threads, typically GOMAXPROCS threads (default: number of CPU cores).
For an HTTP server, each request gets its own goroutine. The goroutine blocks on I/O (database query, HTTP call) but the Go scheduler immediately switches to another goroutine. No event loop, no callback hell, no async/await.
The practical implication: a Go HTTP server can handle thousands of concurrent requests with minimal memory. A goroutine parked on I/O costs a few kilobytes of stack, compared to the 1–8 MB stack of a full OS thread. This is why Go services typically use far less memory than equivalent PHP, Python, or Node.js services at the same concurrency level.
For ECS cost, this translates directly: a 256 MB Fargate task running a Go API service can handle 2,000–5,000 requests/second for typical JSON API workloads. An equivalent PHP-FPM setup with the same concurrency requirements would need 1–4 GB.
Go’s tradeoff: the development ecosystem is smaller than Node.js or Python, the compile step adds to CI/CD time, and goroutine-based concurrency bugs (race conditions, deadlocks) can be harder to debug than Python or Node.js async bugs.
HTTP/3 Support Matrix
HTTP/3 uses QUIC (UDP-based transport) instead of TCP. For clients on mobile networks or unreliable connections, HTTP/3’s connection migration and reduced handshake time measurably improve perceived performance.
| Runtime | HTTP/3 support | Notes |
|---|---|---|
| Go | Via quic-go library | Third-party, production-ready |
| FrankenPHP | Yes, native | Caddy built-in, zero configuration |
| Node.js | Via Node 22+ experimental | Not production-recommended yet |
| Nginx | Native since 1.25 | Build with --with-http_v3_module (most distro builds include it) |
| Python (ASGI) | Via Hypercorn | Alternative ASGI server to uvicorn; HTTP/3 via aioquic |
| PHP-FPM | Proxy only (Nginx or Caddy in front) | PHP-FPM itself does not handle HTTP/3 |
For most AWS deployments, CloudFront handles HTTP/3 termination at the CDN layer, and your ECS service communicates with CloudFront over HTTP/1.1 or HTTP/2 via ALB. In this case, HTTP/3 support in the runtime itself is irrelevant — CloudFront handles it.
If you are terminating HTTP directly at the ECS service (without CloudFront), FrankenPHP, Nginx 1.25+, and Go with quic-go are the practical choices for HTTP/3 in 2026.
Security: Slowloris and Connection Exhaustion
Every web runtime is vulnerable to connection exhaustion attacks if not configured defensively. The Slowloris attack — sending partial HTTP headers slowly to keep connections open — is the classic example.
Nginx defensive configuration:
# In the http { } context (the limit_conn_zone and limit_req_zone directives below are valid only there)
# Timeout for reading request headers (default: 60s — too long)
client_header_timeout 5s;
# Timeout for reading request body
client_body_timeout 10s;
# Timeout for transmitting response to client
send_timeout 10s;
# Maximum time a keep-alive connection sits idle
keepalive_timeout 30s;
# Maximum number of keep-alive requests per connection
keepalive_requests 100;
# Limit simultaneous connections per IP
limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn addr 50;
# Limit request rate per IP (for POST-heavy endpoints)
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/m;
# Apply inside a location or server block: limit_req zone=api burst=20 nodelay;
File descriptor limits: Each open connection consumes a file descriptor. The default OS limit is often 1,024 or 65,536. For Nginx workers with high connection counts, configure:
# nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;
events {
    worker_connections 8192;
    # Total connections = worker_processes × worker_connections
    # Ensure: worker_rlimit_nofile >= worker_connections × 2
}
For ECS containers, set the ulimits in your task definition:
{
  "ulimits": [
    {
      "name": "nofile",
      "softLimit": 65535,
      "hardLimit": 65535
    }
  ]
}
Without this, your container uses the system default (often 1,024 open files), and a burst of connections exhausts it long before Nginx reaches its worker_connections limit.
AWS WAF as the first line of defense: Before configuring Slowloris mitigation at the Nginx level, attach AWS WAF to your ALB with a rate-based rule. WAF blocks IPs exceeding your defined request rate before traffic reaches your ECS tasks. This is the cost-effective approach: WAF blocks the attack at the edge, preventing the EC2/Fargate compute from spending CPU on attack traffic.
Choosing: The Decision Framework
Use Nginx + PHP-FPM if:
- Your team knows PHP and Laravel/Symfony.
- You want maximum operational familiarity.
- Benchmark performance meets your SLA requirements (it usually does).
- You do not need HTTP/3 or long-running worker mode.
Use FrankenPHP if:
- You are already on PHP/Laravel and want worker mode performance.
- You want HTTP/3 without additional infrastructure.
- You are comfortable ensuring application statelessness.
- The 2–4× performance improvement changes your ECS cost model materially.
Use Node.js if:
- Your team is primarily JavaScript.
- The workload is I/O-bound (API gateway, webhook handler, BFF layer).
- You value the npm ecosystem for rapid feature development.
- CPU-intensive work is handled in worker threads or separate services.
Use Python (Gunicorn + uvicorn) if:
- You are building ML-adjacent APIs (model inference, data processing).
- Your team writes Python.
- Django or FastAPI is your framework.
- The GIL is not a bottleneck (I/O-bound workloads, or CPU offloaded to workers).
Use Go if:
- Memory efficiency is a primary concern (container density, Fargate cost).
- The workload is CPU-bound or requires high concurrency.
- Your team can invest in Go’s learning curve.
- You are building infrastructure-adjacent services (proxies, gateways, agents).
The runtime decision has a smaller impact on your total system cost than the architecture decisions around it: caching strategy, database query efficiency, connection pooling, and horizontal scaling thresholds. Pick the runtime your team can operate reliably, instrument it thoroughly, and optimize the architecture.
Related reading: AWS ECS vs EKS: Container Orchestration Decision Guide and AWS Auto Scaling Strategies for EC2, ECS, and Lambda.