---
title: How to Tune PHP, Node.js, Python, and Go for High Concurrency on AWS
description: PHP-FPM, Node.js, Python, and Go have fundamentally different concurrency models. Tuning each runtime for high concurrency on ECS requires understanding the model, not just copying configuration values from Stack Overflow.
url: https://www.factualminds.com/blog/tune-php-node-python-go-high-concurrency/
datePublished: 2026-03-29T00:00:00.000Z
dateModified: 2026-04-16T00:00:00.000Z
author: palaniappan-p
category: Cloud Architecture
tags: how-to-guide, php, nodejs, python, go, concurrency, performance, aws, ecs, octane, gunicorn, goroutines
---

# How to Tune PHP, Node.js, Python, and Go for High Concurrency on AWS

> PHP-FPM, Node.js, Python, and Go have fundamentally different concurrency models. Tuning each runtime for high concurrency on ECS requires understanding the model, not just copying configuration values from Stack Overflow.

Every engineering team that "tunes for performance" eventually lands in the same place: they copy configuration values from a Medium article written for a different workload on different hardware, deploy to ECS, and wonder why their p99 latency did not improve.

Effective tuning requires understanding the concurrency model of your runtime. PHP-FPM handles one request at a time per worker process. Node.js multiplexes thousands of connections on a single thread via an event loop. Python's GIL gates CPU parallelism within a process. Go's goroutines are cheap enough that a single process can keep many thousands of connections in flight.

Each model has different bottlenecks, different failure modes, and different configuration levers. Here is what actually matters for each one on AWS ECS.

## Concurrency Models Compared

Understanding the model before touching any configuration:

| Runtime             | Concurrency mechanism                       | Parallelism                         | Key bottleneck                         |
| ------------------- | ------------------------------------------- | ----------------------------------- | -------------------------------------- |
| **PHP-FPM**         | Multiple OS processes                       | True parallelism up to worker count | Worker pool exhaustion                 |
| **PHP + Octane**    | Persistent workers, multiple processes      | True parallelism up to worker count | Memory leaks, static state             |
| **Node.js**         | Single-threaded event loop                  | No parallelism (CPU-bound)          | Event loop blocking                    |
| **Node.js cluster** | Multiple OS processes, each with event loop | True parallelism up to worker count | Memory, process coordination           |
| **Python asyncio**  | Cooperative coroutines, single-threaded     | No parallelism (GIL)                | GIL for CPU work, async I/O ceiling    |
| **Python Gunicorn** | Multiple OS processes                       | True parallelism up to worker count | Worker pool exhaustion, GIL per worker |
| **Go**              | Goroutines (M:N threads)                    | True parallelism up to GOMAXPROCS   | Goroutine leaks, sync.Mutex contention |

**Critical distinction:** parallelism (doing two things at the same time) vs concurrency (managing many things in progress simultaneously). Node.js is highly concurrent but not parallel for CPU work. Go is both concurrent and parallel.

For ECS, this means:

- PHP-FPM and Python Gunicorn: scale by adding workers (vertical) and tasks (horizontal).
- Node.js: scale by adding cluster workers (vertical, limited to CPU count) and tasks (horizontal).
- Go: scale by adding tasks (horizontal); each task handles massive concurrency internally.
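One way to turn these scaling rules into numbers is Little's law: the average number of requests in flight equals the arrival rate multiplied by the average latency. The sketch below uses Python purely as a calculator; the function names and the sample traffic figures are illustrative assumptions, not measurements from a real service.

```python
import math


def required_concurrency(requests_per_second: float, avg_latency_ms: float) -> int:
    """Little's law: requests in flight = arrival rate x average latency."""
    return math.ceil(requests_per_second * avg_latency_ms / 1000)


def tasks_needed(in_flight: int, workers_per_task: int) -> int:
    """ECS tasks required when each worker serves one request at a time."""
    return math.ceil(in_flight / workers_per_task)


# Hypothetical workload: 400 req/s at 50 ms average latency.
in_flight = required_concurrency(400, 50)
tasks = tasks_needed(in_flight, workers_per_task=8)
print(f"{in_flight} requests in flight, {tasks} tasks of 8 workers")
```

For process-per-request runtimes (PHP-FPM, sync Gunicorn) the in-flight figure is the number of busy workers you need across all tasks. For Node.js and Go it is the number of concurrent requests the event loop or goroutine scheduler must hold, which is usually a much less restrictive constraint.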

## PHP and Laravel Octane

### Standard PHP-FPM Tuning

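The PHP-FPM bottleneck is `pm.max_children`. When all workers are busy, new requests wait in the PHP-FPM listen socket's accept queue, whose length is capped by `listen.backlog` in the pool configuration (and by the kernel's `net.core.somaxconn`). When that queue fills, Nginx can no longer hand requests to PHP-FPM and starts returning 502 or 504 errors.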
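Memory is usually what caps `pm.max_children`: every worker is a full PHP process. A minimal sizing sketch, again using Python as a calculator; the function name, the reserved-memory figure, and the per-worker footprint are assumptions you would replace with values measured in your own container.

```python
def max_children(task_memory_mib: int, reserved_mib: int, per_worker_mib: int) -> int:
    """Largest PHP-FPM worker count whose combined memory fits in the task."""
    if per_worker_mib <= 0:
        raise ValueError("per_worker_mib must be positive")
    usable = task_memory_mib - reserved_mib
    # Always allow at least one worker, even in a tiny task.
    return max(1, usable // per_worker_mib)


# Hypothetical 2048 MiB task, 256 MiB reserved for Nginx, OPcache and the OS,
# and 60 MiB measured per worker: (2048 - 256) // 60 = 29
print(max_children(2048, 256, 60))
```

Rounding down is deliberate: a worker count that fits on average but not at peak ends in an OOM kill rather than a queue.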

OPcache eliminates the file system reads and recompilation on every request: PHP bytecode is compiled once and stored in shared memory. Without OPcache, every PHP file a request loads is read from disk, parsed, and compiled on every request. With OPcache: read once, cached indefinitely (until invalidated).

OPcache tuning for production:

```ini
[opcache]
opcache.enable = 1
opcache.enable_cli = 1

; Shared memory for cached bytecode
; 256MB covers most applications
opcache.memory_consumption = 256

; Shared memory for interned strings (deduplicated)
opcache.interned_strings_buffer = 32

; Maximum number of files that can be cached
; Check current usage: opcache_get_status()['opcache_statistics']['num_cached_scripts']
opcache.max_accelerated_files = 20000

; In production: disable file timestamp checks (files do not change)
; Requires: opcache_reset() or container restart after deploy
opcache.validate_timestamps = 0

; JIT compilation (PHP 8+)
; opcache.jit_buffer_size = 100M
; opcache.jit = tracing
; Note: JIT helps CPU-bound code. For I/O-bound Laravel APIs, the benefit is minimal.
```

Set `opcache.validate_timestamps = 0` in production containers. When deploying a new version, you are replacing the container image, so there is no need for PHP to check whether files changed. This removes the stat calls OPcache otherwise makes to revalidate cached files (at most once per file every `opcache.revalidate_freq` seconds, 2 by default).

Enable the OPcache status page during debugging to verify cache hit rate:

```php
<?php
$status = opcache_get_status();
echo "Hit rate: " . $status['opcache_statistics']['opcache_hit_rate'] . "%\n";
echo "Files cached: " . $status['opcache_statistics']['num_cached_scripts'] . "\n";
echo "Memory used: " . round($status['memory_usage']['used_memory'] / 1024 / 1024) . " MB\n";
```

After warm-up, a cache hit rate below 95% usually means one of three things: your file count exceeds `max_accelerated_files`, the cache has run out of `memory_consumption`, or `validate_timestamps = 1` is causing unnecessary cache invalidation.

### Laravel Octane: What It Actually Does

Standard PHP-FPM bootstraps Laravel for every request:

1. Load `public/index.php`
2. Require `vendor/autoload.php` (autoloader)
3. Create Application container
4. Register all service providers (`register()`, then `boot()`, on 50+ providers)
5. Resolve HTTP kernel
6. Run middleware stack
7. Dispatch to router
8. Execute controller
9. Tear down request state

Steps 1–5 are bootstrap and are identical for every request. For a typical Laravel application, this costs 10–40ms before your middleware and controller run.

Laravel Octane with Swoole or RoadRunner executes steps 1–5 once per worker and keeps the result in memory. Steps 6–9 run per request, and the booted application is reused by every request for the lifetime of the worker.
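The effect of removing bootstrap can be estimated with simple arithmetic. The sketch below uses Python as a calculator and assumes a worker that serves requests back to back; the 30 ms bootstrap and 20 ms handler times are hypothetical figures chosen from within the ranges quoted in this article.

```python
def per_worker_throughput(bootstrap_ms: float, handler_ms: float) -> float:
    """Requests per second one worker can serve when it is never idle."""
    return 1000.0 / (bootstrap_ms + handler_ms)


fpm = per_worker_throughput(bootstrap_ms=30, handler_ms=20)    # bootstrap on every request
octane = per_worker_throughput(bootstrap_ms=0, handler_ms=20)  # bootstrap already done

print(f"PHP-FPM: {fpm:.0f} req/s, Octane: {octane:.0f} req/s, "
      f"speedup: {octane / fpm:.1f}x")
```

The gain shrinks as the handler gets slower: an endpoint that spends 200 ms in database queries saves the same 30 ms, which is a far smaller share of the total.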

```php
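<?php

use Laravel\Octane\Events\RequestHandled;
use Laravel\Octane\Events\RequestReceived;
use Laravel\Octane\Events\RequestTerminated;
use Laravel\Octane\Events\WorkerStarting;
use Laravel\Octane\Listeners\EnsureUploadedFilesAreValid;
use Laravel\Octane\Listeners\FlushTemporaryContainerInstances;
use Laravel\Octane\Octane;

// config/octane.php: production-tuned configuration
// Note: stock Octane takes the worker counts and request limit from the CLI:
//   php artisan octane:start --workers=8 --task-workers=4 --max-requests=500
// Confirm that your start command reads the keys below before relying on them.
return [
    'server' => env('OCTANE_SERVER', 'swoole'),

    // Worker count formula: available_memory / memory_per_worker
    // Measure memory_per_worker with: memory_get_usage(true) in a controller
    // after a few requests have warmed the worker
    'workers' => env('OCTANE_WORKERS', 8),

    // Task workers for async dispatch (Octane::concurrently)
    'task_workers' => env('OCTANE_TASK_WORKERS', 4),

    // Restart each worker after N requests (prevents memory leak accumulation)
    // Start with 500; reduce if you observe memory growth
    'max_requests' => env('OCTANE_MAX_REQUESTS', 500),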

    'swoole' => [
        'options' => [
            // Maximum simultaneous connections the Swoole server accepts
            'max_conn' => 1024,

            // Heartbeat timeout for idle connections
            'heartbeat_idle_time' => 60,
            'heartbeat_check_interval' => 30,

            // Enable coroutines for async operations within a request
            'hook_flags' => SWOOLE_HOOK_ALL,
        ],
    ],

    // Listeners that clean up state between requests
    // Add your custom cleanup here
    'listeners' => [
        WorkerStarting::class => [
            EnsureUploadedFilesAreValid::class,
        ],
        RequestReceived::class => [
            ...Octane::prepareApplicationForNextOperation(),
            ...Octane::prepareApplicationForNextRequest(),
        ],
        RequestHandled::class => [],
        RequestTerminated::class => [
            FlushTemporaryContainerInstances::class,
        ],
    ],

    // Services that are 'warm' (pre-resolved before first request)
    // Only include services that are safe to share across requests
    'warm' => [
        ...Octane::defaultServicesToWarm(),
    ],
];
```

### Memory Leak Patterns in Octane

The most common memory leak patterns:

**Static properties that accumulate state:**

```php
class EventDispatcher {
    // BAD: static array accumulates across requests
    private static array $listeners = [];

    public static function listen(string $event, callable $listener): void {
        static::$listeners[$event][] = $listener;
    }
}
```

**Container bindings not scoped to request:**

```php
// BAD: If this closure captures request-specific data, it leaks across requests
$this->app->bind(OrderProcessor::class, function () use ($request) {
    return new OrderProcessor($request->user());
});
```

**Global state in singletons:**

The `RequestReceived` listeners in the configuration above include `Octane::prepareApplicationForNextRequest()`, which resets the framework state Octane knows about before each request. State held outside Laravel's container, such as custom singletons or static caches, is not reset automatically.

Detection: log memory usage at the start and end of each request in staging:

```php
use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class LogMemoryUsageMiddleware
{
    public function handle(Request $request, Closure $next): Response
    {
        $before = memory_get_usage(true);

        $response = $next($request);

        $after = memory_get_usage(true);
        $delta = $after - $before;

        if ($delta > 1024 * 1024) { // Log if request used more than 1 MB net
            logger()->warning('High memory delta', [
                'path' => $request->path(),
                'delta_bytes' => $delta,
                'after_mb' => round($after / 1024 / 1024, 2),
            ]);
        }

        return $response;
    }
}
```

If memory grows steadily across requests and does not return to baseline, you have a leak. `max_requests` is the safety valve — workers restart before leaks become critical.
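Whether memory "grows steadily" can be checked numerically once you have a series of per-request memory readings. The sketch below fits a straight line to such a series and derives a `max_requests` value from the slope. It is written in Python as an offline analysis aid; the function names, the synthetic samples, and the 64 MiB headroom are all assumptions for illustration.

```python
from typing import Optional, Sequence


def leak_rate_bytes_per_request(samples: Sequence[int]) -> float:
    """Least-squares slope of memory (bytes) against request index."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def safe_max_requests(headroom_bytes: int, leak_rate: float) -> Optional[int]:
    """Requests a worker can serve before a steady leak uses up the headroom."""
    if leak_rate <= 0:
        return None  # no measurable growth, so no limit is implied
    return int(headroom_bytes // leak_rate)


# Synthetic series: 50 MiB baseline growing by 20 KiB on every request.
samples = [50 * 2**20 + i * 20 * 1024 for i in range(100)]
rate = leak_rate_bytes_per_request(samples)
print(f"leak rate: {rate:.0f} bytes/request")
print(f"max_requests for 64 MiB headroom: {safe_max_requests(64 * 2**20, rate)}")
```

A slope near zero with occasional spikes points to large but well-behaved requests; a consistently positive slope is the leak signature described above.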

## Node.js: Taming the Event Loop

### Event Loop Blocking Detection

The most common Node.js performance problem in production is event loop blocking that appears as high p99 latency with moderate CPU usage. Synchronous code saturates the one core that runs the event loop, so I/O callbacks cannot run until it finishes, while aggregate CPU across the task's vCPUs still looks moderate.

**Production monitoring with `perf_hooks`:**

```javascript
import { performance } from 'node:perf_hooks';
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';

const SAMPLE_INTERVAL_MS = 100;

// Create the client once and reuse it for every publish
const cloudwatch = new CloudWatchClient({ region: process.env.AWS_REGION });

let maxLagMs = 0;
let lastTick = performance.now();

// Sample every 100ms. A healthy loop fires this timer about
// SAMPLE_INTERVAL_MS after the previous tick; any extra delay is time
// the loop spent blocked by synchronous code.
setInterval(() => {
  const now = performance.now();
  maxLagMs = Math.max(maxLagMs, now - lastTick - SAMPLE_INTERVAL_MS);
  lastTick = now;
}, SAMPLE_INTERVAL_MS).unref();

// Report the worst lag seen in each 5-second window
setInterval(() => {
  const lag = maxLagMs;
  maxLagMs = 0;

  if (lag > 50) {
    console.warn(`Event loop lag: ${lag.toFixed(1)}ms`);
    // Publish to CloudWatch via AWS SDK
    publishMetric('EventLoopLag', lag).catch((err) => {
      console.error('Failed to publish metric', err);
    });
  }
}, 5000).unref();

function publishMetric(name, value) {
  // CloudWatch custom metric publishing; returns a promise
  return cloudwatch.send(
    new PutMetricDataCommand({
      Namespace: 'NodeJS/Runtime',
      MetricData: [
        {
          MetricName: name,
          Value: value,
          Unit: 'Milliseconds',
          Dimensions: [{ Name: 'ServiceName', Value: process.env.SERVICE_NAME ?? 'unknown' }],
        },
      ],
    })
  );
}
```

**V8 CPU profiling for production issues:**

```bash
# Start with profiling enabled
node --prof dist/server.js

# After incident, process the isolate-*.log file
node --prof-process isolate-0x*.log > profile.txt

# Top CPU consumers appear in the "JavaScript" section
grep -A 50 "\[JavaScript\]" profile.txt
```

For long-running investigations in production, CPU sampling via `clinic.js` or `0x` gives flamegraph output that makes blocking code immediately visible.

### Worker Threads for CPU-Intensive Operations

Offload CPU-intensive operations to worker threads to avoid blocking the main event loop:

```javascript
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';
import { fileURLToPath } from 'node:url';

const __filename = fileURLToPath(import.meta.url);

/**
 * Run a CPU-intensive task in a worker thread.
 * The main event loop continues processing I/O while this runs.
 * ES modules only allow `export` at the top level, so this function is
 * declared outside the isMainThread check.
 */
export function runCpuTask(data) {
  return new Promise((resolve, reject) => {
    // The worker re-loads this same file and takes the branch below
    const worker = new Worker(__filename, {
      workerData: data,
    });

    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
}

if (!isMainThread) {
  // Worker thread execution
  const result = performHeavyComputation(workerData);
  parentPort.postMessage(result);
}

function performHeavyComputation(data) {
  // CPU-intensive work here: runs in a separate OS thread
  // and does not block the main event loop
  let sum = 0;
  for (let i = 0; i < data.iterations; i++) {
    sum += Math.sqrt(i);
  }
  return sum;
}
```

For production, use a worker thread pool (via `piscina` or `workerpool`) rather than spawning a new thread per request:

```javascript
import Piscina from 'piscina';
import os from 'node:os';
import path from 'node:path';
import { fileURLToPath } from 'node:url';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Thread pool: min 2, max (CPU count) worker threads
const pool = new Piscina({
  filename: path.resolve(__dirname, './workers/cpu-worker.js'),
  minThreads: 2,
  maxThreads: os.availableParallelism(),
  idleTimeout: 60000, // Retire idle threads after 60s
});

// In request handler
app.post('/process', async (req, res) => {
  const result = await pool.run({ data: req.body });
  res.json(result);
});
```

### Graceful Shutdown with In-Flight Request Draining

```javascript
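import http from 'node:http';

// `app` is assumed to be an Express application and `closeDbConnections`
// an application-specific cleanup function; both are defined elsewhere.
let isShuttingDown = false;

// Refuse new requests during shutdown.
// Register this before any routes so it runs first.
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.set('Connection', 'close');
    res.status(503).json({ error: 'Server shutting down' });
    return;
  }
  next();
});

const server = http.createServer(app);
server.listen(3000);

const gracefulShutdown = (signal) => {
  if (isShuttingDown) return;
  isShuttingDown = true;

  console.log(`Received ${signal}, shutting down gracefully`);

  // Stop accepting new connections; the callback fires once
  // in-flight requests have finished and their connections are closed
  server.close(async () => {
    console.log('HTTP server closed');

    // Close database connections, flush caches, etc.
    await closeDbConnections();

    process.exit(0);
  });

  // Node 18.2+: drop idle keep-alive sockets so close() can complete
  server.closeIdleConnections?.();

  // Force shutdown after 30 seconds; unref() keeps this timer
  // from holding the process open on its own
  setTimeout(() => {
    console.error('Forcing shutdown after timeout');
    process.exit(1);
  }, 30000).unref();
};

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));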
```

ECS sends SIGTERM to each container when it stops a task, for example during a deployment or scale-in. The `stopTimeout` in the task definition's container definition (30 seconds by default) is how long ECS waits before sending SIGKILL. Set `stopTimeout` a few seconds longer than your graceful shutdown timeout so that the application's own forced exit fires first.

## Python: GIL, asyncio, and Gunicorn

### The GIL Impact in Practice

The GIL (Global Interpreter Lock) prevents multiple Python threads from executing Python bytecode simultaneously. One thread runs at a time. For I/O operations, the GIL is released while waiting — so asyncio and multi-threaded I/O both work efficiently. For CPU operations, the GIL means multi-threaded Python is effectively single-threaded.

Implications for Gunicorn:

- `--worker-class sync` (default): each worker is a single-threaded process. The GIL is irrelevant — each process has its own interpreter. True parallelism up to worker count.
- `--worker-class gthread`: each worker is multi-threaded. Multiple threads share one process, one GIL. I/O can be concurrent; CPU is constrained by the GIL.
- `--worker-class uvicorn.workers.UvicornWorker`: each worker runs an asyncio event loop. I/O concurrency within each worker; CPU constrained by GIL per worker.

For most FastAPI applications on ECS, and for Django when it is served over ASGI, `uvicorn.workers.UvicornWorker` is the right choice: it combines Gunicorn's multi-process stability with uvicorn's async I/O efficiency.

### uvloop for Higher Event Loop Performance

uvloop is a faster asyncio event loop implemented in Cython and libuv. Drop-in replacement:

```python
import uvloop
import asyncio

# Install as default event loop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

# Or for FastAPI with uvicorn, pass --loop uvloop to uvicorn
# uvicorn app.main:app --loop uvloop
```

uvloop provides 2–4× throughput improvement for I/O-bound asyncio code in benchmarks. Real-world gains depend on the workload — applications spending most time in database queries see smaller improvements than those with many small I/O operations.

### Production Gunicorn Command with uvicorn Workers

```bash
gunicorn app.main:app \
    --bind 0.0.0.0:${PORT:-8000} \
    --workers ${GUNICORN_WORKERS:-$(( 2 * $(nproc) + 1 ))} \
    --worker-class uvicorn.workers.UvicornWorker \
    --worker-connections 1000 \
    --max-requests ${GUNICORN_MAX_REQUESTS:-1000} \
    --max-requests-jitter ${GUNICORN_MAX_REQUESTS_JITTER:-100} \
    --timeout ${GUNICORN_TIMEOUT:-30} \
    --graceful-timeout ${GUNICORN_GRACEFUL_TIMEOUT:-30} \
    --keep-alive ${GUNICORN_KEEPALIVE:-2} \
    --log-level ${LOG_LEVEL:-info} \
    --access-logfile - \
    --error-logfile - \
    --forwarded-allow-ips "*"
```

Expose the worker count and timeout as environment variables — this allows tuning per ECS task without rebuilding the image. In your ECS task definition, set `GUNICORN_WORKERS` based on the task's vCPU allocation.
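A value for `GUNICORN_WORKERS` can be derived from the task size rather than guessed. The sketch below takes the smaller of the CPU-based formula used in the command above and what the task's memory allows. The function name, the 256 MiB reserve, and the per-worker footprint are assumptions to replace with measured values.

```python
def gunicorn_workers(vcpus: int, task_memory_mib: int, per_worker_mib: int,
                     reserved_mib: int = 256) -> int:
    """Worker count bounded by both the CPU formula and available memory."""
    cpu_based = 2 * vcpus + 1
    memory_based = (task_memory_mib - reserved_mib) // per_worker_mib
    return max(1, min(cpu_based, memory_based))


# 2 vCPU / 4096 MiB task at 150 MiB per worker: CPU is the limit.
print(gunicorn_workers(2, 4096, 150))
# 4 vCPU / 1024 MiB task at 150 MiB per worker: memory is the limit.
print(gunicorn_workers(4, 1024, 150))
```

Computing this at deploy time and passing the result as the `GUNICORN_WORKERS` environment variable avoids relying on `nproc`, which may report host cores rather than the task's vCPU allocation.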

### Celery Worker Types and GIL Impact

Celery supports multiple execution pool types. The right choice depends on your task workload:

```python
# For CPU-bound tasks (data processing, image handling, ML inference):
# prefork — multiple OS processes, each with own GIL
# celery -A app worker --pool=prefork --concurrency=4

# For I/O-bound tasks (HTTP calls, database queries, Redis ops):
# gevent or eventlet — coroutine-based concurrency within one process
# celery -A app worker --pool=gevent --concurrency=100

# For one-task-at-a-time execution (debugging, or one job per ECS task):
# solo: runs tasks inline in the worker process, with no concurrency and no asyncio
# celery -A app worker --pool=solo
```

For ECS, `prefork` is the most reliable choice for production workloads. `gevent` provides more concurrency per worker for pure I/O workloads but has compatibility issues with some libraries that are not gevent-safe. Measure both in staging before committing.

## Go: Goroutines, Pools, and GC Tuning

### http.Server Production Configuration

The Go standard library `net/http` package is production-ready without additional frameworks. Every timeout matters:

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", healthHandler)
	mux.HandleFunc("/api/orders", ordersHandler)

	server := &http.Server{
		Addr:    ":8080",
		Handler: mux,

		// Time allowed to read the full request (headers + body)
		// Prevents Slowloris attacks
		ReadTimeout: 10 * time.Second,

		// Time allowed to read request headers only
		// More granular than ReadTimeout
		ReadHeaderTimeout: 5 * time.Second,

		// Time allowed to write the full response
		// Includes time to send response headers and body
		WriteTimeout: 30 * time.Second,

		// Maximum time an idle keep-alive connection is kept open
		// Set higher than the ALB idle timeout (default 60s) so the ALB,
		// not the server, closes idle connections; this avoids 502 errors
		IdleTimeout: 75 * time.Second,

		// Maximum allowed header size in bytes
		MaxHeaderBytes: 1 << 20, // 1 MB
	}

	// Start server in goroutine
	go func() {
		log.Printf("Server starting on %s", server.Addr)
		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("Server failed: %v", err)
		}
	}()

	// Wait for OS signal
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
	<-quit

	// Graceful shutdown: finish in-flight requests
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	log.Println("Shutting down server...")
	if err := server.Shutdown(ctx); err != nil {
		log.Fatalf("Server forced shutdown: %v", err)
	}

	log.Println("Server stopped")
}

func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("ok"))
}

func ordersHandler(w http.ResponseWriter, r *http.Request) {
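	// Context from request: cancelled when the client disconnects or the
	// handler returns (WriteTimeout alone does not cancel it)
	ctx := r.Context()

	// Pass context to all downstream calls
	// (fetchOrdersFromDB is application code defined elsewhere)
	result, err := fetchOrdersFromDB(ctx)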
	if err != nil {
		if ctx.Err() != nil {
			// Client disconnected or timeout — not a server error
			http.Error(w, "Request cancelled", http.StatusRequestTimeout)
			return
		}
		http.Error(w, "Internal server error", http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	// encoding/json.NewEncoder(w).Encode(result) is more efficient for large responses
	// (streams directly to ResponseWriter without allocating a full buffer)
	if err := json.NewEncoder(w).Encode(result); err != nil {
		log.Printf("Failed to encode response: %v", err)
	}
}
```

The relationship between `IdleTimeout` and the ALB idle timeout (60 seconds by default) is an important detail. AWS recommends that the target's keep-alive timeout be longer than the load balancer's idle timeout. If the server closes an idle connection first, which is what happens with any value below 60 seconds, the ALB can forward a request onto a connection the server is already tearing down, and the client receives a 502. With the server timeout above the ALB's, the ALB is always the side that retires idle connections and that race disappears.

### sync.Pool for Allocation Reduction

Go's garbage collector is low-latency but not zero-cost. Applications with high allocation rates (allocating new objects for each request) generate GC pressure that increases p99 latency. `sync.Pool` provides a pool of recyclable objects that is safe for concurrent use by many goroutines:

```go
import (
	"bytes"
	"encoding/json"
	"net/http"
	"sync"
)

var bufferPool = sync.Pool{
	New: func() any {
		return new(bytes.Buffer)
	},
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
	// Get a buffer from the pool (or allocate a new one if pool is empty)
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset() // Clear contents from previous use
	defer bufferPool.Put(buf) // Return to pool when done

	// Use buf for JSON encoding, template rendering, etc.
	if err := json.NewEncoder(buf).Encode(responseData); err != nil {
		http.Error(w, "encoding failed", http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	w.Write(buf.Bytes())
}
```

`sync.Pool` objects may be GC'd between uses — the pool is not a cache. Objects returned to the pool must be safe to reuse (always `Reset()` before use). Use it for short-lived allocations that are created and released within a request, not for long-lived state.

### GOGC Tuning for ECS

Go's garbage collector triggers when the heap grows by `GOGC`% since the last collection (default: 100%). This means GC runs when the heap doubles.

For memory-constrained ECS containers, lower GOGC to trigger GC more frequently, keeping heap size smaller at the cost of more CPU spent on GC:

```bash
# In ECS task definition environment variables:
# GOGC=50 triggers GC when heap grows 50% (more frequent, smaller heap)
# GOGC=200 triggers GC when heap grows 200%, i.e. triples (less frequent, larger heap)
# Default GOGC=100 is appropriate for most workloads

# For containers under memory pressure (close to limit):
GOGC=50

# For containers with abundant memory (throughput over memory):
GOGC=200
```
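The trigger rule can be written out as arithmetic. The sketch below is a simplification in Python: it models only the heap term of the pacer, and the 200 MiB live heap is a made-up figure.

```python
def next_gc_heap_mib(live_heap_mib: float, gogc: int = 100) -> float:
    """Heap size at which the next collection starts: live heap x (1 + GOGC/100)."""
    return live_heap_mib * (1 + gogc / 100)


# Hypothetical service with 200 MiB of live heap after each collection.
for gogc in (50, 100, 200):
    print(f"GOGC={gogc}: next GC at {next_gc_heap_mib(200, gogc):.0f} MiB")
```

With 200 MiB live, `GOGC=50` keeps the peak near 300 MiB, the default lets it reach about 400 MiB, and `GOGC=200` allows roughly 600 MiB. The container's memory limit has to accommodate the peak, not the live heap.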

Go 1.19 introduced `GOMEMLIMIT`, which sets a soft memory limit for the Go runtime. When the heap approaches `GOMEMLIMIT`, GC runs more aggressively to stay under the limit. This is more practical for ECS than tuning `GOGC`:

```bash
# Set GOMEMLIMIT to 80% of container memory limit
# For a 512 MB container:
GOMEMLIMIT=409MiB
```

With `GOMEMLIMIT`, Go's GC automatically tunes its aggressiveness to keep memory under the limit, reducing OOM kills without requiring manual `GOGC` tuning.
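The 80% rule is easy to compute when generating task definitions. A small sketch in Python; the function name is hypothetical and the fraction is the rule of thumb from this section, not a value mandated by the Go runtime.

```python
def gomemlimit(container_memory_mib: int, fraction: float = 0.8) -> str:
    """Format a GOMEMLIMIT value as a share of the container memory limit."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round down so the soft limit never exceeds the intended share.
    return f"{int(container_memory_mib * fraction)}MiB"


for memory in (512, 1024, 2048):
    print(f"{memory} MiB container -> GOMEMLIMIT={gomemlimit(memory)}")
```

The remaining 20% is headroom for memory the Go runtime does not count against the limit, such as memory allocated by C libraries through cgo.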

### Goroutine Leak Detection

Goroutine leaks — goroutines that are spawned and never exit — accumulate memory over time and eventually crash the container. Common causes:

- Goroutines blocked on channels that are never written to.
- Goroutines blocked on mutex locks that are never released.
- HTTP client goroutines where the response body is never closed.

Detection in staging:

```go
import (
	"fmt"
	"net/http"
	"runtime"
)

func goroutineCountHandler(w http.ResponseWriter, r *http.Request) {
	count := runtime.NumGoroutine()
	fmt.Fprintf(w, "goroutines: %d\n", count)
}
```

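In production, publish this count as an `expvar` variable (served at `/debug/vars`) and monitor it via CloudWatch. `expvar` exports only `cmdline` and `memstats` by default, so a name such as `goroutine_count` is one you register yourself. A goroutine count that grows monotonically with request count (never decreasing) indicates leaks.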

The `goleak` package detects goroutine leaks in tests:

```go
import (
	"testing"

	"go.uber.org/goleak"
)

func TestHandler(t *testing.T) {
	defer goleak.VerifyNone(t)

	// Test code here — goleak fails the test if goroutines are leaked
}
```

## File Descriptors on ECS Containers

Every open network connection, file, and socket consumes a file descriptor. The OS default limit is often 1,024 per process. Under high concurrency, you can exhaust this limit long before you hit CPU or memory limits.

Symptoms of file descriptor exhaustion:

- "too many open files" errors in application logs.
- New connections refused.
- Database connection pool errors.

Fix at the ECS task definition level:

```json
{
  "containerDefinitions": [
    {
      "name": "api",
      "ulimits": [
        {
          "name": "nofile",
          "softLimit": 65535,
          "hardLimit": 65535
        }
      ]
    }
  ]
}
```

In Terraform:

```hcl
resource "aws_ecs_task_definition" "api" {
  family = "api"

  container_definitions = jsonencode([
    {
      name  = "api"
      image = "${var.ecr_repository_url}:latest"

      ulimits = [
        {
          name      = "nofile"
          softLimit = 65535
          hardLimit = 65535
        }
      ]
    }
  ])
}
```
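To confirm that the limit actually reached the process, read it from inside the running container. A minimal check using Python's standard `resource` module (Unix only); the helper name is hypothetical, and other runtimes expose the same value through `ulimit -n` or `/proc/<pid>/limits`.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
print(f"nofile soft={soft} hard={hard}")
if soft != resource.RLIM_INFINITY and soft < 65535:
    print("soft limit is below the 65535 configured in the task definition")
```

The soft limit is the one the process hits first, so that is the number to compare against your expected connection count.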

For Nginx containers, also configure `worker_rlimit_nofile` in `nginx.conf`:

```nginx
worker_rlimit_nofile 65535;

events {
    worker_connections 8192;
}
```

When Nginx proxies, each active client connection uses two of a worker's connections and two file descriptors (client socket + upstream socket). Effective max client connections per worker: `worker_connections / 2 = 4096`. Total with `worker_processes 4`: 16,384 concurrent connections from one container.
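The same capacity arithmetic as a reusable calculation, sketched in Python. The function name is hypothetical, and the worker count of 4 is an assumption, since `worker_processes` is not set in the snippet above.

```python
def proxy_capacity(worker_connections: int, worker_processes: int,
                   connections_per_client: int = 2) -> int:
    """Concurrent client connections an Nginx reverse proxy can hold."""
    per_worker = worker_connections // connections_per_client
    return per_worker * worker_processes


capacity = proxy_capacity(worker_connections=8192, worker_processes=4)
print(capacity)  # 8192 // 2 = 4096 per worker, times 4 workers = 16384

# Each worker needs at least worker_connections descriptors available.
assert 8192 <= 65535, "worker_rlimit_nofile must cover worker_connections"
```

If Nginx serves static files directly rather than proxying, pass `connections_per_client=1`, since no upstream socket is opened.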

## Putting Runtime Tuning in Context

The configuration levers covered here have real impact: PHP-FPM worker counts set the ceiling on concurrent requests, per-request Laravel bootstrap adds 10–40ms that Octane removes, and event loop blocking turns p50 into p99. But tuning the runtime is the second-order optimization.

The first-order optimizations: eliminate N+1 queries, add caching for hot read paths, use connection pooling (PgBouncer for PostgreSQL, connection pool middleware for MySQL), and right-size ECS tasks based on measured resource utilization rather than guesswork.

A well-tuned runtime on an unoptimized application will plateau. An untuned runtime on a well-optimized application often performs adequately. Build in the right order.

Related reading: [AWS ECS vs EKS: Container Orchestration Decision Guide](/blog/aws-ecs-vs-eks-container-orchestration-decision-guide/) and [AWS Auto Scaling Strategies for EC2, ECS, and Lambda](/blog/aws-auto-scaling-strategies-ec2-ecs-lambda/).

## FAQ

### How does Laravel Octane improve concurrency compared to standard PHP-FPM?
Standard PHP-FPM boots the entire Laravel application framework for every HTTP request: it loads the autoloader, builds the container, and registers and boots all service providers. For a typical Laravel app this takes 10–40ms per request before your controller code runs. Laravel Octane keeps the application in memory between requests using Swoole or RoadRunner as the application server. Bootstrap runs once per worker; subsequent requests skip it. Where bootstrap dominates request time, this can reduce p50 latency for typical Laravel endpoints from 50–100ms to 5–20ms and increase throughput per worker by 3–10×; endpoints that spend most of their time in database queries gain less. The cost implication for ECS: fewer containers needed for the same request volume, or smaller container sizes, directly reducing Fargate costs.

### What causes Node.js event loop blocking and how do you detect it?
The Node.js event loop runs JavaScript callbacks on a single thread. CPU-intensive operations (JSON.parse on large payloads, synchronous crypto calls, tight loops, synchronous file reads) block the event loop, preventing it from processing other callbacks. The symptom is elevated p99 latency even when aggregate CPU utilization is moderate: one core is busy running synchronous code while every other request waits. Detection: run with --prof to generate V8 profiler output, then use node --prof-process to identify the blocking code. The blocked-at npm package reports where the loop was blocked, with stack traces. In production, track event loop lag with node:perf_hooks (monitorEventLoopDelay() or performance.eventLoopUtilization()) and alert on sustained lag above 50ms.

### How do Python asyncio and Gunicorn workers affect ECS memory costs?
Gunicorn starts a single worker by default; the commonly recommended count is (2 × CPU_count + 1) processes, each loading the full application into memory. For a Django/FastAPI app, budget 80–150 MB per sync worker × (2 × vCPU + 1). On a 2-vCPU Fargate task that formula gives 5 workers, or roughly 0.4–0.75 GB, so a 4 GB task has memory headroom for about 25 workers at 150 MB each even though the CPU formula asks for far fewer. Switching to uvicorn workers (--worker-class uvicorn.workers.UvicornWorker) allows async I/O within each worker, handling multiple concurrent requests per worker, but each worker still uses the same base memory. The benefit is fewer workers needed for I/O-bound workloads, reducing the total worker count and therefore the container memory requirement. For compute-bound workloads, sync workers are equivalent or better; async workers win only for I/O-heavy workloads.

### Why do Go applications typically need fewer ECS resources than PHP equivalents?
Go HTTP servers use goroutines: lightweight threads of execution (starting with a stack of about 2 KB that grows as needed) scheduled by the Go runtime rather than the OS. A Go API server handling 1,000 concurrent requests uses on the order of 1,000 goroutines, typically a few MB to tens of MB of stack plus application memory. A PHP-FPM equivalent handling 1,000 concurrent requests requires 1,000 worker processes, each using 30–80 MB of memory, or 30–80 GB total. In practice, PHP-FPM containers are sized for `pm.max_children` (20–50 typically), with horizontal scaling handling additional concurrency. Go containers can handle far more concurrency per instance because goroutines are cheap. The ECS cost implication: fewer Go instances at smaller sizes for equivalent throughput.

---

*Source: https://www.factualminds.com/blog/tune-php-node-python-go-high-concurrency/*
