---
title: How to Run Production Laravel, Django, and Node Apps on ECS (2026)
description: A deep technical guide to running PHP, Python, and Node.js applications on Amazon ECS in production — covering Fargate vs EC2, FrankenPHP vs Nginx+FPM, multi-container task patterns, zero-downtime deployments, and observability.
url: https://www.factualminds.com/blog/production-laravel-django-node-on-ecs-2026/
datePublished: 2026-03-29T00:00:00.000Z
dateModified: 2026-06-10T00:00:00.000Z
author: palaniappan-p
category: Serverless & Containers
tags: how-to-guide, ecs, fargate, containers, laravel, django, nodejs, php, python, docker, blue-green-deployment, aws
---

# How to Run Production Laravel, Django, and Node Apps on ECS (2026)

> A deep technical guide to running PHP, Python, and Node.js applications on Amazon ECS in production — covering Fargate vs EC2, FrankenPHP vs Nginx+FPM, multi-container task patterns, zero-downtime deployments, and observability.

Running web applications on Amazon ECS in production is not the same as getting a container to start. The gap between "container runs" and "production-grade" spans deployment strategy, observability, resource sizing, worker process management, and failure recovery. This guide covers the specifics for the three most common application stacks deployed on ECS: Laravel (PHP), Django (Python), and Node.js.

The patterns here apply whether you are migrating from EC2 instances, moving off Elastic Beanstalk, or containerizing for the first time. The goal is a deployment that survives traffic spikes, deploys without downtime, and surfaces enough signal to debug failures at 2 AM.

## ECS on EC2 vs Fargate: Production Web App Comparison

The **ECS launch type** determines how your containers get compute — either on EC2 instances you manage or on Fargate's serverless compute. The decision is not purely about cost; it affects cold start latency, operational overhead, and failure modes.

### Fargate

Fargate provisions isolated compute for each task. You define vCPU and memory at the task level; AWS handles everything below the Linux process boundary. There are no AMIs to maintain, no node scaling policies, no SSH access to debug stuck processes.

**Performance characteristics for web workloads:**

- Task cold start: 20–40 seconds from ECS scheduling to first request served. This is relevant for scale-from-zero scenarios and burst scaling events.
- No noisy neighbor effects at the hypervisor level. Each task gets dedicated, isolated compute.
- Network performance scales with vCPU: 0.5 vCPU tasks get ~2.5 Gbps, 4 vCPU tasks get ~10 Gbps.
- Fargate does not support `host` or `bridge` network modes — tasks run in `awsvpc` mode with a dedicated ENI.

**Fargate pricing (us-east-1, 2026):**

- vCPU: $0.04048 per vCPU-hour
- Memory: $0.004445 per GB-hour
- A 1 vCPU / 2GB task running 24/7 (730 hours) costs ~$36/month before data transfer
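
The monthly figure follows directly from the two rates. A minimal sketch of the arithmetic, assuming a 730-hour month; the function name and the hours default are illustrative choices, not AWS values, and the exact total shifts slightly with the hours you assume:

```python
# Rates quoted above for us-east-1; verify against the current Fargate pricing page.
VCPU_PER_HOUR = 0.04048
GB_PER_HOUR = 0.004445


def fargate_monthly_cost(vcpu: float, memory_gb: float, hours: float = 730.0) -> float:
    """Return the on-demand compute cost in USD for one always-on task."""
    hourly = vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR
    return hourly * hours


if __name__ == "__main__":
    print(f"1 vCPU / 2GB: ${fargate_monthly_cost(1, 2):.2f}/month")
    print(f"0.5 vCPU / 1GB: ${fargate_monthly_cost(0.5, 1):.2f}/month")
```

Multiply the result by your steady-state task count to compare against an EC2 fleet of similar capacity.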

**Fargate Spot** runs on spare capacity with up to 70% discount. Spot tasks can be interrupted with a 2-minute warning. Use Spot for queue workers, batch jobs, and dev/staging environments — not for web-serving tasks in production where interruption causes 502s.

### EC2 Launch Type

EC2 launch type schedules containers onto EC2 instances you manage. You run an ECS container agent on each instance; the agent registers with the cluster and accepts task placements.

**When EC2 wins over Fargate:**

- **Sustained high-density workloads** — a c7g.2xlarge ($0.2896/hr) can run 8–12 web container tasks that would cost 40–60% more on Fargate
- **GPU requirements** — Fargate does not support GPU task definitions; ECS on EC2 with P4d/G5 instances is the only option
- **Local NVMe instance storage** — ephemeral high-IOPS storage for cache-heavy applications (Redis, SQLite-based caching, tmp file operations)
- **Strict p99 latency requirements** — eliminates the Fargate cold start window entirely

**Operational overhead of EC2 launch type:**

- You manage ECS-optimized AMI updates (monthly cadence)
- Cluster capacity provider must be configured to avoid stranded tasks when instances are undersized
- ECS container instance replacement during AMI updates requires draining, which shifts tasks to remaining instances and can cause brief capacity constraints

For teams without dedicated infrastructure engineers, Fargate eliminates an entire category of toil. For teams running 50+ tasks 24/7, the economics of EC2 Reserved Instances justify the operational investment.

### The Hybrid Pattern

The practical answer for most production deployments is a mix: Fargate for web-serving tasks (consistent latency, no operational overhead), EC2 Spot for queue workers and schedulers (interruption-tolerant, cost-optimized), and EC2 On-Demand for anything requiring local storage or GPUs.

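ECS capacity providers support this mixed model within a single cluster, with one constraint: a single service's capacity provider strategy can contain Fargate providers (`FARGATE`, `FARGATE_SPOT`) or Auto Scaling group providers, but not both. In practice you run web tasks in one service on Fargate and workers in a separate service on an EC2 Spot capacity provider. Within a Fargate service, `base` and `weight` let you keep a floor of tasks on `FARGATE` and place the overflow on `FARGATE_SPOT`.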

## FrankenPHP vs Nginx + PHP-FPM

For PHP applications on ECS, the runtime stack choice has a material impact on container sizing, throughput, and memory profiles.

### Nginx + PHP-FPM: The Established Stack

**PHP-FPM** (FastCGI Process Manager) manages a pool of PHP worker processes. Nginx accepts HTTP requests and proxies them to PHP-FPM via Unix socket or TCP. This two-process model is mature, well-understood, and has predictable behavior.

**Memory profile:** Opcache lives in shared memory, so compiled scripts are stored once per pool rather than once per worker; what each worker holds privately is its request heap. A pool of 20 workers on a Laravel application consumes roughly 20–40MB per worker (400–800MB total), plus Nginx at ~15MB. For ECS sizing, a standard Laravel application running 20 FPM workers needs about 1GB of task memory; a 512MB task only fits a smaller pool.

**PHP-FPM pool configuration for ECS containers:**

```ini
[www]
pm = dynamic
pm.max_children = 20
pm.start_servers = 5
pm.min_spare_servers = 3
pm.max_spare_servers = 10
pm.max_requests = 500
; pm.process_idle_timeout only takes effect with pm = ondemand; it is ignored under pm = dynamic
pm.process_idle_timeout = 10s
request_terminate_timeout = 60
```

`pm.max_requests = 500` is important in containers — it limits memory leak accumulation by periodically recycling workers. Without it, long-running PHP processes gradually accumulate unreleased memory.
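
The per-worker memory figures above translate into a ceiling for `pm.max_children`. A rough sketch of that sizing, where `reserved_mb` (Nginx, shared opcache, the FPM master) is an assumed allowance you should replace with measurements from your own container:

```python
def fpm_max_children(task_memory_mb: int, per_worker_mb: int, reserved_mb: int = 150) -> int:
    """Largest FPM worker count whose combined footprint fits the task memory.

    reserved_mb is a hypothetical allowance for everything that is not a
    worker heap: Nginx, shared opcache, and the FPM master process.
    """
    usable = task_memory_mb - reserved_mb
    if usable < per_worker_mb:
        # The task is undersized; one worker is the minimum FPM can run.
        return 1
    return usable // per_worker_mb


if __name__ == "__main__":
    for task_mb in (512, 1024, 2048):
        # Size against the upper end of the 20-40MB range to stay safe.
        print(task_mb, "MB task ->", fpm_max_children(task_mb, per_worker_mb=40), "workers")
```

Sizing against the upper end of the per-worker range leaves headroom for requests that allocate more than usual.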

### FrankenPHP: Embedded PHP Server

**FrankenPHP** is a PHP application server written in Go that embeds libphp directly into the Caddy web server process. There is no FastCGI layer; PHP execution happens within the same process as HTTP handling.

In **worker mode**, FrankenPHP boots the Laravel application once and keeps it in memory across requests. This is equivalent to Laravel Octane — the framework bootstrap (service container, service providers, configuration) runs once, not on every request. The result is significantly lower per-request latency and higher throughput.

**Performance comparison (Laravel 11, PHP 8.3, 1 vCPU):**

| Stack                                    | Requests/sec (p50) | p99 latency | Memory (20 concurrent) |
| ---------------------------------------- | ------------------ | ----------- | ---------------------- |
| Nginx + PHP-FPM (pm=dynamic, 20 workers) | 180–220            | 45ms        | 650MB                  |
| FrankenPHP worker mode (10 workers)      | 420–480            | 22ms        | 380MB                  |
| Laravel Octane + Nginx + FPM             | 380–430            | 24ms        | 420MB                  |

FrankenPHP worker mode outperforms equivalent PHP-FPM configurations because:

1. No FastCGI protocol serialization/deserialization overhead
2. Application state held in memory between requests (no re-bootstrapping)
3. Go-based HTTP handling is more efficient than Nginx for connection management under high concurrency

**When FrankenPHP loses:**

- Applications with significant global state mutations between requests (legacy code that modifies `$_SERVER`, uses `register_shutdown_function` in unusual ways, or relies on PHP's per-request memory cleanup to manage resource leaks)
- Applications using extensions not compiled into the FrankenPHP build
- Teams unfamiliar with troubleshooting worker-mode state bleed issues

### Dockerfile for FrankenPHP Production Build

```dockerfile
FROM dunglas/frankenphp:1-php8.3-alpine AS builder

RUN install-php-extensions \
    pcntl \
    pdo_mysql \
    pdo_pgsql \
    redis \
    intl \
    zip \
    opcache

COPY --from=composer:2 /usr/bin/composer /usr/bin/composer

WORKDIR /app
COPY composer.json composer.lock ./
RUN composer install --no-dev --optimize-autoloader --no-scripts

COPY . .
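# route:cache and view:cache are safe at build time. config:cache is not:
# it freezes build-time env values, so the secrets ECS injects at runtime
# (APP_KEY, DB_PASSWORD) would be ignored. Run config:cache at container start.
RUN composer run-script post-autoload-dump \
    && php artisan route:cache \
    && php artisan view:cache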

FROM dunglas/frankenphp:1-php8.3-alpine

RUN install-php-extensions pcntl pdo_mysql pdo_pgsql redis intl zip opcache

WORKDIR /app
COPY --from=builder /app /app

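# Worker mode for Laravel runs through Octane, which supplies its own worker
# script; public/index.php cannot serve as a FrankenPHP worker on its own.
# Assumes laravel/octane is required in composer.json.
ENV APP_ENV=production
ENV OCTANE_SERVER=frankenphp

EXPOSE 8080

CMD ["php", "artisan", "octane:frankenphp", "--host=0.0.0.0", "--port=8080"]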
```

### Nginx Config for PHP-FPM with a Health Check Endpoint

```nginx
upstream php-fpm {
    server 127.0.0.1:9000;
    keepalive 32;
}

server {
    listen 8080;
    root /var/www/html/public;
    index index.php;

    client_max_body_size 64m;
    client_body_timeout 30s;
    fastcgi_read_timeout 60s;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

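    location = /health {
        access_log off;
        # default_type sets the Content-Type of the return body;
        # add_header would emit a second Content-Type header instead.
        default_type text/plain;
        return 200 "OK\n";
    }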

    location ~ \.php$ {
        fastcgi_pass php-fpm;
        # Without this, the upstream keepalive pool above is never used
        fastcgi_keep_conn on;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;

        fastcgi_connect_timeout 5s;
        fastcgi_send_timeout 60s;
        fastcgi_read_timeout 60s;
        fastcgi_buffer_size 16k;
        fastcgi_buffers 4 16k;

        # Pass real IP from ALB
        fastcgi_param HTTP_X_FORWARDED_FOR $http_x_forwarded_for;
        fastcgi_param HTTP_X_FORWARDED_PROTO $http_x_forwarded_proto;
    }

    location ~ /\. {
        deny all;
    }
}
```

## Multi-Container Task Definitions: App + Queue + Scheduler

ECS task definitions support multiple containers sharing the same network namespace and local volumes. This is the right pattern for running a Laravel application alongside its queue workers and schedulers.

### Architecture: Three Containers, One Task Definition

The canonical multi-container ECS task for Laravel runs:

1. **App container**: serves HTTP traffic via Nginx+FPM or FrankenPHP
2. **Worker container**: runs `php artisan queue:work` consuming from SQS
3. **Scheduler**: runs `php artisan schedule:run`, and is better deployed as a separate scheduled task than as a third container in this task

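For the scheduler, do not run it as a long-lived container that sleeps in a loop. Run it as an **ECS Scheduled Task** triggered by EventBridge every minute, so the scheduler holds compute only while it runs. Keep in mind that Fargate bills per second with a one-minute minimum per task, so the saving over an always-on container grows as the schedule gets sparser.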

### ECS Task Definition JSON: Multi-Container Laravel

```json
{
  "family": "laravel-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/laravel-app-task-role",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/laravel-app:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp",
          "name": "http"
        }
      ],
      "environment": [
        { "name": "APP_ENV", "value": "production" },
        { "name": "LOG_CHANNEL", "value": "stderr" },
        { "name": "QUEUE_CONNECTION", "value": "sqs" }
      ],
      "secrets": [
        {
          "name": "APP_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:laravel/app-key"
        },
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:laravel/db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/laravel-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "app"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 15,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      },
      "ulimits": [
        {
          "name": "nofile",
          "softLimit": 65536,
          "hardLimit": 65536
        }
      ]
    },
    {
      "name": "worker",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/laravel-app:latest",
      "essential": false,
      "command": [
        "php",
        "artisan",
        "queue:work",
        "sqs",
        "--tries=3",
        "--timeout=90",
        "--memory=256",
        "--sleep=3",
        "--queue=high,default"
      ],
      "environment": [
        { "name": "APP_ENV", "value": "production" },
        { "name": "LOG_CHANNEL", "value": "stderr" }
      ],
      "secrets": [
        {
          "name": "APP_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:laravel/app-key"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/laravel-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "worker"
        }
      }
    }
  ]
}
```

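**`essential: false` on the worker is critical.** If the worker crashes, ECS should not stop the app container. ECS does not restart a stopped non-essential container by default, so alarm on worker exits or configure a container restart policy. The reverse does not hold: `app` is essential, so when it stops, or when a deployment replaces the task, the whole task stops, including the worker. The worker receives SIGTERM and has `stopTimeout` (30 seconds by default, up to 120 on Fargate) to finish its current job before SIGKILL, so long-running jobs must be safe to retry.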

### Scaling Workers Independently

The pattern above co-locates app and worker in the same task. For high-throughput queues, run workers as a **separate ECS service** with its own task definition (worker only, no Nginx). This lets you:

- Scale workers based on SQS queue depth without scaling web containers
- Apply different CPU/memory sizing (workers often need more memory than web containers)
- Use Fargate Spot for workers (interruption-tolerant) while web tasks run on standard Fargate

Application Auto Scaling policy for worker scaling:

```json
{
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingScalingPolicyConfiguration": {
    "CustomizedMetricSpecification": {
      "MetricName": "ApproximateNumberOfMessagesVisible",
      "Namespace": "AWS/SQS",
      "Dimensions": [{ "Name": "QueueName", "Value": "laravel-jobs.fifo" }],
      "Statistic": "Average"
    },
    "TargetValue": 10,
    "ScaleInCooldown": 120,
    "ScaleOutCooldown": 30
  }
}
```

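As written, this policy tracks the raw queue depth: Auto Scaling adds worker tasks until the queue as a whole averages about 10 visible messages, regardless of how many tasks are running. It does not maintain 10 messages per task. To get that behavior (100 visible messages targeting 10 worker tasks), publish a backlog-per-task metric, `ApproximateNumberOfMessagesVisible` divided by the running task count, and use it as the target-tracking metric with the same `TargetValue` of 10.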
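
The relationship between queue depth, a per-task backlog target, and the resulting task count can be sketched as below. This is illustrative arithmetic only, not what Application Auto Scaling runs internally; the function names and the `min_tasks`/`max_tasks` bounds are hypothetical:

```python
import math


def backlog_per_task(visible_messages: int, running_tasks: int) -> float:
    """Metric to publish: visible messages divided by running worker tasks."""
    # Treat zero running tasks as one to avoid division by zero.
    return visible_messages / max(running_tasks, 1)


def desired_worker_tasks(visible_messages: int, target_backlog_per_task: int,
                         min_tasks: int = 1, max_tasks: int = 50) -> int:
    """Task count that brings the per-task backlog down to the target."""
    if target_backlog_per_task <= 0:
        raise ValueError("target_backlog_per_task must be positive")
    needed = math.ceil(visible_messages / target_backlog_per_task)
    # Clamp to the service's scaling bounds.
    return max(min_tasks, min(max_tasks, needed))


if __name__ == "__main__":
    print(backlog_per_task(100, 4))       # backlog each of 4 tasks is carrying
    print(desired_worker_tasks(100, 10))  # tasks needed for a target of 10 per task
```

A reasonable starting target is the number of messages one task can process within your acceptable queue latency.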

### Django and Node.js Patterns

**Django on ECS** follows the same multi-container pattern but uses Gunicorn (synchronous WSGI) or Uvicorn (async ASGI) as the application server, with Nginx as the reverse proxy.

For Django + Celery (equivalent to Laravel queues):

```dockerfile
# Django worker container command
# A multi-line exec-form CMD needs backslash continuations to parse as one array
CMD ["celery", "-A", "myapp", "worker", \
     "--loglevel=info", \
     "--concurrency=4", \
     "--queues=default,high_priority", \
     "--max-tasks-per-child=200"]
```

`--max-tasks-per-child=200` is the Celery equivalent of PHP-FPM's `pm.max_requests` — it recycles worker processes to prevent memory accumulation.

**Node.js (NestJS) on ECS** typically runs as a single process using PM2 or directly as `node dist/main.js`. In a container, running PM2 cluster mode adds complexity without benefit — use multiple ECS tasks for horizontal scaling instead of multiple processes per container. One process per container is the container-native pattern.

```dockerfile
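FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Full install: the build step needs devDependencies (TypeScript, Nest CLI)
RUN npm ci
COPY . .
# Build, then drop devDependencies so only runtime packages are copied forward
RUN npm run build && npm prune --omit=dev

FROM node:22-alpine
ENV NODE_ENV=production
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
# Run as the unprivileged user that ships with the official image
USER node
EXPOSE 3000
CMD ["node", "dist/main.js"]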
```

## Zero-Downtime Deployments

### Rolling Updates

ECS rolling deployments replace tasks incrementally. The `minimumHealthyPercent` and `maximumPercent` deployment configuration values control the rollout behavior:

```json
{
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }
}
```

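`minimumHealthyPercent: 100` with `maximumPercent: 200` means ECS launches new tasks before terminating old ones, so serving capacity never drops below the desired count during the rollout. This is still a rolling update, not blue/green: old and new tasks serve traffic side by side behind the same target group. With `deploymentCircuitBreaker` enabled, ECS automatically rolls back to the last completed deployment when too many new tasks fail to reach a healthy steady state. The failure threshold is not configurable; ECS derives it from the service's desired count.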
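
To see what the two percentages allow for a given service size, the bounds can be computed as below. This sketch assumes the rounding described in the ECS documentation (lower bound rounded up, upper bound rounded down); the function name is illustrative:

```python
def rolling_update_bounds(desired_count: int, minimum_healthy_percent: int,
                          maximum_percent: int) -> tuple[int, int]:
    """Return (minimum running tasks, maximum total tasks) during a rollout."""
    # Integer ceiling for the lower bound, integer floor for the upper bound.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    max_total = desired_count * maximum_percent // 100
    return min_running, max_total


if __name__ == "__main__":
    low, high = rolling_update_bounds(4, 100, 200)
    print(f"desired=4: at least {low} running, at most {high} total")
    low, high = rolling_update_bounds(3, 50, 150)
    print(f"desired=3: at least {low} running, at most {high} total")
```

The gap between the two numbers is how many tasks ECS can replace per step, which is why small services with tight percentages deploy slowly.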

The **deregistrationDelay** on your ALB target group must be set appropriately. The default is 300 seconds. During a deployment, ECS deregisters old tasks from the target group and waits for `deregistrationDelay` before stopping the container. If your application has 30-second connections (long polling, Server-Sent Events), set deregistrationDelay to 60–90 seconds. For standard REST APIs, 30 seconds is sufficient.

### Blue/Green with CodeDeploy

**Blue/green deployments** via CodeDeploy introduce a second target group. The deployment creates a new task set (green), registers it with the secondary target group, then shifts traffic from the primary (blue) target group to green.

CodeDeploy appspec.yml for ECS blue/green:

```yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: '<TASK_DEFINITION>'
        LoadBalancerInfo:
          ContainerName: 'app'
          ContainerPort: 8080
        PlatformVersion: 'LATEST'
Hooks:
  - BeforeAllowTraffic: 'ValidateGreenEnvironment'
  - AfterAllowTraffic: 'ValidateProductionTraffic'
```

The `BeforeAllowTraffic` hook runs a Lambda function against the green target group before any production traffic is shifted. This is where you run smoke tests, check application health endpoints, and validate database migrations completed successfully.

**Canary traffic shifting** shifts a small percentage (10%) of traffic to green first, holds for a configurable period, then completes the shift. This requires your application to handle both old and new task versions serving traffic simultaneously — a constraint that affects how you write database migrations (additive only, no column renames or type changes until the old version is fully retired).

### Go Applications

Go services on ECS behave better during deployments than interpreted language runtimes because startup time is near-instant. A compiled Go binary starts serving requests in under 100ms. This makes canary deployments easier — there is no warm-up period, and the green environment is ready for real traffic almost immediately after task start.

## Observability: Structured Logging, X-Ray, and Custom Metrics

### Structured Logging to CloudWatch

All containers should log to `stderr` in JSON format. The ECS `awslogs` log driver captures stdout and stderr and sends them to CloudWatch Logs.

**Laravel structured logging** via Monolog JSON formatter:

```php
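// config/logging.php (fragment; the two imports belong at the top of the file)
use Monolog\Formatter\JsonFormatter;
use Monolog\Handler\StreamHandler;
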
'channels' => [
    'stderr' => [
        'driver' => 'monolog',
        'handler' => StreamHandler::class,
        'with' => [
            'stream' => 'php://stderr',
        ],
        'formatter' => JsonFormatter::class,
        'level' => 'debug',
    ],
],
```

Set `LOG_CHANNEL=stderr` in your ECS task environment variables. Never write logs to a file in a container — the file is lost when the task stops, and volume mounts for logging add operational complexity without benefit.

**Django structured logging** using `python-json-logger`:

```python
# settings.py
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'json': {
            '()': 'pythonjsonlogger.jsonlogger.JsonFormatter',
            'format': '%(asctime)s %(levelname)s %(name)s %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'json',
            'stream': 'ext://sys.stderr',
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}
```

### CloudWatch Log Group Terraform

```hcl
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/laravel-app"
  retention_in_days = 30

  tags = {
    Environment = var.environment
    Application = "laravel-app"
  }
}

resource "aws_cloudwatch_log_group" "worker" {
  name              = "/ecs/laravel-worker"
  retention_in_days = 14

  tags = {
    Environment = var.environment
    Application = "laravel-worker"
  }
}

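# AWS/ECS publishes no task-stop metric. This alarm watches RunningTaskCount
# from Container Insights instead (it must be enabled on the cluster) and
# fires when the service runs below its desired count.
resource "aws_cloudwatch_metric_alarm" "ecs_task_failures" {
  alarm_name          = "laravel-app-running-tasks-low"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "RunningTaskCount"
  namespace           = "ECS/ContainerInsights"
  period              = 60
  statistic           = "Minimum"
  # With service auto scaling, replace this with your minimum capacity
  threshold           = aws_ecs_service.app.desired_count
  treat_missing_data  = "breaching"
  alarm_description   = "ECS service is running fewer tasks than desired"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.app.name
  }
}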
```

### X-Ray Tracing

Run the **AWS X-Ray daemon as a sidecar container** in your task definition. The daemon collects traces from your application SDK and forwards them to the X-Ray service.

Add to task definition `containerDefinitions`:

```json
{
  "name": "xray-daemon",
  "image": "amazon/aws-xray-daemon:3.x",
  "essential": false,
  "portMappings": [
    {
      "containerPort": 2000,
      "protocol": "udp"
    }
  ],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/laravel-app",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "xray"
    }
  },
  "cpu": 32,
  "memory": 64
}
```

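AWS does not publish an official X-Ray SDK for PHP, and `aws/aws-sdk-php` does not trace requests on its own, so Laravel tracing depends on a community package such as `beansed/laravel-xray`; check that whichever package you choose is still maintained. For Node.js, use the `aws-xray-sdk` npm package (from the `aws-xray-sdk-node` repository). For Django/Python, use the `aws-xray-sdk` pip package (from the `aws-xray-sdk-python` repository).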

### Custom Metrics Per Framework

**Custom CloudWatch metric for jobs processed per queue:**

```bash
# Run as a sidecar or cron — reports queue processing rate to CloudWatch
aws cloudwatch put-metric-data \
  --namespace "Application/Laravel" \
  --metric-name "QueueJobsProcessed" \
  --value "$JOBS_PROCESSED" \
  --dimensions Environment=production,Queue=default \
  --unit Count
```

For PHP applications, emit metrics via the AWS SDK within your `queue:work` process:

```php
// app/Queue/Listeners/JobProcessed.php
public function handle(JobProcessed $event): void
{
    $this->cloudwatch->putMetricData([
        'Namespace' => 'Application/Laravel',
        'MetricData' => [
            [
                'MetricName' => 'JobProcessingTime',
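                // microtime() deltas are in seconds; convert to match the Milliseconds unit.
                // $this->startTime is assumed to be recorded by a JobProcessing listener.
                'Value' => (microtime(true) - $this->startTime) * 1000,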
                'Unit' => 'Milliseconds',
                'Dimensions' => [
                    ['Name' => 'Queue', 'Value' => $event->job->getQueue()],
                    ['Name' => 'Environment', 'Value' => config('app.env')],
                ],
            ],
        ],
    ]);
}
```

## Memory and CPU Sizing

### PHP (Laravel Octane / FrankenPHP)

| Scenario                      | vCPU | Memory | Max Concurrent Requests |
| ----------------------------- | ---- | ------ | ----------------------- |
| Low traffic (<50 req/s)       | 0.5  | 1GB    | ~15                     |
| Medium traffic (50–200 req/s) | 1    | 2GB    | ~30                     |
| High traffic (200–500 req/s)  | 2    | 4GB    | ~60                     |
| Laravel Octane (Swoole)       | 1    | 2GB    | ~100+ (async)           |

PHP memory leaks in Octane/worker mode are cumulative. Monitor the `php_memory_usage_bytes` metric if you expose it via `/metrics`. Set `--memory=256` on queue workers so Laravel itself terminates a worker process that exceeds 256MB, preventing unbounded growth.

### Python (Gunicorn / Uvicorn)

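Gunicorn synchronous workers: `workers = (2 * CPU) + 1`. On a 1 vCPU Fargate task, run 3 Gunicorn workers. Each Django worker holds at least one database connection, so size your RDS `max_connections` accordingly: 10 tasks × 3 workers is 30 connections at steady state, and up to 60 while a rolling deployment with `maximumPercent: 200` has old and new tasks running together.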
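
The worker formula and the connection count it implies can be checked with a few lines. A minimal sketch, assuming one database connection per worker process; the function names are illustrative, and `maximum_percent` models the extra tasks that exist mid-deployment:

```python
def gunicorn_sync_workers(vcpus: int) -> int:
    """Worker count from the (2 * CPU) + 1 rule of thumb."""
    return 2 * vcpus + 1


def db_connections(tasks: int, workers_per_task: int,
                   conns_per_worker: int = 1, maximum_percent: int = 100) -> int:
    """Connections held when the service runs at maximum_percent of desired count."""
    peak_tasks = tasks * maximum_percent // 100
    return peak_tasks * workers_per_task * conns_per_worker


if __name__ == "__main__":
    workers = gunicorn_sync_workers(1)
    print("workers per 1 vCPU task:", workers)
    print("steady state:", db_connections(10, workers))
    print("during rollout:", db_connections(10, workers, maximum_percent=200))
```

If the rollout figure exceeds what the database allows, lower `maximumPercent` or put a connection pooler such as RDS Proxy in front of the database.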

For async workloads (FastAPI, Django with async views), Uvicorn with 1–2 workers handles far more concurrency than synchronous Gunicorn at equivalent CPU. Uvicorn's event loop handles I/O-bound work efficiently.

### Node.js

Node.js runs single-threaded by default. One Node process per ECS task is correct for containers. For CPU-bound workloads, use worker threads (`worker_threads` module) or run multiple tasks. Node.js is memory-efficient — a typical NestJS API uses 100–200MB per process at runtime. 512MB tasks are sufficient for most APIs.

### Go

Go binaries are memory-light. A typical Go API server uses 30–80MB at runtime. 256MB tasks are viable. Go scales goroutines efficiently with a single process — unlike Node, it handles CPU parallelism natively with GOMAXPROCS set to container vCPU count.

## Edge Cases and Failure Scenarios

### Container Restart Loops

ECS marks a task unhealthy and stops it if health checks fail repeatedly. The most common causes of restart loops:

1. **Missing environment variables** — Application fails to boot because a required secret was not injected. Fix: validate all secrets exist in Secrets Manager before deployment; add startup validation in your application bootstrap.

2. **Database migration failures during startup** — Running `php artisan migrate` in the Docker ENTRYPOINT causes restart loops if the migration fails (table lock, bad SQL). Fix: run migrations as a pre-deployment ECS Task via CodePipeline, not in the container startup.

3. **Health check endpoint misconfiguration** — The `/health` route requires database connectivity, which fails during cold start. Fix: make your health check endpoint return 200 immediately without downstream dependencies; use a separate `/ready` endpoint for readiness checks.
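
The split between a dependency-free `/health` and a dependency-aware `/ready` can be sketched as a small WSGI app using only the standard library. The factory name `make_health_app`, the `dependency_ok` callable, and the `probe` helper are hypothetical; in Django the same logic would live in two plain views:

```python
from typing import Callable, Iterable


def make_health_app(dependency_ok: Callable[[], bool]):
    """Build a WSGI app exposing /health (liveness) and /ready (readiness)."""

    def app(environ, start_response) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if path == "/health":
            # Liveness: the process is up. No database or cache calls here.
            status, body = "200 OK", b"OK\n"
        elif path == "/ready":
            # Readiness: downstream dependencies must answer.
            try:
                ok = dependency_ok()
            except Exception:
                ok = False
            if ok:
                status, body = "200 OK", b"ready\n"
            else:
                status, body = "503 Service Unavailable", b"not ready\n"
        else:
            status, body = "404 Not Found", b"not found\n"
        headers = [("Content-Type", "text/plain"),
                   ("Content-Length", str(len(body)))]
        start_response(status, headers)
        return [body]

    return app


def probe(app, path: str) -> str:
    """Call the app in-process and return the status line (handy for tests)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"]
```

Point the ECS container health check and the ALB target group at `/health`, and reserve `/ready` for deployment hooks and smoke tests.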

### Memory Leak Detection

For PHP: watch the `MemoryUtilization` metric in CloudWatch. A gradual upward drift over 24 hours without corresponding traffic increase indicates a leak. Enable `pm.max_requests` in FPM and `--memory` limits on queue workers as circuit breakers.

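For Node.js: use `--max-old-space-size` to cap the V8 heap below your container memory limit, around 75% of it as a starting point. The flag covers only the heap; buffers, native modules, and thread stacks sit outside it, so a cap equal to task memory lets ECS OOM-kill the container before V8 ever reaches its limit. When Node does hit the cap, it exits with a fatal heap out-of-memory error. This is preferable to a slow memory leak causing task degradation: the task is replaced and starts from a clean state.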

### Zombie Worker Processes

PHP-FPM can accumulate zombie child processes if the master process crashes without cleaning up. In containers, PID 1 is your init process — if Nginx or FPM starts as PID 1 (rather than using an init wrapper), orphaned processes may not be reaped.

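Use `ENTRYPOINT ["tini", "--"]` in your Dockerfile to run tini as PID 1 (tini must be installed in the image). Tini handles zombie reaping and forwards signals correctly to child processes. Setting `initProcessEnabled: true` under `linuxParameters` in the container definition gives the same result without changing the image. FrankenPHP runs PHP on threads inside a single process, so there is no worker process tree to reap; an init process still helps if your application shells out to subprocesses.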

For more on ECS container orchestration decisions, see our [ECS vs EKS decision guide](/blog/aws-ecs-vs-eks-container-orchestration-decision-guide/).

For auto-scaling your ECS services under variable load, see [AWS Auto Scaling Strategies for EC2, ECS, and Lambda](/blog/aws-auto-scaling-strategies-ec2-ecs-lambda/).

For CloudWatch observability configuration across your ECS cluster, see [CloudWatch Observability: Metrics, Logs, and Alarms Best Practices](/blog/aws-cloudwatch-observability-metrics-logs-alarms-best-practices/).

## Related reading

- [AWS Lambda S3 Files: POSIX Mount for S3, ~13× Cheaper Than EFS — and the 6 Limits to Know](/blog/aws-lambda-s3-files-vs-efs-cost-and-limits/)
- [Scaling EdTech Platforms on AWS: Serverless Architecture for Education](/blog/scaling-edtech-platforms-on-aws-serverless-architecture/)

## FAQ

### Should you use ECS Fargate or EC2 launch type for PHP/Python/Node production apps?
For most PHP, Python, and Node.js web applications, Fargate is the right default. It eliminates AMI management, node patching, and cluster capacity planning. The tradeoff is a cold start latency of 20–40 seconds when scaling from zero and slightly higher per-vCPU cost compared to EC2. Use EC2 launch type when you need GPU instances, local NVMe instance storage for ephemeral data, sustained high-density workloads where Reserved Instances deliver material savings, or where your p99 latency requirements cannot tolerate Fargate cold starts. For batch processing or queue workers where cold start latency does not matter, Fargate Spot is often the cheapest option available.

### What is FrankenPHP and is it production-ready in 2026?
FrankenPHP is a PHP application server built on top of the Caddy web server, written in Go. It embeds PHP directly into the server process, eliminating the FastCGI protocol overhead between Nginx and PHP-FPM. In 2026, FrankenPHP is production-ready for Laravel and Symfony applications. It supports Laravel Octane natively and provides worker mode for persistent application state. In the benchmark above, worker mode roughly doubled throughput over Nginx+FPM on the same 1 vCPU; expect smaller gains for applications whose request time is dominated by database or API calls. The main operational consideration is that worker mode keeps the booted application in memory across requests, so leaked memory and leftover state persist instead of being discarded after each request. Total memory in the same benchmark was lower than the FPM pool (380MB vs 650MB), but size your ECS task with headroom for growth between worker restarts.

### How do you run Laravel queue workers and schedulers on ECS?
Laravel queue workers can run as a second container within the same ECS task definition, sharing the task's IAM role and network namespace; environment variables and secrets are declared per container, so repeat the ones the worker needs. The worker container runs `php artisan queue:work --tries=3 --timeout=90` as its command. For the Laravel scheduler, use a separate ECS Scheduled Task (an EventBridge rule that triggers the ECS task on a cron schedule) running `php artisan schedule:run`. Do not run the scheduler as a long-lived worker that sleeps between runs; use an EventBridge-triggered task so the scheduler consumes compute only while it executes. For high-throughput queues, move the workers into their own ECS service, because containers in one task cannot scale independently, and scale that service with Application Auto Scaling on SQS queue depth.

### How do you achieve zero-downtime deployments on ECS?
ECS supports two zero-downtime deployment strategies. Rolling updates replace tasks incrementally using the `minimumHealthyPercent` and `maximumPercent` settings; this is the simplest option, but rolling back means deploying the previous task definition again. Blue/green deployments via AWS CodeDeploy deploy a new task set to a separate target group, shift traffic gradually (canary or linear), and roll back by pointing the ALB listener at the blue target group again. For production applications, blue/green is recommended because CodeDeploy hooks let you run integration tests against the green environment before traffic shifts, and rollback is a traffic switch rather than a redeploy, provided the blue task set has not yet been terminated. The `deregistrationDelay` on your target group (default 300 seconds) must be set to match your application drain time to prevent 502s during deployment.

---

*Source: https://www.factualminds.com/blog/production-laravel-django-node-on-ecs-2026/*
