How to Run Production Laravel, Django, and Node Apps on ECS (2026)

Quick summary: A deep technical guide to running PHP, Python, and Node.js applications on Amazon ECS in production — covering Fargate vs EC2, FrankenPHP vs Nginx+FPM, multi-container task patterns, zero-downtime deployments, and observability.

Key Takeaways

  • Fargate suits web-serving tasks (consistent latency, no host management); the EC2 launch type wins for high-density, GPU, or local-NVMe workloads
  • FrankenPHP worker mode roughly doubles Laravel throughput over Nginx + PHP-FPM while using less memory
  • Run app and queue workers as separate containers or services; run the scheduler as an EventBridge scheduled task, not a sleeping container
  • Zero-downtime deployments need minimumHealthyPercent: 100, a deployment circuit breaker, and a tuned ALB deregistration delay
  • Log JSON to stderr, trace with X-Ray, and alarm on task health so failures are debuggable at 2 AM

Running web applications on Amazon ECS in production is not the same as getting a container to start. The gap between “container runs” and “production-grade” spans deployment strategy, observability, resource sizing, worker process management, and failure recovery. This guide covers the specifics for the three most common application stacks deployed on ECS: Laravel (PHP), Django (Python), and Node.js.

The patterns here apply whether you are migrating from EC2 instances, moving off Elastic Beanstalk, or containerizing for the first time. The goal is a deployment that survives traffic spikes, deploys without downtime, and surfaces enough signal to debug failures at 2 AM.

ECS on EC2 vs Fargate: Production Web App Comparison

The ECS launch type determines how your containers get compute — either on EC2 instances you manage or on Fargate’s serverless compute. The decision is not purely about cost; it affects cold start latency, operational overhead, and failure modes.

Fargate

Fargate provisions isolated compute for each task. You define vCPU and memory at the task level; AWS handles everything below the Linux process boundary. There are no AMIs to maintain, no node scaling policies, no SSH access to debug stuck processes.

Performance characteristics for web workloads:

  • Task cold start: 20–40 seconds from ECS scheduling to first request served. This is relevant for scale-from-zero scenarios and burst scaling events.
  • No noisy neighbor effects at the hypervisor level. Each task gets dedicated, isolated compute.
  • Network performance scales with vCPU: 0.5 vCPU tasks get ~2.5 Gbps, 4 vCPU tasks get ~10 Gbps.
  • Fargate does not support host or bridge network modes — tasks run in awsvpc mode with a dedicated ENI.

Fargate pricing (us-east-1, 2026):

  • vCPU: $0.04048 per vCPU-hour
  • Memory: $0.004445 per GB-hour
  • A 1 vCPU / 2GB task running 24/7 costs ~$35.90/month before data transfer
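The per-task arithmetic is simple enough to sanity-check directly (a sketch using the us-east-1 rates above; a 730-hour month is assumed, and data transfer and storage are excluded):

```python
# Sketch: estimate monthly Fargate compute cost from the us-east-1 rates above.
VCPU_HOUR = 0.04048   # USD per vCPU-hour
GB_HOUR = 0.004445    # USD per GB-hour

def fargate_monthly_cost(vcpu: float, memory_gb: float, hours: float = 730) -> float:
    """Compute-only cost; excludes data transfer, storage, and ALB charges."""
    return (vcpu * VCPU_HOUR + memory_gb * GB_HOUR) * hours

print(round(fargate_monthly_cost(1, 2), 2))  # -> 36.04 for a 1 vCPU / 2GB task
```

The small gap to the article's ~$35.90 figure comes down to the assumed hours per month.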

Fargate Spot runs on spare capacity with up to 70% discount. Spot tasks can be interrupted with a 2-minute warning. Use Spot for queue workers, batch jobs, and dev/staging environments — not for web-serving tasks in production where interruption causes 502s.

EC2 Launch Type

EC2 launch type schedules containers onto EC2 instances you manage. You run an ECS container agent on each instance; the agent registers with the cluster and accepts task placements.

When EC2 wins over Fargate:

  • Sustained high-density workloads — a c7g.2xlarge ($0.2896/hr) can run 8–12 web container tasks that would cost 40–60% more on Fargate
  • GPU requirements — Fargate does not support GPU task definitions; ECS on EC2 with P4d/G5 instances is the only option
  • Local NVMe instance storage — ephemeral high-IOPS storage for cache-heavy applications (Redis, SQLite-based caching, tmp file operations)
  • Strict p99 latency requirements — eliminates the Fargate cold start window entirely

Operational overhead of EC2 launch type:

  • You manage ECS-optimized AMI updates (monthly cadence)
  • Cluster capacity provider must be configured to avoid stranded tasks when instances are undersized
  • ECS container instance replacement during AMI updates requires draining, which shifts tasks to remaining instances and can cause brief capacity constraints

For teams without dedicated infrastructure engineers, Fargate eliminates an entire category of toil. For teams running 50+ tasks 24/7, the economics of EC2 Reserved Instances justify the operational investment.

The Hybrid Pattern

The practical answer for most production deployments is a mix: Fargate for web-serving tasks (consistent latency, no operational overhead), EC2 Spot for queue workers and schedulers (interruption-tolerant, cost-optimized), and EC2 On-Demand for anything requiring local storage or GPUs.

ECS Capacity Providers support this mixed model within a single cluster. You can configure a service to prefer Fargate but overflow to EC2 Spot during traffic spikes using capacity provider strategies.
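In the service definition, that preference can be expressed as a capacity provider strategy (a sketch; "ec2-spot-cp" is an assumed name for an Auto Scaling group backed capacity provider, and the base/weight values are illustrative):

```json
"capacityProviderStrategy": [
  { "capacityProvider": "FARGATE", "base": 4, "weight": 1 },
  { "capacityProvider": "ec2-spot-cp", "base": 0, "weight": 4 }
]
```

Here the first 4 tasks always land on Fargate (base), and tasks beyond that are distributed 1:4 toward Spot as the service scales out.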

FrankenPHP vs Nginx + PHP-FPM

For PHP applications on ECS, the runtime stack choice has a material impact on container sizing, throughput, and memory profiles.

Nginx + PHP-FPM: The Established Stack

PHP-FPM (FastCGI Process Manager) manages a pool of PHP worker processes. Nginx accepts HTTP requests and proxies them to PHP-FPM via Unix socket or TCP. This two-process model is mature, well-understood, and has predictable behavior.

Memory profile: OPcache lives in shared memory, so compiled opcodes are not duplicated per worker, but each PHP-FPM worker still carries its own per-request heap. A pool of 20 workers on a Laravel application consumes roughly 20–40MB per worker (400–800MB total), plus Nginx at ~15MB. For ECS sizing, a standard Laravel application running 20 FPM workers needs 512MB–1GB of task memory.

PHP-FPM pool configuration for ECS containers:

[www]
pm = dynamic
pm.max_children = 20
pm.start_servers = 5
pm.min_spare_servers = 3
pm.max_spare_servers = 10
pm.max_requests = 500
pm.process_idle_timeout = 10s
request_terminate_timeout = 60

pm.max_requests = 500 is important in containers — it limits memory leak accumulation by periodically recycling workers. Without it, long-running PHP processes gradually accumulate unreleased memory.

FrankenPHP: Embedded PHP Server

FrankenPHP is a PHP application server written in Go that embeds libphp directly into the Caddy web server process. There is no FastCGI layer; PHP execution happens within the same process as HTTP handling.

In worker mode, FrankenPHP boots the Laravel application once and keeps it in memory across requests. This is equivalent to Laravel Octane — the framework bootstrap (service container, service providers, configuration) runs once, not on every request. The result is significantly lower per-request latency and higher throughput.

Performance comparison (Laravel 11, PHP 8.3, 1 vCPU):

Stack                                       Requests/sec (p50)   p99 latency   Memory (20 concurrent)
Nginx + PHP-FPM (pm=dynamic, 20 workers)    180–220              45ms          650MB
FrankenPHP worker mode (10 workers)         420–480              22ms          380MB
Laravel Octane + Nginx + FPM                380–430              24ms          420MB

FrankenPHP worker mode outperforms equivalent PHP-FPM configurations because:

  1. No FastCGI protocol serialization/deserialization overhead
  2. Application state held in memory between requests (no re-bootstrapping)
  3. Go-based HTTP handling is more efficient than Nginx for connection management under high concurrency

When FrankenPHP loses:

  • Applications with significant global state mutations between requests (legacy code that modifies $_SERVER, uses register_shutdown_function in unusual ways, or relies on PHP’s per-request memory cleanup to manage resource leaks)
  • Applications using extensions not compiled into the FrankenPHP build
  • Teams unfamiliar with troubleshooting worker-mode state bleed issues

Dockerfile for FrankenPHP Production Build

FROM dunglas/frankenphp:1-php8.3-alpine AS builder

RUN install-php-extensions \
    pcntl \
    pdo_mysql \
    pdo_pgsql \
    redis \
    intl \
    zip \
    opcache

COPY --from=composer:2 /usr/bin/composer /usr/bin/composer

WORKDIR /app
COPY composer.json composer.lock ./
RUN composer install --no-dev --optimize-autoloader --no-scripts

COPY . .
RUN composer run-script post-autoload-dump \
    && php artisan config:cache \
    && php artisan route:cache \
    && php artisan view:cache

FROM dunglas/frankenphp:1-php8.3-alpine

RUN install-php-extensions pcntl pdo_mysql pdo_pgsql redis intl zip opcache

WORKDIR /app
COPY --from=builder /app /app

ENV FRANKENPHP_CONFIG="worker ./public/index.php"
ENV SERVER_NAME=":8080"
ENV APP_ENV=production
ENV OCTANE_SERVER=frankenphp

EXPOSE 8080

CMD ["frankenphp", "run", "--config", "/etc/caddy/Caddyfile"]

Nginx Config for PHP-FPM with Upstream Health Checks

upstream php-fpm {
    server 127.0.0.1:9000;
    keepalive 32;
}

server {
    listen 8080;
    root /var/www/html/public;
    index index.php;

    client_max_body_size 64m;
    client_body_timeout 30s;
    fastcgi_read_timeout 60s;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location = /health {
        access_log off;
        return 200 "OK\n";
        add_header Content-Type text/plain;
    }

    location ~ \.php$ {
        fastcgi_pass php-fpm;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;

        fastcgi_connect_timeout 5s;
        fastcgi_send_timeout 60s;
        fastcgi_read_timeout 60s;
        fastcgi_buffer_size 16k;
        fastcgi_buffers 4 16k;

        # Pass real IP from ALB
        fastcgi_param HTTP_X_FORWARDED_FOR $http_x_forwarded_for;
        fastcgi_param HTTP_X_FORWARDED_PROTO $http_x_forwarded_proto;
    }

    location ~ /\. {
        deny all;
    }
}

Multi-Container Task Definitions: App + Queue + Scheduler

ECS task definitions support multiple containers sharing the same network namespace and local volumes. This is the right pattern for running a Laravel application alongside its queue workers and schedulers.

Architecture: Three Containers, One Task Definition

The canonical multi-container ECS task for Laravel runs:

  1. App container — Serves HTTP traffic via Nginx+FPM or FrankenPHP
  2. Worker container — Runs php artisan queue:work consuming from SQS
  3. Scheduler container — Runs php artisan schedule:run (actually better as a separate scheduled task)

For the scheduler, do not run it as a long-lived container that sleeps in a loop. Run it as an ECS Scheduled Task triggered by EventBridge every minute. This way you pay only for the seconds it runs.
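In Terraform, the every-minute scheduled task looks roughly like this (a sketch; the IAM role, task definition, subnet, and security group references are assumptions for illustration):

```terraform
resource "aws_cloudwatch_event_rule" "scheduler" {
  name                = "laravel-scheduler"
  schedule_expression = "rate(1 minute)"
}

resource "aws_cloudwatch_event_target" "scheduler" {
  rule     = aws_cloudwatch_event_rule.scheduler.name
  arn      = aws_ecs_cluster.main.arn
  role_arn = aws_iam_role.events_ecs.arn # must allow events.amazonaws.com to run the task

  ecs_target {
    task_definition_arn = aws_ecs_task_definition.scheduler.arn # runs `php artisan schedule:run`, then exits
    launch_type         = "FARGATE"
    network_configuration {
      subnets         = var.private_subnet_ids
      security_groups = [aws_security_group.app.id]
    }
  }
}
```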

ECS Task Definition JSON: Multi-Container Laravel

{
  "family": "laravel-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/laravel-app-task-role",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/laravel-app:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp",
          "name": "http"
        }
      ],
      "environment": [
        { "name": "APP_ENV", "value": "production" },
        { "name": "LOG_CHANNEL", "value": "stderr" },
        { "name": "QUEUE_CONNECTION", "value": "sqs" }
      ],
      "secrets": [
        {
          "name": "APP_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:laravel/app-key"
        },
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:laravel/db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/laravel-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "app"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 15,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      },
      "ulimits": [
        {
          "name": "nofile",
          "softLimit": 65536,
          "hardLimit": 65536
        }
      ]
    },
    {
      "name": "worker",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/laravel-app:latest",
      "essential": false,
      "command": [
        "php", "artisan", "queue:work", "sqs",
        "--tries=3",
        "--timeout=90",
        "--memory=256",
        "--sleep=3",
        "--queue=default,high"
      ],
      "environment": [
        { "name": "APP_ENV", "value": "production" },
        { "name": "LOG_CHANNEL", "value": "stderr" }
      ],
      "secrets": [
        {
          "name": "APP_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:laravel/app-key"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/laravel-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "worker"
        }
      }
    }
  ]
}

Note the essential: false on the worker container. If the worker crashes, ECS does not stop the whole task, so the app container keeps serving traffic. Be aware that ECS does not restart individual non-essential containers, so a dead worker stays down until the task is replaced; for production queues, running workers as their own service (next section) avoids this.

Scaling Workers Independently

The pattern above co-locates app and worker in the same task. For high-throughput queues, run workers as a separate ECS service with its own task definition (worker only, no Nginx). This lets you:

  • Scale workers based on SQS queue depth without scaling web containers
  • Apply different CPU/memory sizing (workers often need more memory than web containers)
  • Use Fargate Spot for workers (interruption-tolerant) while web tasks run on standard Fargate

Application Auto Scaling policy for worker scaling:

{
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingScalingPolicyConfiguration": {
    "CustomizedMetricSpecification": {
      "MetricName": "ApproximateNumberOfMessagesVisible",
      "Namespace": "AWS/SQS",
      "Dimensions": [
        { "Name": "QueueName", "Value": "laravel-jobs.fifo" }
      ],
      "Statistic": "Average"
    },
    "TargetValue": 10,
    "ScaleInCooldown": 120,
    "ScaleOutCooldown": 30
  }
}

One caveat: target tracking on the raw ApproximateNumberOfMessagesVisible metric keeps total queue depth near the target (10 messages) regardless of how many tasks are running. To scale to roughly 10 messages per worker task, publish a custom backlog-per-task metric (visible messages divided by running task count) and target-track that instead. At 100 visible messages with a per-task target of 10, the result is 10 worker tasks.
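The backlog-per-task arithmetic can be sketched directly (illustrative only; the min/max bounds are assumptions standing in for the service's scaling limits):

```python
import math

def desired_worker_tasks(visible_messages: int, target_per_task: int = 10,
                         min_tasks: int = 1, max_tasks: int = 50) -> int:
    """Desired task count so each worker handles ~target_per_task messages."""
    desired = math.ceil(visible_messages / target_per_task)
    return max(min_tasks, min(max_tasks, desired))

print(desired_worker_tasks(100))  # 100 visible messages -> 10 tasks
```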

Django and Node.js Patterns

Django on ECS follows the same multi-container pattern but uses Gunicorn (synchronous WSGI) or Uvicorn (async ASGI) as the application server, with Nginx as the reverse proxy.

For Django + Celery (equivalent to Laravel queues):

# Django worker container command
CMD ["celery", "-A", "myapp", "worker", \
     "--loglevel=info", \
     "--concurrency=4", \
     "--queues=default,high_priority", \
     "--max-tasks-per-child=200"]

--max-tasks-per-child=200 is the Celery equivalent of PHP-FPM’s pm.max_requests — it recycles worker processes to prevent memory accumulation.

Node.js (NestJS) on ECS typically runs as a single process using PM2 or directly as node dist/main.js. In a container, running PM2 cluster mode adds complexity without benefit — use multiple ECS tasks for horizontal scaling instead of multiple processes per container. One process per container is the container-native pattern.

FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# install all dependencies: the build step needs devDependencies (TS compiler, CLI)
RUN npm ci
COPY . .
RUN npm run build

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/main.js"]

Zero-Downtime Deployments

Rolling Updates

ECS rolling deployments replace tasks incrementally. The minimumHealthyPercent and maximumPercent deployment configuration values control the rollout behavior:

{
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }
}

minimumHealthyPercent: 100 with maximumPercent: 200 means ECS launches new tasks before terminating old ones, a blue/green-like rollout within a single service. With deploymentCircuitBreaker enabled, ECS automatically rolls back if newly launched tasks repeatedly fail to reach a healthy state during the deployment (the failure threshold is derived from the service's desired count rather than set directly).

The deregistrationDelay on your ALB target group must be set appropriately. The default is 300 seconds. During a deployment, ECS deregisters old tasks from the target group and waits for deregistrationDelay before stopping the container. If your application has 30-second connections (long polling, Server-Sent Events), set deregistrationDelay to 60–90 seconds. For standard REST APIs, 30 seconds is sufficient.
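In Terraform, the delay is a single attribute on the target group (a sketch; the resource names and VPC reference are illustrative):

```terraform
resource "aws_lb_target_group" "app" {
  name                 = "laravel-app"
  port                 = 8080
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  target_type          = "ip" # required for awsvpc-mode Fargate tasks
  deregistration_delay = 30   # seconds; raise to 60-90 for long-lived connections

  health_check {
    path    = "/health"
    matcher = "200"
  }
}
```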

Blue/Green with CodeDeploy

Blue/green deployments via CodeDeploy introduce a second target group. The deployment creates a new task set (green), registers it with the secondary target group, then shifts traffic from the primary (blue) target group to green.

CodeDeploy appspec.yml for ECS blue/green:

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "<TASK_DEFINITION>"
        LoadBalancerInfo:
          ContainerName: "app"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
Hooks:
  - BeforeAllowTraffic: "ValidateGreenEnvironment"
  - AfterAllowTraffic: "ValidateProductionTraffic"

The BeforeAllowTraffic hook runs a Lambda function against the green target group before any production traffic is shifted. This is where you run smoke tests, check application health endpoints, and validate database migrations completed successfully.

Canary traffic shifting sends a small percentage (for example 10%) of traffic to green first, holds for a configurable period, then completes the shift. This requires your application to handle both old and new task versions serving traffic simultaneously, a constraint that affects how you write database migrations (additive only, no column renames or type changes until the old version is fully retired).

Go applications

Go services on ECS behave better during deployments than interpreted language runtimes because startup time is near-instant. A compiled Go binary starts serving requests in under 100ms. This makes canary deployments easier — there is no warm-up period, and the green environment is ready for real traffic almost immediately after task start.

Observability: Structured Logging, X-Ray, and Custom Metrics

Structured Logging to CloudWatch

All containers should log to stderr in JSON format. The ECS awslogs log driver captures stdout and stderr and sends them to CloudWatch Logs.

Laravel structured logging via Monolog JSON formatter:

// config/logging.php
'channels' => [
    'stderr' => [
        'driver' => 'monolog',
        'handler' => StreamHandler::class,
        'with' => [
            'stream' => 'php://stderr',
        ],
        'formatter' => JsonFormatter::class,
        'level' => 'debug',
    ],
],

Set LOG_CHANNEL=stderr in your ECS task environment variables. Never write logs to a file in a container — the file is lost when the task stops, and volume mounts for logging add operational complexity without benefit.

Django structured logging using python-json-logger:

# settings.py
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'json': {
            '()': 'pythonjsonlogger.jsonlogger.JsonFormatter',
            'format': '%(asctime)s %(levelname)s %(name)s %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'json',
            'stream': 'ext://sys.stderr',
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}

CloudWatch Log Group Terraform

resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/laravel-app"
  retention_in_days = 30

  tags = {
    Environment = var.environment
    Application = "laravel-app"
  }
}

resource "aws_cloudwatch_log_group" "worker" {
  name              = "/ecs/laravel-worker"
  retention_in_days = 14

  tags = {
    Environment = var.environment
    Application = "laravel-worker"
  }
}

resource "aws_cloudwatch_metric_alarm" "ecs_running_tasks" {
  alarm_name          = "laravel-app-running-tasks-low"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "RunningTaskCount"
  namespace           = "ECS/ContainerInsights" # requires Container Insights on the cluster
  period              = 60
  statistic           = "Average"
  threshold           = 2
  alarm_description   = "Fewer running tasks than expected; tasks may be crash-looping"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.app.name
  }
}

X-Ray Tracing

Run the AWS X-Ray daemon as a sidecar container in your task definition. The daemon collects traces from your application SDK and forwards them to the X-Ray service.

Add to task definition containerDefinitions:

{
  "name": "xray-daemon",
  "image": "amazon/aws-xray-daemon:3.x",
  "essential": false,
  "portMappings": [
    {
      "containerPort": 2000,
      "protocol": "udp"
    }
  ],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/laravel-app",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "xray"
    }
  },
  "cpu": 32,
  "memory": 64
}

For Laravel, X-Ray instrumentation is available through community packages built on the aws/aws-sdk-php SDK (e.g. beansed/laravel-xray). For Node.js, use aws-xray-sdk-node. For Django/Python, use aws-xray-sdk-python.

Custom Metrics Per Framework

Custom CloudWatch metric for queue depth per worker:

# Run as a sidecar or cron — reports queue processing rate to CloudWatch
aws cloudwatch put-metric-data \
  --namespace "Application/Laravel" \
  --metric-name "QueueJobsProcessed" \
  --value "$JOBS_PROCESSED" \
  --dimensions Environment=production,Queue=default \
  --unit Count

For PHP applications, emit metrics via the AWS SDK within your queue:work process:

// app/Queue/Listeners/JobProcessed.php
public function handle(JobProcessed $event): void
{
    $this->cloudwatch->putMetricData([
        'Namespace' => 'Application/Laravel',
        'MetricData' => [
            [
                'MetricName' => 'JobProcessingTime',
                'Value' => (microtime(true) - $this->startTime) * 1000, // seconds -> ms
                'Unit' => 'Milliseconds',
                'Dimensions' => [
                    ['Name' => 'Queue', 'Value' => $event->job->getQueue()],
                    ['Name' => 'Environment', 'Value' => config('app.env')],
                ],
            ],
        ],
    ]);
}

Memory and CPU Sizing

PHP (Laravel Octane / FrankenPHP)

Scenario                          vCPU   Memory   Max Concurrent Requests
Low traffic (<50 req/s)           0.5    1GB      ~15
Medium traffic (50–200 req/s)     1      2GB      ~30
High traffic (200–500 req/s)      2      4GB      ~60
Laravel Octane (Swoole)           1      2GB      ~100+ (async)

PHP memory leaks in Octane/worker mode are cumulative. Monitor the php_memory_usage_bytes metric if you expose it via /metrics. Set --memory=256 on queue workers so Laravel itself terminates a worker process that exceeds 256MB, preventing unbounded growth.

Python (Gunicorn / Uvicorn)

Gunicorn synchronous workers: workers = (2 * CPU) + 1. On a 1 vCPU Fargate task, run 3 Gunicorn workers. Each Django worker holds database connections — size your RDS connection pool accordingly. With 10 tasks × 3 workers = 30 connections minimum.
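A gunicorn.conf.py can apply the rule directly (a sketch; TASK_VCPUS is an assumed environment variable you would set in the task definition, since querying CPU count inside a Fargate task is unreliable):

```python
# gunicorn.conf.py -- sketch applying the (2 * CPU) + 1 worker rule
import os

def worker_count(cpus: int) -> int:
    return (2 * cpus) + 1

# TASK_VCPUS is an assumed env var mirroring the task definition's vCPU setting
cpus = int(os.environ.get("TASK_VCPUS", "1"))
workers = worker_count(cpus)          # 1 vCPU -> 3 workers
bind = "0.0.0.0:8000"
max_requests = 500                    # recycle workers, like PHP-FPM's pm.max_requests
max_requests_jitter = 50              # stagger restarts so workers don't recycle together
timeout = 60
```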

For async workloads (FastAPI, Django with async views), Uvicorn with 1–2 workers handles far more concurrency than synchronous Gunicorn at equivalent CPU. Uvicorn’s event loop handles I/O-bound work efficiently.

Node.js

Node.js runs single-threaded by default. One Node process per ECS task is correct for containers. For CPU-bound workloads, use worker threads (worker_threads module) or run multiple tasks. Node.js is memory-efficient — a typical NestJS API uses 100–200MB per process at runtime. 512MB tasks are sufficient for most APIs.

Go

Go binaries are memory-light. A typical Go API server uses 30–80MB at runtime. 256MB tasks are viable. Go scales goroutines efficiently with a single process — unlike Node, it handles CPU parallelism natively with GOMAXPROCS set to container vCPU count.

Edge Cases and Failure Scenarios

Container Restart Loops

ECS marks a task unhealthy and stops it if health checks fail repeatedly. The most common causes of restart loops:

  1. Missing environment variables — Application fails to boot because a required secret was not injected. Fix: validate all secrets exist in Secrets Manager before deployment; add startup validation in your application bootstrap.

  2. Database migration failures during startup — Running php artisan migrate in the Docker ENTRYPOINT causes restart loops if the migration fails (table lock, bad SQL). Fix: run migrations as a pre-deployment ECS Task via CodePipeline, not in the container startup.

  3. Health check endpoint misconfiguration — The /health route requires database connectivity, which fails during cold start. Fix: make your health check endpoint return 200 immediately without downstream dependencies; use a separate /ready endpoint for readiness checks.
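The liveness/readiness split can be as small as two handlers (a framework-agnostic sketch; check_db stands in for whatever downstream probe you need):

```python
# Sketch: keep liveness dependency-free; put downstream checks behind /ready.
def health() -> tuple[int, str]:
    """Liveness: the process is up. No DB, no cache; returns immediately."""
    return 200, "OK"

def ready(check_db) -> tuple[int, str]:
    """Readiness: safe to receive traffic. check_db is any callable that
    raises on failure (e.g. runs SELECT 1 against the database)."""
    try:
        check_db()
        return 200, "READY"
    except Exception:
        return 503, "NOT READY"
```

Point the ECS container health check and the ALB target group at the liveness endpoint; use the readiness endpoint in deployment validation hooks.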

Memory Leak Detection

For PHP: watch the MemoryUtilization metric in CloudWatch. A gradual upward drift over 24 hours without corresponding traffic increase indicates a leak. Enable pm.max_requests in FPM and --memory limits on queue workers as circuit breakers.

For Node.js: use --max-old-space-size to set a hard heap limit somewhat below your task memory (leave headroom for stack, buffers, and native allocations). If Node hits the limit, it crashes with an OOM error. This is preferable to a slow memory leak causing task degradation: the container restarts and gets a clean state.
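For a 512MB task, that looks like the following (a sketch; the 384MB figure is an illustrative choice, not a rule):

```dockerfile
# Cap the V8 heap below the 512MB task memory; the rest covers buffers and native allocations
ENV NODE_OPTIONS="--max-old-space-size=384"
CMD ["node", "dist/main.js"]
```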

Zombie Worker Processes

PHP-FPM can accumulate zombie child processes if the master process crashes without cleaning up. In containers, PID 1 is your init process — if Nginx or FPM starts as PID 1 (rather than using an init wrapper), orphaned processes may not be reaped.

Use ENTRYPOINT ["tini", "--"] in your Dockerfile to run tini as PID 1. Tini handles zombie reaping and forwards signals correctly to child processes. For FrankenPHP, this is handled internally — the Go runtime manages child processes correctly.
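In a Dockerfile this is two extra lines (sketch for an Alpine base; /start.sh is a hypothetical wrapper script that launches php-fpm and nginx):

```dockerfile
RUN apk add --no-cache tini
ENTRYPOINT ["tini", "--"]
CMD ["/start.sh"]
```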

For more on ECS container orchestration decisions, see our ECS vs EKS decision guide.

For auto-scaling your ECS services under variable load, see AWS Auto Scaling Strategies for EC2, ECS, and Lambda.

For CloudWatch observability configuration across your ECS cluster, see CloudWatch Observability: Metrics, Logs, and Alarms Best Practices.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture · Cloud Migration · GenAI on AWS · Cost Optimization · DevOps
