Blue/Green vs Canary on AWS (2026): ECS, Lambda, and When Rolling Is Enough

DevOps & CI/CD · Palaniappan P · 4 min read

Quick summary: ECS CodeDeploy and Lambda aliases support both instant cutover and gradual shifts—but picking wrong costs you double Fargate spend or 21-day MTTR on muted alarms. This decision guide scores blue/green, canary, and rolling with a matrix and names App Mesh (EOL Sept 30, 2026) replacements.

May 2026. AWS documents progressive delivery through CodeDeploy on ECS (blue/green and canary traffic hooks) and Lambda (weighted aliases with canary/linear deployment configs in SAM and CDK). Teams still confuse instant cutover with gradual shift. They also ship a traffic shift in the same release as a breaking database migration, then blame CodeDeploy when rollback cannot revert the schema.

This is a decision guide, not a tutorial. For the ECS click-path, read How to implement blue/green on ECS with CodeDeploy. For org-wide DevOps patterns, see 10 AWS DevOps practices for production.

Reference benchmark: an API on ECS Fargate (6 tasks × 1 vCPU, 2 GiB), ~420 RPS peak, 25-minute deploy window. Blue/green added ~$0.14 of extra Fargate spend per deploy (double the task count for roughly half the window). That is negligible against $9k/month steady state. On the same service, a canary without alarms let 10% of traffic hit a bad build for 5 minutes before support escalated. At 420 RPS peak, that is about 12,600 requests. After p95 latency and 5xx alarms were wired in, automatic rollback fired in under 90 seconds on the next bad release.
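Both benchmark figures come from simple arithmetic. The sketch below assumes two things. First, the requests exposed during a canary step are roughly weight × RPS × bake duration. Second, blue/green's extra cost is the extra task-hours the green fleet runs, multiplied by a per-task-hour price. The prices are placeholders you pass in, not quoted Fargate rates; substitute your Region's current vCPU-hour and GB-hour pricing.

```python
def canary_exposed_requests(weight: float, rps: float, bake_seconds: float) -> float:
    """Approximate requests routed to the new build during one canary step."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be a fraction between 0 and 1")
    return weight * rps * bake_seconds


def fargate_task_hour_price(vcpu: float, memory_gib: float,
                            price_per_vcpu_hour: float,
                            price_per_gib_hour: float) -> float:
    """Per-task hourly cost from Fargate's two billing dimensions (prices are inputs)."""
    return vcpu * price_per_vcpu_hour + memory_gib * price_per_gib_hour


def blue_green_extra_cost(extra_tasks: int, overlap_hours: float,
                          task_hour_price: float) -> float:
    """Cost of running the green fleet alongside blue for the overlap window."""
    return extra_tasks * overlap_hours * task_hour_price
```

For example, `canary_exposed_requests(0.10, 420, 300)` estimates the blast radius of a 10% canary held for five minutes at 420 RPS. Each extra minute of bake without alarms scales that number linearly.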

Definitions (AWS-native)

| Strategy | Traffic shape | Rollback speed | Typical AWS surface |
|---|---|---|---|
| Blue/green | 0% → 100% cutover (optional short bake) | Seconds (re-weight ALB / alias) | ECS CodeDeploy blue/green; Lambda alias swap |
| Canary / linear | 5–10% → stair-step to 100% | Automatic if alarms configured | CodeDeploy canary/linear configs; SAM DeploymentPreference |
| Rolling | Replace tasks incrementally | Slow (redeploy old task def) | ECS minimumHealthyPercent / K8s rolling update |

Opinionated take: Default revenue-facing APIs to canary with alarms; use blue/green when you need sub-minute rollback and can absorb 2× capacity during deploy; reserve rolling for internal tools with maintenance windows.
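The opinionated take above can be encoded as a small rule function. This is a minimal sketch: the `Workload` fields are hypothetical inputs chosen to mirror the three rules. They are not the columns of the decision matrix referenced later in this post.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical inputs mirroring the opinionated take (illustrative only)."""
    customer_facing: bool
    needs_subminute_rollback: bool
    can_absorb_double_capacity: bool
    has_rollback_alarms: bool


def choose_strategy(w: Workload) -> str:
    # Internal tools (no customer SLA) can tolerate slow rolling rollbacks.
    if not w.customer_facing:
        return "rolling"
    # Sub-minute rollback is blue/green's strength, if 2x capacity is affordable.
    if w.needs_subminute_rollback and w.can_absorb_double_capacity:
        return "blue/green"
    # Default for revenue-facing APIs is canary, which is only safe with alarms.
    if not w.has_rollback_alarms:
        raise ValueError("canary needs rollback alarms wired in before shifting traffic")
    return "canary"
```

Raising on a canary without alarms is a deliberate choice. It forces the alarm gap to surface at planning time rather than during an incident.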

ECS on Fargate/EC2

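Blue/green starts a replacement (green) task set behind a second target group, waits for health checks, then shifts traffic. CodeDeploy can invoke Lambda functions at lifecycle hooks. For ECS these are BeforeInstall, AfterInstall, AfterAllowTestTraffic, BeforeAllowTraffic and AfterAllowTraffic; SAM's PreTraffic / PostTraffic are the Lambda-deployment equivalents. Use them for synthetic checks, not manual Slack approval.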
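Here is a minimal sketch of a hook Lambda that runs a synthetic check and reports the result to CodeDeploy through `put_lifecycle_event_hook_execution_status`. It assumes boto3 is available, as it is in the Lambda Python runtime. It also assumes a hypothetical `TEST_LISTENER_URL` environment variable that points at the green task set's test-listener health endpoint. The `client` and `check` parameters exist only so the logic can be exercised without AWS.

```python
import os
import urllib.request


def synthetic_check(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Non-2xx (HTTPError), refused connections, DNS errors and timeouts fail the check.
        return False


def handler(event, context, client=None, check=synthetic_check):
    """Lifecycle hook entry point, e.g. for AfterAllowTestTraffic."""
    if client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        client = boto3.client("codedeploy")
    # Hypothetical env var naming the green task set's test-listener URL.
    url = os.environ.get("TEST_LISTENER_URL", "http://localhost:8080/health")
    status = "Succeeded" if check(url) else "Failed"
    # Reporting Failed fails the deployment; auto-rollback applies if configured.
    client.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```

A real check would usually hit several endpoints and assert on response bodies. A single health URL keeps the sketch short.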

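Canary on ECS uses CodeDeploy deployment configurations with optional CloudWatch alarms. Examples are CodeDeployDefault.ECSCanary10Percent5Minutes, or a linear ramp such as CodeDeployDefault.ECSLinear10PercentEvery3Minutes. It is the same controller as blue/green, with a different traffic schedule.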
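To see how the schedules differ, the sketch below approximates the (minute, percent-on-new-version) steps implied by the predefined config shapes. It assumes the first increment happens at minute 0. CodeDeploy's exact timing may differ slightly, so treat the output as a mental model, not a contract.

```python
import math


def traffic_schedule(kind: str, step_percent: int, interval_minutes: int) -> list[tuple[int, int]]:
    """Approximate (minute offset, % on new version) pairs for a config shape.

    canary: step_percent immediately, the rest after interval_minutes.
    linear: step_percent more every interval_minutes until 100%.
    all_at_once: 100% immediately.
    """
    if kind == "all_at_once":
        return [(0, 100)]
    if not 0 < step_percent < 100:
        raise ValueError("step_percent must be between 1 and 99")
    if kind == "canary":
        return [(0, step_percent), (interval_minutes, 100)]
    if kind == "linear":
        steps = math.ceil(100 / step_percent)
        return [(i * interval_minutes, min(100, (i + 1) * step_percent)) for i in range(steps)]
    raise ValueError(f"unknown kind: {kind}")
```

`traffic_schedule("canary", 10, 5)` models ECSCanary10Percent5Minutes, and `traffic_schedule("linear", 10, 3)` models ECSLinear10PercentEvery3Minutes. The linear ramp keeps the new version partially exposed for much longer, which gives alarms more time to fire at low weight.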

When NOT to combine strategies: Do not run blue/green task sets and a breaking Alembic/Flyway migration in one pipeline stage. Expand schema first, deploy backward-compatible code on canary, then contract schema in a later release.

Context line for snippets below: AWS CLI v2, ECS service with deploymentController: CODE_DEPLOY, Region us-east-1.

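# List predefined ECS deployment configs (deploymentConfigsList is a flat list of name strings)
aws deploy list-deployment-configs --region us-east-1 --query "deploymentConfigsList[?contains(@, 'CodeDeployDefault.ECS')]"
# Inspect one config's traffic-routing schedule
aws deploy get-deployment-config --deployment-config-name CodeDeployDefault.ECSCanary10Percent5Minutes --region us-east-1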

Lambda

Production functions should use a published alias (live, prod)—never $LATEST for customer traffic.

| Need | Choose |
|---|---|
| Fastest safe automation | SAM/CDK DeploymentPreference → CodeDeploy canary/linear |
| Manual validation gate | Weighted alias 5% → 100% with runbook |
| Instant revert | Point alias back to previous version ARN |
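The manual weighted-alias runbook and the instant revert both come down to Lambda's `UpdateAlias` call with a `RoutingConfig`. The sketch below only builds the keyword arguments for boto3's `lambda` client `update_alias`. The function name `my-api`, the alias and the version numbers in the usage line are hypothetical.

```python
def canary_alias_request(function_name: str, alias: str,
                         stable_version: str, new_version: str,
                         weight: float) -> dict:
    """update_alias kwargs: stable_version stays primary, new_version gets `weight`."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must be strictly between 0 and 1")
    if new_version == stable_version:
        raise ValueError("new_version must differ from stable_version")
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": stable_version,
        "RoutingConfig": {"AdditionalVersionWeights": {new_version: weight}},
    }


def pin_alias_request(function_name: str, alias: str, version: str) -> dict:
    """Send 100% to one version and clear weighted routing.

    Use version=new to promote, or version=previous to revert instantly.
    """
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": version,
        "RoutingConfig": {"AdditionalVersionWeights": {}},
    }
```

Usage, assuming credentials and a published version 42: `boto3.client("lambda").update_alias(**canary_alias_request("my-api", "live", "41", "42", 0.05))`. After validation, call `update_alias(**pin_alias_request("my-api", "live", "42"))`. To revert, pin the alias to version 41 instead.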

AWS documents first-time gradual deploy as two steps: deploy with AutoPublishAlias, then add DeploymentPreference on subsequent releases (SAM gradual deployments guide).

Example SAM fragment (versions as of AWS SAM 1.x, May 2026):

# Functions must use an alias — not $LATEST — in production
MyApi:
  Type: AWS::Serverless::Function
  Properties:
    AutoPublishAlias: live
    DeploymentPreference:
      Type: Canary10Percent5Minutes
      Alarms:
        - !Ref ApiErrorRateAlarm
        - !Ref ApiLatencyAlarm

Rolling: when it is enough

Rolling ECS updates are valid when:

  • The service is internal (no customer SLA).
  • Changes are backward compatible and rehearsed in staging.
  • You accept minutes to roll forward/back—not seconds.

Rolling is not a substitute for observability on external APIs.
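Rolling rollback takes minutes because it is just another rolling deployment of the previous task definition, constrained by the same bounds. Here is a simplified model of those bounds. It assumes ECS's documented rounding: minimumHealthyPercent is rounded up and maximumPercent is rounded down. It ignores health-check grace periods and placement constraints.

```python
import math


def rolling_bounds(desired: int, minimum_healthy_percent: int, maximum_percent: int) -> dict:
    """Task-count bounds for an ECS rolling update (simplified sketch)."""
    min_running = math.ceil(desired * minimum_healthy_percent / 100)
    max_running = math.floor(desired * maximum_percent / 100)
    return {
        "min_running": min_running,
        "max_running": max_running,
        # Old tasks ECS may stop before replacements are healthy.
        "can_stop_first": max(0, desired - min_running),
        # New tasks ECS may start on top of the current fleet.
        "can_start_first": max(0, max_running - desired),
    }
```

With 6 tasks at 50/100, ECS can replace only 3 tasks per wave, and each wave waits on health checks. With 100/200, it starts all 6 new tasks first. That is faster, but it needs blue/green-level capacity.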

App Mesh deprecation (do not plan new mesh shifts)

AWS App Mesh reaches end of support on September 30, 2026; new customers cannot onboard after September 24, 2024 (migration blog). If you used App Mesh for traffic shifting:

  • ECS-only: Amazon ECS Service Connect
  • EKS / cross-VPC / cross-account: Amazon VPC Lattice

An ECS service cannot be in App Mesh and Service Connect simultaneously—plan blue/green cutover to a parallel service definition, not an in-place mesh toggle.

Decision workflow

  1. Score workloads in examples/architecture-blog-2026/deployment-strategies/decision-matrix.md.
  2. Confirm alarms exist on the traffic-bearing metric (ALB 5xx, Lambda alias errors, p95 latency).
  3. Split schema changes from code changes in the pipeline.
  4. Pick ECS CodeDeploy config or SAM DeploymentPreference to match the matrix winner.
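Steps 2 and 3 of the workflow can be enforced mechanically before a release starts. The sketch below uses a hypothetical `Stage` record; the field names are illustrative, not any CI system's schema. It flags stages that shift traffic without rollback alarms, and stages that shift traffic alongside a schema migration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical pipeline-stage description (illustrative field names)."""
    name: str
    shifts_traffic: bool
    migrations: list[str] = field(default_factory=list)
    rollback_alarms: list[str] = field(default_factory=list)


def preflight(stages: list[Stage]) -> list[str]:
    """Return workflow violations; an empty list means the release may proceed."""
    problems = []
    for stage in stages:
        # Step 2: a traffic shift must be guarded by rollback alarms.
        if stage.shifts_traffic and not stage.rollback_alarms:
            problems.append(f"{stage.name}: traffic shift has no rollback alarms")
        # Step 3: schema changes belong in their own stage.
        if stage.shifts_traffic and stage.migrations:
            problems.append(f"{stage.name}: schema migration shares a stage with a traffic shift")
    return problems
```

An expand-only migration stage followed by a separate canary stage with alarms passes. A single stage that drops a column and shifts traffic fails on both checks.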

What broke: a team ran Canary10Percent5Minutes on Lambda without alias-scoped alarms; their alarms watched only $LATEST metrics. CloudWatch showed elevated errors under the alias dimension. The alarm was on the wrong dimensions, so CodeDeploy completed the shift. Rollback required a manual alias repoint, which meant 14 minutes of customer impact. Fix: recreate the alarms on FunctionName + Resource = function:alias, and enable AutoRollbackConfiguration with the DEPLOYMENT_STOP_ON_ALARM event.
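The fix can be expressed as two API payloads. The sketch builds keyword arguments for boto3's CloudWatch `put_metric_alarm` and CodeDeploy `update_deployment_group`. It assumes you manage the deployment group directly; SAM's DeploymentPreference `Alarms` property wires up the alarm side for you. The threshold is deliberately a required input: tune it to your error baseline.

```python
def alias_error_alarm(function_name: str, alias: str, threshold: float) -> dict:
    """put_metric_alarm kwargs for Errors on the alias, not $LATEST."""
    return {
        "AlarmName": f"{function_name}-{alias}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        # Alias-scoped metrics carry both dimensions; Resource is "function:alias".
        "Dimensions": [
            {"Name": "FunctionName", "Value": function_name},
            {"Name": "Resource", "Value": f"{function_name}:{alias}"},
        ],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def auto_rollback_update(application: str, deployment_group: str,
                         alarm_names: list[str]) -> dict:
    """update_deployment_group kwargs that stop and roll back when an alarm fires."""
    return {
        "applicationName": application,
        "currentDeploymentGroupName": deployment_group,
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": n} for n in alarm_names],
        },
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }
```

A latency alarm follows the same pattern with MetricName `Duration` and an `ExtendedStatistic` of `p95` in place of `Statistic`.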

Reproduce this: copy decision-matrix.md and validate it against your last five incidents (were they deploy-related?). Cross-check your SAM/CDK configs against the CodeDeploy deployment configs.

What to do this week

  1. Inventory production deploys: % on alias vs $LATEST for Lambda.
  2. Add error + latency alarms wired to CodeDeploy rollback.
  3. Move breaking DB migrations out of the same stage as traffic shift.
  4. If still on App Mesh, open a migration epic to Service Connect / VPC Lattice before Q3 2026.
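For the first item, the useful signal is which qualifier each caller uses. The sketch classifies a Lambda reference, either a full ARN or `name[:qualifier]`, as alias, version, `$LATEST`, or unqualified (which also runs `$LATEST`). Collecting the references is left to your own tooling, for example from event source mappings or API Gateway integrations. Partial ARNs are not handled.

```python
from collections import Counter


def qualifier_kind(function_ref: str) -> str:
    """Classify a Lambda reference: "alias", "version", "$LATEST" or "unqualified"."""
    if function_ref.startswith("arn:"):
        parts = function_ref.split(":")
        # arn:partition:lambda:region:account:function:name[:qualifier]
        qualifier = parts[7] if len(parts) > 7 else None
    else:
        _, _, qualifier = function_ref.partition(":")
        qualifier = qualifier or None
    if qualifier is None:
        return "unqualified"  # invocations run $LATEST
    if qualifier == "$LATEST":
        return "$LATEST"
    if qualifier.isdigit():
        return "version"
    return "alias"


def summarize(refs: list[str]) -> Counter:
    """Tally qualifier kinds across an inventory of function references."""
    return Counter(qualifier_kind(r) for r in refs)
```

Anything counted as `$LATEST` or `unqualified` on a customer-facing path bypasses alias-based canaries entirely.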

What this post does not cover

  • Full appspec.yaml and target-group wiring (ECS guide).
  • EKS/Argo/Flagger install paths (see DevOps practices post).
  • API Gateway canary settings (separate from compute canary—see API versioning guide).
  • Elastic Beanstalk rolling deploys (legacy pattern).

Related: Monolith to ECS zero-downtime migration · Production Laravel/Django/Node on ECS · AWS migration consulting

If you only do one thing: Wire rollback alarms to the alias or target group that actually receives customer traffic—then choose canary vs blue/green.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture · Cloud Migration · GenAI on AWS · Cost Optimization · DevOps
