---
title: Blue/Green vs Canary on AWS (2026): ECS, Lambda, and When Rolling Is Enough
description: ECS CodeDeploy and Lambda aliases support both instant cutover and gradual shifts—but picking wrong costs you double Fargate spend or 21-day MTTR on muted alarms. This decision guide scores blue/green, canary, and rolling with a matrix and names App Mesh (EOL Sept 30, 2026) replacements.
url: https://www.factualminds.com/blog/aws-blue-green-vs-canary-deployment-decision-guide-2026/
datePublished: 2026-05-29T00:00:00.000Z
dateModified: 2026-05-29T00:00:00.000Z
author: Palaniappan P
category: DevOps & CI/CD
tags: codedeploy, ecs, lambda, deployments, devops, aws
---

# Blue/Green vs Canary on AWS (2026): ECS, Lambda, and When Rolling Is Enough

> ECS CodeDeploy and Lambda aliases support both instant cutover and gradual shifts—but picking wrong costs you double Fargate spend or 21-day MTTR on muted alarms. This decision guide scores blue/green, canary, and rolling with a matrix and names App Mesh (EOL Sept 30, 2026) replacements.

**May 2026.** AWS documents progressive delivery through **CodeDeploy** on **ECS** (blue/green and canary traffic shifting with lifecycle hooks) and **Lambda** (weighted aliases with canary/linear deployment configs in SAM and CDK). Teams still confuse **instant cutover** with **gradual shift**, ship the traffic shift in the same release as a **database migration**, then blame CodeDeploy when rollback cannot revert the schema.

This is a **decision guide**, not a tutorial. For the ECS click-path, read [How to implement blue/green on ECS with CodeDeploy](/blog/how-to-implement-blue-green-deployments-ecs-codedeploy/). For org-wide DevOps patterns, see [10 AWS DevOps practices for production](/blog/10-aws-devops-practices-production-2026/).

> **Reference benchmark:** API on **ECS Fargate** (**6** tasks × **1 vCPU**, **2 GiB**), **~420 RPS** peak, deploy window **25** minutes. Blue/green added **~$0.14** extra Fargate spend per deploy (double task count × half window), negligible against **$9k**/month steady state. A canary without alarms on the same service let **10%** of traffic hit a bad build for **5** minutes (**~12,600** requests at peak: 420 RPS × 10% × 300 s) before support escalated; after wiring p95 latency + 5xx alarms, automatic rollback fired in **under 90** seconds on the next bad release.
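
The canary exposure in the benchmark is simple arithmetic: peak request rate times the canary fraction times the time the bad build stays in rotation. A minimal sketch, assuming traffic stays at the peak rate for the whole interval (so the result is an upper bound); the function name is illustrative:

```python
def requests_exposed(peak_rps: float, canary_fraction: float, seconds: float) -> int:
    """Upper bound on requests served by the canary slice at peak rate."""
    return round(peak_rps * canary_fraction * seconds)


# Canary slice left running for the full 5-minute interval (no alarms).
no_alarm = requests_exposed(420, 0.10, 5 * 60)

# Same slice when an alarm-driven rollback fires after 90 seconds.
with_alarm = requests_exposed(420, 0.10, 90)

print(no_alarm, with_alarm)
```

Cutting the exposure window from the full interval to the alarm reaction time is the whole argument for wiring alarms before choosing a strategy.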

## Definitions (AWS-native)

| Strategy | Traffic shape | Rollback speed | Typical AWS surface |
| -------- | ------------- | -------------- | ------------------- |
| **Blue/green** | 0% → 100% cutover (optional short bake) | Seconds (re-weight ALB / alias) | ECS CodeDeploy blue/green; Lambda alias swap |
| **Canary / linear** | 5–10% → stair-step to 100% | Automatic if alarms configured | CodeDeploy canary/linear configs; SAM `DeploymentPreference` |
| **Rolling** | Replace tasks incrementally | Slow (redeploy old task def) | ECS `minimumHealthyPercent` / K8s rolling update |

**Opinionated take:** Default **revenue-facing APIs** to **canary with alarms**; use **blue/green** when you need **sub-minute** rollback and can absorb 2× capacity during deploy; reserve **rolling** for internal tools with maintenance windows.
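
One way to make that take concrete is a small rule function. This is a sketch under stated assumptions: the `Workload` fields, the order of the checks, and the fallback are hypothetical, and it is not the scoring used in the linked decision matrix.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All fields are illustrative inputs, not an AWS API.
    revenue_facing: bool
    needs_sub_minute_rollback: bool
    can_absorb_double_capacity: bool
    has_maintenance_window: bool


def default_strategy(w: Workload) -> str:
    """Map a workload to the default strategy suggested in the text."""
    # Blue/green pays off only when fast rollback is required AND 2x capacity is affordable.
    if w.needs_sub_minute_rollback and w.can_absorb_double_capacity:
        return "blue/green"
    # Revenue-facing APIs default to canary, which assumes rollback alarms exist.
    if w.revenue_facing:
        return "canary with alarms"
    # Internal tools with a maintenance window can live with slower rollback.
    if w.has_maintenance_window:
        return "rolling"
    # Anything else falls back to the safer default (an assumption of this sketch).
    return "canary with alarms"


print(default_strategy(Workload(True, False, False, False)))
```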

## ECS on Fargate/EC2

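**Blue/green** uses a second target group (green), health checks, then a traffic shift. CodeDeploy can run lifecycle hooks around the shift (`BeforeAllowTraffic` / `AfterAllowTraffic` in the ECS AppSpec; SAM exposes the Lambda equivalents as `PreTraffic` / `PostTraffic`). Use them for synthetic checks, not manual Slack approval.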

**Canary** on ECS uses CodeDeploy deployment configurations (`CodeDeployDefault.ECSCanary10Percent5Minutes`, linear ramps such as `CodeDeployDefault.ECSLinear10PercentEvery1Minutes`, etc.) with optional CloudWatch alarms. It is the same controller family as blue/green with a different traffic schedule.
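
The predefined config names encode the traffic schedule, so the difference between canary, linear, and all-at-once can be read straight off the name. The sketch below is a hypothetical helper (not part of any AWS SDK) that parses a name into `(minute, percent on new version)` steps; it assumes the predefined naming pattern and does not handle custom configs.

```python
import re

# Matches names such as CodeDeployDefault.ECSCanary10Percent5Minutes,
# CodeDeployDefault.ECSLinear10PercentEvery1Minutes, or the SAM shorthand
# Canary10Percent5Minutes. Custom config names are out of scope.
_PATTERN = re.compile(
    r"^(?:CodeDeployDefault\.)?(?:ECS|Lambda)?"
    r"(?:(?P<kind>Canary|Linear)(?P<pct>\d+)Percent(?:Every)?(?P<mins>\d+)Minutes?"
    r"|(?P<all>AllAtOnce))$"
)


def traffic_schedule(config_name: str) -> list[tuple[int, int]]:
    """Return (minute, percent_on_new_version) steps for a predefined config name."""
    m = _PATTERN.match(config_name)
    if m is None:
        raise ValueError(f"unrecognized deployment config: {config_name}")
    if m.group("all"):
        return [(0, 100)]  # instant cutover
    pct, mins = int(m.group("pct")), int(m.group("mins"))
    if m.group("kind") == "Canary":
        # Two steps: a small slice first, then everything after the bake interval.
        return [(0, pct), (mins, 100)]
    # Linear: equal increments at a fixed interval until 100% is reached.
    steps, minute, weight = [], 0, pct
    while weight < 100:
        steps.append((minute, weight))
        minute += mins
        weight += pct
    steps.append((minute, 100))
    return steps


print(traffic_schedule("CodeDeployDefault.ECSCanary10Percent5Minutes"))
```

Printing the schedule next to the alarm evaluation period is a quick check that an alarm can actually fire before the next step.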

**When NOT to combine strategies:** Do not run blue/green task sets **and** a breaking Alembic/Flyway migration in one pipeline stage. Expand schema **first**, deploy backward-compatible code on canary, then contract schema in a later release.
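
The expand, deploy, contract ordering can be checked mechanically before a pipeline runs. The following is a minimal sketch with a hypothetical `Release` record and two assumed rules: a contract migration never shares a release with a traffic shift, and it only runs after an expand migration and a compatible code release have shipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    name: str
    schema_change: Optional[str]  # "expand", "contract", or None
    shifts_traffic: bool          # True if the release runs a canary or blue/green shift


def plan_violations(releases: list[Release]) -> list[str]:
    """Return human-readable problems with the ordering of a release plan."""
    problems = []
    expanded = False            # an expand migration has shipped
    code_after_expand = False   # compatible code has shipped since that expand
    for r in releases:
        if r.schema_change == "contract":
            if r.shifts_traffic:
                problems.append(f"{r.name}: contract migration shares a release with a traffic shift")
            if not (expanded and code_after_expand):
                problems.append(f"{r.name}: contract runs before expand and compatible code have shipped")
        if r.schema_change == "expand":
            expanded = True
            code_after_expand = False
        elif r.shifts_traffic and expanded:
            code_after_expand = True
    return problems


good_plan = [
    Release("r1-expand", "expand", False),
    Release("r2-code", None, True),
    Release("r3-contract", "contract", False),
]
print(plan_violations(good_plan))
```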

Context line for snippets below: **AWS CLI v2**, **ECS service with `deploymentController: CODE_DEPLOY`**, Region **us-east-1**.

```bash
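# List the predefined CodeDeploy deployment configs for the ECS compute platform.
# deploymentConfigsList is a list of plain name strings, so filter on the element itself (@).
aws deploy list-deployment-configs --region us-east-1 \
  --query "deploymentConfigsList[?contains(@, 'CodeDeployDefault.ECS')]"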
```

## Lambda

Production functions should use a **published alias** (`live`, `prod`)—never `$LATEST` for customer traffic.

| Need | Choose |
| ---- | ------ |
| Fastest safe automation | SAM/CDK `DeploymentPreference` → CodeDeploy canary/linear |
| Manual validation gate | Weighted alias 5% → 100% with runbook |
| Instant revert | Point alias back to the previous version number |

AWS documents first-time gradual deploy as **two steps**: deploy with `AutoPublishAlias`, then add `DeploymentPreference` on subsequent releases ([SAM gradual deployments guide](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/automating-updates-to-serverless-apps.html)).

Example SAM fragment (versions as of **AWS SAM 1.x**, **May 2026**):

```yaml
# Functions must use an alias — not $LATEST — in production
MyApi:
  Type: AWS::Serverless::Function
  Properties:
    AutoPublishAlias: live
    DeploymentPreference:
      Type: Canary10Percent5Minutes
      Alarms:
        - !Ref ApiErrorRateAlarm
        - !Ref ApiLatencyAlarm
```

## Rolling: when it is enough

Rolling ECS updates are valid when:

- The service is **internal** (no customer SLA).
- Changes are **backward compatible** and rehearsed in staging.
- You accept **minutes** to roll forward/back—not seconds.

Rolling is **not** a substitute for observability on external APIs.
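
How fast a rolling update moves depends on how many tasks ECS may stop and start at once, which follows from `minimumHealthyPercent` and `maximumPercent`. A small sketch of that arithmetic, assuming the documented rounding (minimum rounded up, maximum rounded down); the function name is illustrative:

```python
import math


def rolling_task_bounds(
    desired_count: int,
    minimum_healthy_percent: int = 100,
    maximum_percent: int = 200,
) -> tuple[int, int]:
    """Return (min_running, max_running) task counts during a rolling update."""
    min_running = math.ceil(desired_count * minimum_healthy_percent / 100)
    max_running = math.floor(desired_count * maximum_percent / 100)
    return min_running, max_running


# Six tasks with the service defaults: never fewer than 6 running, up to 12 during the roll.
print(rolling_task_bounds(6))

# A conservative-capacity variant: may drop to 3 running while replacing tasks.
print(rolling_task_bounds(6, minimum_healthy_percent=50))
```

With the defaults, a rolling update briefly needs the same 2× headroom as blue/green, just without the ability to re-weight traffic back in seconds.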

## App Mesh deprecation (do not plan new mesh shifts)

**AWS App Mesh** is discontinued **September 30, 2026**; new customers cannot onboard after **September 24, 2024** ([migration blog](https://aws.amazon.com/blogs/containers/migrating-from-aws-app-mesh-to-amazon-vpc-lattice/)). If you used App Mesh for traffic shifting:

- **ECS-only** → **Amazon ECS Service Connect**
- **EKS / cross-VPC / cross-account** → **Amazon VPC Lattice**

An ECS service cannot be in App Mesh and Service Connect simultaneously—plan **blue/green** cutover to a parallel service definition, not an in-place mesh toggle.

## Decision workflow

1. Score workloads in [`examples/architecture-blog-2026/deployment-strategies/decision-matrix.md`](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/deployment-strategies/decision-matrix.md).
2. Confirm **alarms** exist on the traffic-bearing metric (ALB 5xx, Lambda alias errors, p95 latency).
3. Split **schema** changes from **code** changes in the pipeline.
4. Pick ECS CodeDeploy config or SAM `DeploymentPreference` to match the matrix winner.

> **What broke:** A team ran **Canary10Percent5Minutes** on Lambda without alias-scoped alarms (only `$LATEST` metrics). CloudWatch showed elevated errors on the alias dimension, but the alarm watched the wrong dimension (the namespace, `AWS/Lambda`, was the same), so it never fired and CodeDeploy completed the shift. Rollback required a manual alias repoint: **14** minutes of customer impact. Fix: recreate the alarms on `FunctionName` + `Resource` = `<function>:<alias>`, and enable `AutoRollbackConfiguration` with the `DEPLOYMENT_STOP_ON_ALARM` event.
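
The fix in that incident comes down to the alarm's dimensions. The sketch below builds keyword arguments for CloudWatch's `PutMetricAlarm` scoped to one alias; the helper name, alarm name, threshold, and period are assumptions to adapt, and nothing here calls AWS.

```python
def alias_error_alarm(function_name: str, alias: str, threshold: float = 1.0) -> dict:
    """Keyword arguments for a CloudWatch alarm on errors for one Lambda alias."""
    return {
        "AlarmName": f"{function_name}-{alias}-errors",  # naming scheme is illustrative
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [
            {"Name": "FunctionName", "Value": function_name},
            # The Resource dimension selects the alias: "<function>:<alias>".
            {"Name": "Resource", "Value": f"{function_name}:{alias}"},
        ],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Quiet periods with no invocations should not trip the alarm.
        "TreatMissingData": "notBreaching",
    }


alarm = alias_error_alarm("my-api", "live")
print(alarm["Dimensions"])
```

Assuming boto3 is available, the dict could be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. A one-minute period with a single evaluation period is chosen here so the alarm can fire inside a five-minute canary interval.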

> **Reproduce this:** Copy [`decision-matrix.md`](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/deployment-strategies/decision-matrix.md) and score your last **5** incidents against it, noting which were deploy-related. Cross-check SAM/CDK configs with [CodeDeploy deployment configs](https://docs.aws.amazon.com/codedeploy/latest/userguide/deployment-configurations.html).

## What to do this week

1. Inventory production deploys: % on **alias** vs `$LATEST` for Lambda.
2. Add **error + latency** alarms wired to CodeDeploy rollback.
3. Move breaking DB migrations **out** of the same stage as traffic shift.
4. If still on **App Mesh**, open a migration epic to Service Connect / VPC Lattice before **Q3 2026**.
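
For the first item, the inventory reduces to classifying the ARN each caller invokes. A minimal sketch, assuming you have already collected the target ARNs (from event source mappings, API integrations, and so on); the helper names and sample ARNs are hypothetical.

```python
def invoke_target_kind(function_arn: str) -> str:
    """Classify a Lambda function ARN as 'alias', 'version', or '$LATEST'."""
    # Expected shape: arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
    parts = function_arn.split(":")
    if len(parts) not in (7, 8) or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {function_arn}")
    if len(parts) == 7:
        return "$LATEST"  # an unqualified ARN resolves to $LATEST
    qualifier = parts[7]
    if qualifier == "$LATEST":
        return "$LATEST"
    if qualifier.isdigit():
        return "version"
    return "alias"


def alias_share(arns: list[str]) -> float:
    """Percentage of invoke targets that go through an alias."""
    kinds = [invoke_target_kind(a) for a in arns]
    return 100 * kinds.count("alias") / len(kinds)


targets = [
    "arn:aws:lambda:us-east-1:123456789012:function:orders:live",
    "arn:aws:lambda:us-east-1:123456789012:function:billing:prod",
    "arn:aws:lambda:us-east-1:123456789012:function:search",
    "arn:aws:lambda:us-east-1:123456789012:function:reports:12",
]
print(alias_share(targets))
```

Targets classified as `version` are pinned and safe from accidental changes, but they also bypass weighted aliases, so they are worth listing separately in the inventory.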

## What this post does not cover

- Full **appspec.yaml** and target-group wiring ([ECS guide](/blog/how-to-implement-blue-green-deployments-ecs-codedeploy/)).
- **EKS/Argo/Flagger** install paths (see DevOps practices post).
- **API Gateway** canary settings (separate from compute canary—see [API versioning guide](/blog/aws-http-websocket-api-versioning/)).
- **Elastic Beanstalk** rolling deploys (legacy pattern).

---

**Related:** [Monolith to ECS zero-downtime migration](/blog/how-to-migrate-monolith-ecs-fargate-zero-downtime/) · [Production Laravel/Django/Node on ECS](/blog/production-laravel-django-node-on-ecs-2026/) · [AWS migration consulting](/services/aws-migration/)

**If you only do one thing:** Wire **rollback alarms** to the alias or target group that actually receives customer traffic—then choose canary vs blue/green.

## FAQ

### When should we NOT use blue/green on ECS?
Skip blue/green when you cannot afford ~2× task capacity during the deploy window, when changes are config-only with a maintenance window, or when database schema migrations ship in the same release without expand/contract discipline. Rolling updates with conservative minimumHealthyPercent are acceptable for internal tools—not for revenue APIs without alarms.

### When should we NOT use canary?
Do not use canary without CloudWatch alarms on error rate and latency tied to CodeDeploy rollback; canary without observability is theater. Also avoid coupling canary traffic shifts with breaking schema changes: ship backward-compatible (expand) schema changes first, then code. If your team invokes Lambda via `$LATEST`, fix that before any gradual deployment.

### What goes wrong if canary runs without alarms?
Traffic shifts to 10% on the new version and errors spike, but no alarm triggers rollback, so users on the canary slice see failures for the full interval (e.g., 5–30 minutes). On-call engineers later discover the issue from support tickets, not deploy automation. Fix: error-rate and duration alarms on the alias dimension, with CodeDeploy auto-rollback on `DEPLOYMENT_STOP_ON_ALARM`.

### How is Lambda blue/green different from canary?
Blue/green on Lambda is typically manual or scripted: two aliases or weighted cutover from 0%→100% on a new version when validation passes. Canary uses CodeDeploy deployment configs (e.g., Canary10Percent5Minutes or linear ramps) with automatic rollback. All production traffic should hit a published alias—not $LATEST.

### Does this replace the ECS CodeDeploy how-to?
No. Use this article to choose a strategy; follow our [ECS blue/green implementation guide](/blog/how-to-implement-blue-green-deployments-ecs-codedeploy/) for appspec, target groups, and hooks. For GitOps canary on Kubernetes, see the Flagger section in [10 AWS DevOps practices](/blog/10-aws-devops-practices-production-2026/).

### We still use App Mesh for traffic shifting—what now?
AWS App Mesh reaches end of life September 30, 2026 ([AWS announcement](https://aws.amazon.com/blogs/containers/migrating-from-aws-app-mesh-to-amazon-vpc-lattice/)). New customers cannot onboard since September 24, 2024. For ECS, migrate to ECS Service Connect; for EKS/cross-VPC, use VPC Lattice. Do not plan new mesh-based canary paths on App Mesh.

---

*Source: https://www.factualminds.com/blog/aws-blue-green-vs-canary-deployment-decision-guide-2026/*
