
How to Build Cost-Aware CI/CD Pipelines on AWS

DevOps & CI/CD · Palaniappan P · 14 min read

Quick summary: CI/CD infrastructure is invisible until your DevOps bill hits $15,000/month. Build minutes, artifact storage, and ephemeral environments accumulate costs that few teams track. Here is how to measure and control them.


Most engineering teams treat CI/CD infrastructure as a fixed cost. It runs in the background, engineers trigger builds, deployments happen. The bill comes in and the line item reads “CodeBuild” or “GitHub Actions” — nobody investigates until it reaches $10,000 per month.

By that point, there are months of accumulated waste: builds without timeouts running for hours, artifact buckets with no lifecycle rules holding a year of stale binaries, shared staging environments running around the clock for a team that works in one time zone, and ephemeral test environments that were never destroyed.

This post maps every cost driver in a typical AWS CI/CD stack, gives you exact numbers for comparing build environments, and shows you how to put cost gates in your pull request workflow before the bill surprises you.

The CI/CD Cost Anatomy

Before optimizing, you need to know where money goes. A representative CI/CD stack on AWS touches five cost categories, and teams typically track only one of them (build compute).

Build Compute

This is the most visible cost. AWS CodeBuild charges per build minute: $0.005/minute for general1.small (3 GB RAM, 2 vCPU), scaling to $0.05/minute for gpu1.large. GitHub Actions hosted runners charge $0.008/minute for the standard Linux runner (2 vCPU, 7 GB RAM).

A team running 500 builds per day, averaging 8 minutes each, burns 4,000 build minutes daily. On CodeBuild general1.medium: 4,000 × $0.010 = $40/day = $1,200/month. On GitHub Actions hosted: 4,000 × $0.008 = $32/day = $960/month. These numbers feel manageable — until one misconfigured job starts running 45-minute builds.
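That per-minute arithmetic is worth scripting so it can be re-run as build volume changes. A minimal sketch using the example numbers above (the rates are the article's figures; the function name is mine):

```python
def monthly_build_cost(builds_per_day: int, avg_minutes: float,
                       rate_per_minute: float, days: int = 30) -> float:
    """Monthly build-compute cost for a steady build load."""
    return builds_per_day * avg_minutes * rate_per_minute * days

# 500 builds/day x 8 min on CodeBuild general1.medium ($0.010/min)
codebuild = monthly_build_cost(500, 8, 0.010)   # ~ $1,200/month
# Same load on a GitHub Actions hosted Linux runner ($0.008/min)
actions = monthly_build_cost(500, 8, 0.008)     # ~ $960/month
print(f"CodeBuild: ${codebuild:,.0f}/mo, GitHub Actions: ${actions:,.0f}/mo")
```

Re-running it with a 45-minute average shows why one misconfigured job matters: the same 500 builds balloon to $6,750/month on general1.medium.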

Artifact Storage

S3 costs $0.023/GB/month for standard storage. A team building Docker images, Lambda deployment packages, test reports, and Terraform state accumulates storage fast. A typical production deployment package is 50–200 MB. Building 500 packages per day without a retention policy: 500 × 100 MB = 50 GB/day. After 30 days: 1,500 GB = $34.50/month. After 90 days without cleanup: $103/month in artifact storage alone, plus request charges ($0.0004 per 1,000 GETs, $0.005 per 1,000 PUTs) for frequent reads.

ECR Container Registry Storage

ECR charges $0.10/GB/month. This is more expensive per GB than S3, and without lifecycle policies, every image push is permanent. An active team building Docker images multiple times per day accumulates this fast. We cover the exact growth math in the lifecycle policy section below.

NAT Gateway for Private Runners

This is the cost that blindsides teams the most. If your self-hosted runners or CodeBuild projects run inside a VPC (required for accessing private resources like RDS, internal APIs, or private ECR), every outbound internet request goes through NAT Gateway.

NAT Gateway costs: $0.045/hour per gateway ($32.40/month) plus $0.045/GB of processed data. A build that pulls a 500 MB base Docker image from Docker Hub, downloads npm packages, and fetches dependencies from public endpoints processes roughly 1–2 GB of outbound NAT traffic. At 500 builds per day: 500 × 1.5 GB × $0.045 = $33.75/day = $1,012/month in NAT Gateway data charges alone.

The mitigation: use ECR Public for base images (no NAT cost from within AWS), configure S3 Gateway endpoints (free), and use VPC Interface endpoints for ECR and other AWS services that build processes access frequently.
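A sketch of those endpoints in Terraform; the variable references (var.vpc_id, var.region, var.private_route_table_ids, var.private_subnet_ids) and the endpoints security group are placeholders, not from this article's stack:

```hcl
# S3 Gateway endpoint: free, removes S3 traffic from NAT
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# ECR Interface endpoints: private ECR pulls need both ecr.api and ecr.dkr
resource "aws_vpc_endpoint" "ecr" {
  for_each            = toset(["ecr.api", "ecr.dkr"])
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}
```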

ECS Deploy Costs

Rolling deployments on ECS trigger new task launches and old task drains. During a deployment, you temporarily run double the task count. For an application running 10 tasks at $0.04048/vCPU-hour and $0.004445/GB-hour (Fargate pricing), a 10-minute deployment window at double capacity costs roughly $0.013 per deployment. Harmless at 10 deployments per day ($0.13/day), but ECS-heavy pipelines that deploy per commit across 20 services add up.

CodeBuild vs GitHub Actions: The Real Cost Comparison

The theoretical cost per minute comparison misses important factors. Here is the full comparison including practical considerations.

CodeBuild Spot Fleets

AWS CodeBuild now supports Spot instance fleets for build compute, reducing costs by 50–90% compared to on-demand. A CodeBuild fleet of c6i.large Spot instances (2 vCPU, 4 GB) runs at roughly $0.025–0.035/hour for the underlying compute, compared to $0.14/hour on-demand. For the same 4,000 build minutes per day: Spot fleet cost ≈ $16–22/day vs on-demand CodeBuild $40/day.

The Spot interruption caveat: CodeBuild Spot fleets handle interruptions by retrying builds automatically. For builds under 30 minutes, Spot interruption rates are typically low enough (under 5%) that the cost savings outweigh retry overhead.

GitHub Actions Hosted vs Self-Hosted on Spot

GitHub Actions hosted runners cost $0.008/minute for the standard 2-vCPU Linux runner. For a 2 vCPU build: $0.008/min × 60 = $0.48/hour.

Self-hosted on EC2 Spot c6i.large (2 vCPU, 4 GB): $0.025–0.035/hour for compute, plus:

  • Actions Runner Controller (ARC) on EKS: adds $50–100/month for the control plane
  • Or a simpler ASG-based setup: minimal overhead, $5–10/month for the Lambda function that triggers scale-out

Break-even calculation: $0.008/min hosted × 10,000 min/month = $80/month from GitHub. Self-hosted: $30–35/month compute + $10/month orchestration = $40–45/month. Self-hosted is cheaper above ~6,000 minutes/month with the simple ASG approach.

The self-hosted advantage at scale: you control the instance type (go larger for parallel tests without paying the premium for GitHub's larger hosted runners), can use instance-store NVMe for faster builds, and can cache Docker layers on a persistent EBS volume attached to the runner ASG.

The Numbers at Scale

| Monthly Build Minutes | GitHub Hosted | CodeBuild On-Demand | CodeBuild Spot | Self-Hosted Spot (ARC) |
|----------------------:|--------------:|--------------------:|---------------:|-----------------------:|
| 5,000                 | $40           | $50                 | $15–20         | $45–55                 |
| 20,000                | $160          | $200                | $50–70         | $80–100                |
| 50,000                | $400          | $500                | $120–170       | $150–180               |
| 100,000               | $800          | $1,000              | $240–340       | $250–280               |

At 100,000 minutes/month, CodeBuild Spot is roughly equivalent to self-hosted Spot (both ~70% cheaper than hosted), but CodeBuild Spot requires zero orchestration infrastructure.

Artifact Storage Strategies

S3 Lifecycle Policies for Build Artifacts

Build artifacts have a clear half-life. You need the last N builds for rollback, and nothing before that. An S3 lifecycle policy that enforces this:

  • Current production artifacts: retain the last 30 days
  • Non-production artifacts (PR builds, branch builds): retain 7 days
  • Test reports and coverage artifacts: retain 14 days
  • Terraform state: no expiry (state files are tiny; this is the wrong place to save money)

For CodePipeline artifact buckets, set a lifecycle rule on the codepipeline-artifacts-* bucket that transitions to S3 Intelligent-Tiering after 30 days and expires after 90 days.
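That rule, sketched in Terraform (the bucket resource reference is a placeholder for your CodePipeline artifact bucket):

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "pipeline_artifacts" {
  bucket = aws_s3_bucket.pipeline_artifacts.id  # the codepipeline-artifacts-* bucket

  rule {
    id     = "tier-then-expire"
    status = "Enabled"

    filter {}  # apply to every object in the bucket

    transition {
      days          = 30
      storage_class = "INTELLIGENT_TIERING"
    }

    expiration {
      days = 90
    }
  }
}
```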

ECR Lifecycle Policy Terraform

Here is the ECR lifecycle policy that keeps your registry clean:

resource "aws_ecr_lifecycle_policy" "app" {
  repository = aws_ecr_repository.app.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Expire untagged images after 1 day"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 1
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 2
        description  = "Keep last 10 tagged images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["v", "release-", "prod-"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 3
        description  = "Catch-all: keep last 5 of any other images"
        selection = {
          tagStatus   = "any"
          countType   = "imageCountMoreThan"
          countNumber = 5
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}

resource "aws_ecr_repository" "app" {
  name                 = var.repository_name
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }

  tags = var.tags
}

Rules execute in priority order. The untagged expiry (priority 1) runs first, cleaning up images superseded by new pushes. Priority 2 keeps the last 10 production-tagged images for rollback. Priority 3 is a catch-all that keeps the last 5 of everything else, which covers development tags. Note that ECR requires a tagPrefixList (or tagPatternList) whenever tagStatus is "tagged", so the catch-all uses "any".

Apply this policy to every repository. ECR has no global lifecycle policy — repositories created without a policy accumulate images indefinitely.

CodeBuild Spot Fleet Terraform

For teams using CodeBuild directly rather than GitHub Actions, a Spot instance fleet reduces build costs significantly:

resource "aws_codebuild_fleet" "spot" {
  name          = "${var.project_name}-spot-fleet"
  base_capacity = 1
  compute_type  = "BUILD_GENERAL1_SMALL"
  environment_type = "LINUX_CONTAINER"

  scaling_configuration {
    max_capacity       = 10
    scaling_type       = "TARGET_TRACKING_SCALING"
    target_tracking_scaling_configs {
      metric_type  = "FLEET_UTILIZATION_RATE"
      target_value = 0.8
    }
  }

  fleet_service_role = aws_iam_role.codebuild_fleet.arn

  overflow_behavior = "QUEUE"

  tags = var.tags
}

resource "aws_codebuild_project" "app" {
  name          = var.project_name
  service_role  = aws_iam_role.codebuild.arn

  artifacts {
    type = "S3"
    location = aws_s3_bucket.artifacts.bucket
    packaging = "ZIP"
  }

  environment {
    compute_type                = "BUILD_GENERAL1_SMALL"
    image                       = "aws/codebuild/standard:7.0"
    type                        = "LINUX_CONTAINER"
    image_pull_credentials_type = "CODEBUILD"

    fleet {
      fleet_arn = aws_codebuild_fleet.spot.arn
    }
  }

  source {
    type      = "GITHUB"
    location  = var.github_repo_url
    buildspec = "buildspec.yml"
  }

  # Critical: always set a build timeout
  build_timeout = 30  # minutes

  vpc_config {
    vpc_id             = var.vpc_id
    subnets            = var.private_subnet_ids
    security_group_ids = [aws_security_group.codebuild.id]
  }

  tags = var.tags
}

The build_timeout = 30 is not optional. Without a timeout, a hung build runs indefinitely and you pay for every minute. The default CodeBuild timeout is 60 minutes; a reasonable value for most builds is 15–30 minutes.

Infracost Integration in Pull Requests

Infracost converts Terraform plan output into a cost estimate and posts it as a PR comment. The workflow below runs on every PR that touches Terraform files:

name: Infracost PR Cost Estimate

on:
  pull_request:
    paths:
      - 'infra/**'
      - 'terraform/**'
      - '*.tf'

permissions:
  contents: read
  pull-requests: write

jobs:
  infracost:
    name: Estimate infrastructure cost change
    runs-on: ubuntu-latest

    steps:
      - name: Checkout base branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.base.ref }}
          path: base

      - name: Checkout PR branch
        uses: actions/checkout@v4
        with:
          path: pr

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: '1.7.0'
          terraform_wrapper: false

      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Generate base cost estimate
        run: |
          cd base/terraform
          terraform init -backend=false
          infracost breakdown \
            --path=. \
            --format=json \
            --out-file=/tmp/infracost-base.json
        env:
          AWS_DEFAULT_REGION: us-east-1

      - name: Generate PR cost estimate
        run: |
          cd pr/terraform
          terraform init -backend=false
          infracost breakdown \
            --path=. \
            --format=json \
            --out-file=/tmp/infracost-pr.json
        env:
          AWS_DEFAULT_REGION: us-east-1

      - name: Generate cost diff
        run: |
          infracost diff \
            --path=/tmp/infracost-pr.json \
            --compare-to=/tmp/infracost-base.json \
            --format=json \
            --out-file=/tmp/infracost-diff.json

      - name: Post cost comment to PR
        run: |
          infracost comment github \
            --path=/tmp/infracost-diff.json \
            --repo=$GITHUB_REPOSITORY \
            --github-token=${{ secrets.GITHUB_TOKEN }} \
            --pull-request=${{ github.event.pull_request.number }} \
            --behavior=update
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Cost gate — block PR if monthly increase exceeds $500
        run: |
          DIFF=$(jq -r '.diffTotalMonthlyCost | tonumber' /tmp/infracost-diff.json)
          echo "Monthly cost change: \$${DIFF}"
          if (( $(echo "$DIFF > 500" | bc -l) )); then
            echo "::error::Monthly cost increase (\$${DIFF}) exceeds the \$500 threshold. Review and justify before merging."
            exit 1
          fi

The cost gate at the end blocks the PR if the monthly cost increase from this change exceeds a threshold. Adjust 500 to match your team’s risk tolerance. The gate catches resource size changes, new environment creation, and storage class misconfigurations before they reach production.

The -backend=false flag on terraform init prevents the workflow from reading or modifying remote state — it only performs a static analysis of the configuration. This means some cost estimates will be incomplete (resources that require state lookup), but it keeps the workflow stateless and simple.

Ephemeral vs Shared Staging: The Real Cost Analysis

Shared staging environments are the default because they are simple: one environment, one set of infrastructure, one URL to give to QA. The problem is they run 24/7, they create contention between engineers working on different features, and they accumulate configuration drift from manual testing.

A representative staging stack (web app with database, cache, and queue):

  • 2× ECS Fargate tasks (1 vCPU, 2 GB): $0.08/hour
  • RDS db.t4g.medium: $0.065/hour
  • ElastiCache cache.t4g.micro: $0.017/hour
  • ALB: $0.008/hour + $0.008/LCU-hour
  • Total: ~$0.18/hour = $130/month

For a 20-engineer team, 5 PRs active simultaneously:

  • Ephemeral: 5 × $0.18/hour × 8 hours active = $7.20/day = $216/month
  • Shared staging: $130/month

At 5 concurrent PRs, shared staging is cheaper. At 10 concurrent PRs: ephemeral = $432/month vs shared = $130/month. Shared staging wins on pure cost.

The real case for ephemeral: engineering time. Staging conflicts (two engineers deploying incompatible changes) cost an hour each. At 3 conflicts per week, that is 3 engineering hours per week. At $100/hour fully loaded: $1,200/month in lost productivity. Ephemeral environments at $432/month are cheaper than the coordination overhead of shared staging at that team size.

The practical approach: use Terraform workspaces to create PR environments, run terraform destroy from a workflow triggered when the PR closes, and limit the ephemeral environment to lightweight components (exclude the database if test data can be loaded from fixtures into a shared RDS instance).
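A minimal teardown workflow for that pattern; it assumes PR environments live in Terraform workspaces named pr-<number> and that AWS credentials are configured separately (both are assumptions, not from this article):

```yaml
name: Destroy PR environment

on:
  pull_request:
    types: [closed]

jobs:
  destroy:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Destroy and delete the PR workspace
        run: |
          cd terraform
          terraform init
          terraform workspace select pr-${{ github.event.pull_request.number }}
          terraform destroy -auto-approve
          terraform workspace select default
          terraform workspace delete pr-${{ github.event.pull_request.number }}
```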

Terraform-Specific Cost Patterns

State Backend Costs

Terraform state stored in S3 with DynamoDB locking is cheap: $0.023/GB for state files (typically under 1 MB each) plus DynamoDB at $0.00065 per WCU-hour for the lock table. A team with 20 Terraform workspaces doing 50 plan/apply operations per day: DynamoDB writes ≈ $0.03/day. Negligible.
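The backend described above looks like this (bucket and table names are placeholders):

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"  # placeholder bucket name
    key            = "cicd/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"          # placeholder lock table
    encrypt        = true
  }
}
```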

The backend costs that matter: if you use Terraform Cloud or HCP Terraform for remote runs, check that the tier matches your usage. The free tier is capped (currently by managed-resource count); above that, the paid per-user tier is typically justified but needs to be tracked.

Plan Caching for Faster Pipelines

terraform plan output can be saved and reused within a pipeline, avoiding duplicate planning when the same plan feeds both cost estimation and apply:

- name: Terraform Plan
  run: terraform plan -out=tfplan.binary

- name: Show plan as JSON (for Infracost)
  run: terraform show -json tfplan.binary > tfplan.json

- name: Infracost from plan
  run: infracost breakdown --path=tfplan.json --format=json > infracost.json

The cached plan file is used by both the apply step and the cost estimation step. This eliminates the second terraform plan call that many teams add for cost estimation, saving both time and any plan-time API calls to AWS.

Drift Detection Without Continuous Polling

terraform plan run on a schedule detects configuration drift (infrastructure changed outside Terraform). The cost trap: running plan against every workspace hourly generates significant API calls and potential read costs for large state files. Run drift detection daily during off-peak hours, not continuously. A Lambda function on a CloudWatch Events schedule that triggers a CodeBuild job to run terraform plan across workspaces works well and costs pennies.
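One variant skips the Lambda entirely: EventBridge can start a CodeBuild project directly through a target role. A sketch, assuming a drift_check CodeBuild project and an IAM role with codebuild:StartBuild already exist (both names are illustrative):

```hcl
resource "aws_cloudwatch_event_rule" "drift_detection" {
  name                = "terraform-drift-detection"
  description         = "Nightly terraform plan across workspaces"
  schedule_expression = "cron(0 3 * * ? *)"  # daily at 03:00 UTC, off-peak
}

resource "aws_cloudwatch_event_target" "drift_detection" {
  rule     = aws_cloudwatch_event_rule.drift_detection.name
  arn      = aws_codebuild_project.drift_check.arn   # assumed project
  role_arn = aws_iam_role.events_start_build.arn     # needs codebuild:StartBuild
}
```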

Edge Cases That Generate Unexpected Bills

Runaway Pipelines Without Timeouts

A CI/CD job that hangs indefinitely is the most common source of unexpected bills. A GitHub Actions job without a timeout-minutes property runs for up to 6 hours by default. A CodeBuild build defaults to a 60-minute timeout but can be configured up to 8 hours. One stuck 6-hour job per day for a month wastes 180 hours of compute; at the hosted runner rate of $0.48/hour, that is roughly $86 of pure waste.

Always set explicit timeouts:

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20  # Kill after 20 minutes regardless

Set timeouts at two levels: the job level and the individual step level for long-running steps.
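For example (the step name and script path are illustrative):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20          # job-level ceiling
    steps:
      - name: Run integration tests
        run: ./scripts/integration-tests.sh   # illustrative path
        timeout-minutes: 10      # tighter ceiling for the slow step
```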

Recursive Trigger Patterns

A pipeline that deploys infrastructure which triggers another pipeline is a recursive trigger. Common pattern: Pipeline A deploys a Lambda, which sends an SNS message, which triggers Pipeline B (via EventBridge), which deploys another Lambda, which… The bill for recursive pipelines can reach thousands of dollars before anyone notices.

Guard against recursion with pipeline conditions: check that the triggering event is a human commit, not an automated system. In GitHub Actions:

on:
  push:
    branches: [main]

jobs:
  deploy:
    if: github.actor != 'github-actions[bot]' && github.actor != 'dependabot[bot]'

Artifact Retention Without Lifecycle Rules

CodePipeline creates an S3 bucket automatically (codepipeline-{region}-{accountid}) and stores pipeline artifacts there. By default, no lifecycle policy is applied. For a pipeline that runs 100 times per day with 50 MB artifacts per run: 5 GB/day, 150 GB/month growth, $3.45/month added monthly. After 12 months: $41.40/month for a pipeline nobody is looking at.
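Because the growth is linear, the month-N bill is easy to project; a quick sketch with the numbers above:

```python
def artifact_storage_cost(gb_added_per_month: float, months: int,
                          rate_per_gb: float = 0.023) -> float:
    """S3 Standard bill in month N when artifacts are never cleaned up."""
    return gb_added_per_month * months * rate_per_gb

# 100 runs/day x 50 MB = 5 GB/day = 150 GB/month of new artifacts
print(round(artifact_storage_cost(150, 1), 2))   # month 1:  3.45
print(round(artifact_storage_cost(150, 12), 2))  # month 12: 41.4
```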

Set a lifecycle policy on CodePipeline artifact buckets explicitly — they are not managed by Terraform by default and AWS does not set policies on auto-created buckets.

NAT Gateway for Private CodeBuild

The $0.045/GB NAT Gateway charge is easy to underestimate. A build that fetches a large Docker base image, downloads 200 npm packages, and runs pip install -r requirements.txt can easily process 2–4 GB of outbound NAT traffic. For 200 builds per day: 200 × 3 GB × $0.045 = $27/day = $810/month.

Mitigation checklist:

  1. Add an S3 Gateway endpoint (free) — eliminates S3 traffic through NAT
  2. Add the ECR VPC Interface endpoints (private pulls need both ecr.api and ecr.dkr, $0.01/hour each, ≈ $14.40/month); this removes ECR traffic from NAT, and with the S3 Gateway endpoint in place the image layer data also bypasses NAT, since ECR stores layers in S3
  3. Mirror frequently-used public Docker images to ECR (one-time cost, eliminates Docker Hub pulls through NAT)
  4. Cache npm/pip packages in S3 (CodeBuild caching feature, no NAT cost once cached)

Putting It Together: Cost Visibility in Practice

The teams that control CI/CD costs have two things in common: they tag every CI/CD resource (CodeBuild projects, S3 buckets, ECR repositories) with a consistent tag structure, and they review the CI/CD line item in Cost Explorer weekly.

Tag structure for CI/CD resources:

tags = {
  Team        = "platform"
  CostCenter  = "engineering-infrastructure"
  Environment = "cicd"
  ManagedBy   = "terraform"
}

With consistent tags, you can filter Cost Explorer to show exactly what CI/CD costs per week, spot anomalies (a spike on a specific day indicating a runaway build), and allocate costs to teams via cost allocation tags.

The Infracost integration ensures that infrastructure cost increases are visible at PR time — before they reach production. Combined with build timeouts, ECR lifecycle policies, and S3 artifact lifecycle rules, these controls keep CI/CD costs proportional to engineering output rather than growing unchecked.

For related reading on securing your CI/CD pipelines alongside controlling costs, see our guides on GitHub Actions AWS CI/CD security best practices and AWS CodePipeline CI/CD pipeline patterns for production. For the infrastructure-as-code decisions that feed these pipelines, see Terraform vs AWS CDK.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture · Cloud Migration · GenAI on AWS · Cost Optimization · DevOps
