---
title: AWS Observability Costs: Cardinality Budgets & FinOps Limits
description: CloudWatch Logs Insights bills $0.005 per GB scanned and high-cardinality custom metrics multiply costs. Cardinality budgets, sampling rules, and FinOps fixes.
url: https://www.factualminds.com/blog/aws-observability-finops-cardinality-cost-control/
datePublished: 2026-05-08T00:00:00.000Z
dateModified: 2026-05-15T00:00:00.000Z
author: Palaniappan P
category: Cloud Architecture
tags: amazon-cloudwatch, observability, opentelemetry, aws-xray, finops, cost-optimization
---

# AWS Observability Costs: Cardinality Budgets & FinOps Limits

> CloudWatch Logs Insights bills $0.005 per GB scanned and high-cardinality custom metrics multiply costs. Cardinality budgets, sampling rules, and FinOps fixes.

As of **May 8, 2026**, **Amazon CloudWatch** observability bills bite hardest where teams audit them least: **Logs Insights** at **$0.005 per GB scanned** (US East list pricing) and **custom metrics** that explode when a per-user-ID dimension lands on a hot path. A "cheap" wide query scheduled every minute, or a `PutMetricData` storm on a high-cardinality dimension, can become a five-figure monthly line item before finance notices it on the dashboard.

This guide is the **FinOps view** of AWS observability — **cardinality budgets**, **sampling rules**, and **query hygiene** that keep CloudWatch and OpenTelemetry costs predictable. For the broader CloudWatch capability surface (metrics, logs, alarms, Application Signals), see [CloudWatch observability best practices](/blog/aws-cloudwatch-observability-metrics-logs-alarms-best-practices/). For full FinOps remediation across compute and observability, see our [AWS cloud cost optimization services](/services/aws-cloud-cost-optimization-services/).

> **Reproduce this** — Paste starter queries from [`examples/architecture-blog-2026/observability/logs-insights-queries.txt`](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability/) into Logs Insights; replace log groups and trace IDs.

## The three cost surfaces — metrics, logs, traces

- **Custom metrics** bill **per dimension combination**. Adding `user_id` to a metric with 500k DAU creates 500k time series — that is not a metric, that is a database.
- **Logs Insights** bills **per GB scanned** ($0.005/GB US East, May 2026). A `SELECT *` across 30 log groups for "the last hour" is charged on the data scanned in those groups for that hour, not on the rows it returns, so it costs more than people expect.
- **Traces** bill **per span ingested** in **AWS X-Ray**; without sampling rules, instrumenting everything on day one is the most expensive learning experience your team will have.
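
To make the Logs Insights arithmetic concrete, here is a minimal sketch of the cost math. The only number taken from this post is the $0.005 per GB list price; the 50 GB per run, the one-minute schedule, and the $1,000 budget are illustrative assumptions, and the function names are hypothetical.

```python
# Rough monthly cost of a scheduled Logs Insights query.
# PRICE_PER_GB_SCANNED is the list price quoted in this post (US East);
# every other number below is an illustrative assumption.
PRICE_PER_GB_SCANNED = 0.005


def monthly_scan_cost(gb_scanned_per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Return the estimated monthly cost in USD for a scheduled query."""
    return gb_scanned_per_run * runs_per_day * days * PRICE_PER_GB_SCANNED


def gb_per_run_for_budget(monthly_budget_usd: float, runs_per_day: int, days: int = 30) -> float:
    """Return the GB each run may scan before the monthly budget is exceeded."""
    return monthly_budget_usd / (PRICE_PER_GB_SCANNED * runs_per_day * days)


if __name__ == "__main__":
    every_minute = 24 * 60  # 1,440 runs per day
    # Hypothetical query scanning 50 GB per run, once a minute.
    print(f"${monthly_scan_cost(50, every_minute):,.2f} per month")
    # How much may each run scan to stay under $1,000 per month?
    print(f"{gb_per_run_for_budget(1000, every_minute):.2f} GB per run")
```

Under those assumptions the every-minute query lands at roughly $10,800 per month, and staying under $1,000 means each run can scan only about 4.6 GB. The schedule multiplies the bill, which is why frequency matters as much as query width.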

## Cardinality budgets

Cardinality is the one observability concept where a single misapplied tag can silently multiply your bill.

- Set a **budget** per service: max distinct dimension combinations per metric, max distinct attributes per trace span. Review weekly.
- Prefer **aggregated** business metrics with bounded dimensions (region, env, tier) over per-user-ID metrics. Use **traces** for per-user debugging, not metrics.
- Use **embedded metric format (EMF)** carefully — EMF generates metrics from log lines, which means a runaway log can mint runaway metrics.
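
A budget is easier to enforce when it is a number a script can check. The sketch below is one possible shape, not a CloudWatch feature: the budget of 200 combinations, the dimension counts, and all function names are assumptions chosen for illustration.

```python
from math import prod

# Hypothetical per-service budget: max distinct dimension combinations per metric.
CARDINALITY_BUDGET = 200


def worst_case_series(distinct_values: dict[str, int]) -> int:
    """Upper bound on time series: product of distinct values per dimension."""
    return prod(distinct_values.values())


def observed_series(datapoints: list[dict[str, str]]) -> int:
    """Count the distinct dimension combinations actually emitted."""
    return len({tuple(sorted(dp.items())) for dp in datapoints})


def within_budget(series: int, budget: int = CARDINALITY_BUDGET) -> bool:
    """True when the series count fits the budget."""
    return series <= budget


if __name__ == "__main__":
    bounded = {"region": 4, "env": 3, "tier": 3}
    print(worst_case_series(bounded), within_budget(worst_case_series(bounded)))
    # Adding a per-user dimension multiplies every existing combination.
    with_user = {**bounded, "user_id": 500_000}
    print(worst_case_series(with_user), within_budget(worst_case_series(with_user)))
```

With bounded dimensions the worst case is 36 series, well inside the assumed budget. Adding `user_id` at 500k distinct values pushes the upper bound to 18 million. `observed_series` gives the weekly review an actual count to compare against that bound.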

> **What broke** — A "temporary" debugging query (`SELECT *` across 30 log groups) landed in a CI cron for weeks. Finance flagged an **$18k** monthly delta — logs were "cheap" until automation scaled humans out of the loop. Fix: query linting, required group filters, and budget alarms per log product.
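
The query linting mentioned above could look something like the following sketch. It is a hypothetical pre-merge check, not an AWS tool: the `ScheduledQuery` shape and all three thresholds are assumptions to tune per team.

```python
from dataclasses import dataclass

# Hypothetical thresholds; none of these numbers come from AWS.
MAX_LOG_GROUPS = 5
MAX_WINDOW_MINUTES = 60
MIN_INTERVAL_MINUTES = 15


@dataclass
class ScheduledQuery:
    name: str
    log_groups: list[str]
    window_minutes: int    # how far back each run looks
    interval_minutes: int  # how often the query runs


def lint(q: ScheduledQuery) -> list[str]:
    """Return human-readable violations; an empty list means the query passes."""
    problems = []
    if not q.log_groups:
        problems.append("no explicit log groups listed")
    elif len(q.log_groups) > MAX_LOG_GROUPS:
        problems.append(f"{len(q.log_groups)} log groups (limit {MAX_LOG_GROUPS})")
    if q.window_minutes > MAX_WINDOW_MINUTES:
        problems.append(f"{q.window_minutes}-minute window (limit {MAX_WINDOW_MINUTES})")
    if q.interval_minutes < MIN_INTERVAL_MINUTES:
        problems.append(f"runs every {q.interval_minutes} min (minimum {MIN_INTERVAL_MINUTES})")
    return problems


if __name__ == "__main__":
    ci_cron = ScheduledQuery(
        name="temporary-debug",
        log_groups=[f"/app/service-{i}" for i in range(30)],
        window_minutes=60,
        interval_minutes=1,
    )
    for problem in lint(ci_cron):
        print(f"{ci_cron.name}: {problem}")
```

Run in CI against whatever file defines scheduled queries, a check like this would have rejected the 30-group, every-minute cron before it reached production.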

## Sampling — the lever most teams skip

- **Head-based sampling** (sample at trace start) is simple and predictable; default to it before tail-based.
- **Tail-based sampling** requires a collector tier (**AWS Distro for OpenTelemetry**, OTel Collector). Useful for keeping all error traces while dropping normal ones — _if_ the rules do not accidentally keep 100% of "normal" anyway.
- **Validate trace counts against billing weekly.** Misconfigured rules look correct in code and silently re-include the world.
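
As an illustration of head-based sampling and the weekly validation step, here is a simplified sketch. It is not the X-Ray or OpenTelemetry sampler implementation: it assumes 128-bit hex trace IDs, derives the keep decision from the low 64 bits, and uses a hypothetical 2% tolerance for drift.

```python
import secrets


def keep_trace(trace_id_hex: str, ratio: float) -> bool:
    """Head-based decision: keep the trace if its ID falls below the ratio threshold.

    Deriving the decision from the trace ID (rather than a fresh random number)
    means every service that sees the same trace makes the same decision.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = int(trace_id_hex[-16:], 16)
    return low_64_bits < int(ratio * (1 << 64))


def kept_fraction(trace_ids: list[str], ratio: float) -> float:
    """Fraction of the given traces that the sampler keeps."""
    kept = sum(keep_trace(t, ratio) for t in trace_ids)
    return kept / len(trace_ids)


def drifted(observed_fraction: float, configured_ratio: float, tolerance: float = 0.02) -> bool:
    """Flag when the observed kept fraction strays from the configured ratio."""
    return abs(observed_fraction - configured_ratio) > tolerance


if __name__ == "__main__":
    ids = [secrets.token_hex(16) for _ in range(100_000)]  # random 128-bit trace IDs
    frac = kept_fraction(ids, 0.05)
    print(f"kept {frac:.3%}, drifted={drifted(frac, 0.05)}")
```

The same `drifted` check works with real numbers: divide the traces billed for the week by the traces started, then compare against the ratio you believe is configured. A result near 100% is the "silently re-include the world" failure described above.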

## OpenTelemetry adoption pacing

Half-instrumented OpenTelemetry can cost more than central **AWS X-Ray** with known sampling rules. Phase adoption: critical paths first, tail-based sampling second, broad rollout last. For chaos-driven OTel learning, see [OTel Demo Game](/blog/otel-demo-game-aws-observability-chaos-engineering/).

## Monitoring vs alerting — the cost link

**Monitoring** collects continuously; **alerting** routes to humans. The cost link: every alarm that fires more than twice a week without a matching update to its action checklist is a candidate for deletion or re-tiering. Noisy alarms keep their underlying metric pipelines hot and keep humans paged.

> **Opinionated take**: Noisy paging breeds alert fatigue, and a fatigued on-call engineer ignoring a real page is a security incident waiting to happen. Delete or re-tier any alarm that crosses the twice-a-week threshold without an action checklist update.
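
The twice-a-week rule can be checked mechanically. The sketch below assumes you can export alarm state-change timestamps and the last runbook edit date into a simple record; the `AlarmRecord` shape and field names are hypothetical, not a CloudWatch API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlarmRecord:
    name: str
    fired_at: list[datetime]                # timestamps of ALARM transitions
    runbook_updated_at: Optional[datetime]  # last action-checklist update, if any


def is_noisy(alarm: AlarmRecord, now: datetime, max_fires_per_week: int = 2) -> bool:
    """True when the alarm fired too often in the last 7 days with no runbook update."""
    week_ago = now - timedelta(days=7)
    recent = [t for t in alarm.fired_at if t >= week_ago]
    if len(recent) <= max_fires_per_week:
        return False
    # Frequent firing is acceptable only if someone updated the checklist in the window.
    return alarm.runbook_updated_at is None or alarm.runbook_updated_at < week_ago


if __name__ == "__main__":
    now = datetime(2026, 5, 8)
    flappy = AlarmRecord(
        name="checkout-latency-p99",
        fired_at=[now - timedelta(days=d) for d in (1, 2, 3)],
        runbook_updated_at=None,
    )
    print(flappy.name, "noisy:", is_noisy(flappy, now))
```

Anything this flags goes on the delete-or-re-tier list for the weekly review.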

Training wheels: [real cost of skipping 24/7 monitoring](/blog/real-cost-of-no-24-7-aws-monitoring/) quantifies the business risk of silent failure. Operational mantra from our [distributed debugging guide](/blog/debug-production-distributed-aws-systems/): **alert on metrics**, **triage with traces**, **diagnose with logs**.

## FinOps coupling

Observability spend belongs in the same review as compute. Cross-references: [engineering cost ownership](/blog/aws-finops-gap-engineering-cost-ownership/) and [logging cost deep dive](/blog/aws-cloudwatch-logging-costs-observability/).

## What This Post Doesn't Cover

- **Third-party APM** pricing negotiations — vendor-specific.
- **Security analytics** in Security Lake vs CloudWatch — different retention/compliance story ([Security Lake guide](/blog/amazon-security-lake-ocsf/) when OCSF normalization matters).
- **Capability deep dive** on metrics, logs, alarms, Application Signals — that lives in [CloudWatch observability best practices](/blog/aws-cloudwatch-observability-metrics-logs-alarms-best-practices/).

## If You Only Do One Thing

Set a **per-service cardinality budget** and review it weekly with finance. Every other observability cost lever flows from that one number.

## What to Do This Week

1. Inventory scheduled Logs Insights queries; delete or constrain time windows.
2. Audit every `PutMetricData` call for per-user-ID or per-request-ID dimensions; remove or move to traces.
3. Verify **W3C trace context** crosses service boundaries in staging (not just Lambda console traces).
4. Pair every new alarm with a **runbook link** or delete the alarm — no orphan pages.
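
For step 3, one way to verify propagation in staging is to capture the `traceparent` header on both sides of a service boundary and confirm the trace ID survived. The sketch below is a simplified validator for version `00` of the W3C Trace Context header only; the function names are hypothetical and it does not handle `tracestate` or future header versions.

```python
import re

# W3C Trace Context "traceparent" header: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str):
    """Return (trace_id, parent_id, sampled) or None if the header is invalid."""
    m = TRACEPARENT_RE.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


def same_trace(upstream_header: str, downstream_header: str) -> bool:
    """True when both headers are valid and carry the same trace ID."""
    up = parse_traceparent(upstream_header)
    down = parse_traceparent(downstream_header)
    return up is not None and down is not None and up[0] == down[0]


if __name__ == "__main__":
    upstream = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    downstream = "00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01"
    print("context propagated:", same_trace(upstream, downstream))
```

A downstream hop should show a new parent ID but the same trace ID. If the trace ID changes, the boundary is starting a fresh trace and your sampling decisions stop being consistent across services.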

When resilience patterns intersect telemetry, read [retries, circuits, and graceful shutdown](/blog/aws-resilience-retries-circuits-graceful-shutdown/).

## FAQ

### When should we avoid high-cardinality custom metrics in CloudWatch?
When dimensions explode (per-user IDs on hot paths), because every new dimension value adds a billed time series and combinations multiply. Prefer aggregated business metrics, sampling, or trace-derived insights for per-tenant debugging, not raw `PutMetricData` storms.
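
One way to keep per-user detail without minting per-user metrics is to emit it as a plain log property next to a bounded metric. The sketch below builds an embedded metric format (EMF) log line following the published EMF JSON structure as we understand it. The namespace, metric name, and dimension allowlist are hypothetical, and you should check the output against the current EMF specification before relying on it.

```python
import json
import time

# Hypothetical allowlist of bounded dimensions for this service.
ALLOWED_DIMENSIONS = ("Service", "Region", "Tier")


def emf_record(namespace, metric_name, value, unit, dimensions, properties=None):
    """Build one EMF log line; only allowlisted keys become metric dimensions."""
    unknown = set(dimensions) - set(ALLOWED_DIMENSIONS)
    if unknown:
        raise ValueError(f"unbounded dimension(s) not allowed: {sorted(unknown)}")
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        metric_name: value,
        **dimensions,
        # Properties stay searchable in the log line but are not listed as dimensions.
        **(properties or {}),
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(emf_record(
        "Shop", "CheckoutLatency", 123, "Milliseconds",
        dimensions={"Service": "checkout", "Region": "us-east-1"},
        properties={"user_id": "u-42"},
    ))
```

Passing `user_id` in `dimensions` raises an error here, while passing it in `properties` keeps it queryable in Logs Insights without creating a time series per user.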

### What is the biggest observability cost trap on AWS?
Scheduled Logs Insights queries on wide time windows. List pricing is $0.005 per GB scanned (US East, May 2026); a "cheap" minute-by-minute query across many log groups becomes a five-figure monthly line item before finance flags it. Constrain time windows, narrow log groups, and require lint approval on scheduled queries.

### Is logging alone enough for production AWS systems?
Logs without correlation identifiers devolve into forensic archaeology across log groups. You need at least one of: structured trace IDs, correlated metrics, or traces. Our distributed debugging guide shows the composite workflow.

### When should we not adopt OpenTelemetry everywhere day one?
When teams lack collector discipline and cardinality budgets—half-instrumented OTel can cost more than central X-Ray with known sampling rules. Phase adoption: critical paths first, tail-based sampling second.

### What breaks tail-based sampling budgets?
Misconfigured rules that still capture 100% of "normal" traffic while claiming tail sampling—validate trace counts against billing weekly.

### How is this different from CloudWatch best practices coverage?
The companion CloudWatch observability best practices post covers the capability surface—metrics, logs, alarms, Application Signals. This post is the FinOps view: cardinality budgets, sampling rules, query hygiene, and the cost behaviors that make observability bills predictable.

