Prometheus Cardinality Explosion on AWS: AMP, EMF, and Cost-Aware Metrics
Quick summary: That `user_id` label on every HTTP metric turns Amazon Managed Prometheus into a five-figure line item. This guide explains cardinality mechanics, EMF vs remote write, and Application Signals defaults worth disabling.
Key Takeaways
- That `user_id` label on every HTTP metric turns Amazon Managed Prometheus into a five-figure line item
- Amazon Managed Prometheus (AMP) pricing (June 2026) scales with metrics ingested and stored—cardinality is dollars
- Benchmark pattern (OTel demo workload on EKS): enabling `http.route` with raw URL paths pushed active series from 12k to 890k in 6 h; AMP estimate +$2,400/mo
- Relabeling to template routes (`/users/{id}`) brought the count back down to 14k series
- See observability beyond CloudWatch for stack wiring
Amazon Managed Prometheus (AMP) pricing (June 2026) scales with metrics ingested and stored, so cardinality is dollars. A single histogram whose path label includes UUIDs can create millions of active series within hours.
Benchmark pattern (OTel demo workload on EKS): enabling `http.route` with raw URL paths pushed active series from 12k to 890k in 6 h; AMP estimate +$2,400/mo. Relabeling to template routes (`/users/{id}`) brought the count back down to 14k series. See observability beyond CloudWatch for stack wiring.
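To make the route templating concrete, here is a minimal Python sketch of the idea, not the collector configuration used in the benchmark. The function name `template_route` and the two ID patterns (UUIDs and all-digit segments) are assumptions for illustration; real routes may need other patterns.

```python
import re

# Hypothetical ID patterns; adjust to the formats your routes actually use.
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_NUMERIC = re.compile(r"^\d+$")


def template_route(path: str) -> str:
    """Replace ID-like path segments with a fixed {id} placeholder."""
    path = path.split("?", 1)[0]  # the query string never belongs in a label
    segments = []
    for segment in path.split("/"):
        if _UUID.match(segment) or _NUMERIC.match(segment):
            segments.append("{id}")
        else:
            segments.append(segment)
    return "/".join(segments)


raw_paths = [
    "/users/42",
    "/users/1337?tab=orders",
    "/users/3f2b8c1e-9d4a-4e6b-8a57-0c1d2e3f4a5b",
    "/health",
]
templated = sorted({template_route(p) for p in raw_paths})
print(templated)  # ['/health', '/users/{id}']
```

Four raw paths collapse to two label values. Where the web framework already exposes the matched route template, using that value directly is usually more reliable than pattern matching after the fact.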
Mechanism
Prometheus identifies a time series by its metric name plus its label set. Each unique combination is stored, queried, and billed as its own series. High-cardinality labels (IDs) multiply series combinatorially with other labels (status, method, pod).
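The multiplication can be sketched in a few lines of Python. The label counts below are illustrative numbers, not measurements from the benchmark, and `max_series` is a hypothetical helper that gives an upper bound (real workloads rarely hit every combination).

```python
from math import prod


def max_series(label_cardinalities: dict[str, int], histogram_buckets: int = 0) -> int:
    """Upper bound on series for one metric.

    The bound is the product of distinct values per label. For a Prometheus
    histogram, each label combination yields one series per bucket
    (histogram_buckets counts every `le` value, including +Inf) plus the
    _sum and _count series.
    """
    combos = prod(label_cardinalities.values())
    per_combo = histogram_buckets + 2 if histogram_buckets else 1
    return combos * per_combo


# Illustrative cardinalities only.
base = {"method": 5, "status": 6, "pod": 10}
with_user = {**base, "user_id": 10_000}

print(max_series(base))                        # 300
print(max_series(with_user))                   # 3000000
print(max_series(base, histogram_buckets=11))  # 3900
```

One ID-shaped label turns a few hundred series into millions, and putting that label on a histogram multiplies the total again by the bucket count.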
AWS controls
| Approach | Service | Use when |
|---|---|---|
| Managed backend | AMP + AMG | EKS/ECS metrics at scale |
| Embedded metrics | CloudWatch EMF | Lambda/custom apps without scrape |
| SLO-native | Application Signals | Service golden signals; watch the auto-discovered operations |
| Cost guard | Metric filters + alarms on IncomingLogEvents / AMP workspace limits | FinOps gate |
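For the EMF row, a minimal sketch of one way to keep cardinality in check: only low-cardinality keys are declared as dimensions, while `user_id` travels as a plain log property that stays searchable in CloudWatch Logs without creating a metric per user. The helper `emf_record`, the namespace `MyApp`, and the field values are assumptions for illustration; the `_aws` envelope follows the documented Embedded Metric Format layout.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions, properties=None):
    """Build one EMF log line. Only keys in `dimensions` become metric dimensions."""
    record = dict(properties or {})  # high-cardinality context: logged, not a dimension
    record.update(dimensions)        # dimension values live at the root of the record
    record[metric_name] = value
    record["_aws"] = {
        "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
        "CloudWatchMetrics": [
            {
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }
        ],
    }
    return json.dumps(record)


line = emf_record(
    namespace="MyApp",  # hypothetical namespace
    metric_name="RequestLatency",
    value=87,
    unit="Milliseconds",
    dimensions={"Route": "/users/{id}", "Method": "GET"},
    properties={"user_id": "u-123"},
)
print(line)  # in Lambda, stdout is shipped to CloudWatch Logs
```

Every distinct combination of dimension values becomes its own CloudWatch metric, so the same cardinality rule applies here as with Prometheus labels.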
Opinionated take: Relabel at the collector (ADOT) before remote_write. Do not fix cardinality in Grafana dashboards: by the time a series reaches a dashboard, it has already been ingested and billed.
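What a collector-side drop does can be sketched in plain Python. This is a model of the logic, not ADOT configuration; `drop_and_aggregate` and the sample data are hypothetical. It removes the forbidden labels and sums the values that now share a label set, since dropping a label without aggregating would leave several samples claiming the same series.

```python
from collections import defaultdict

FORBIDDEN_LABELS = frozenset({"user_id", "trace_id"})


def drop_and_aggregate(samples, forbidden=FORBIDDEN_LABELS):
    """Drop forbidden labels, then sum values that collapse onto the same label set."""
    totals = defaultdict(float)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in forbidden))
        totals[kept] += value
    return dict(totals)


# Hypothetical request counts per label set.
samples = [
    ({"method": "GET", "status": "200", "user_id": "u-1"}, 3),
    ({"method": "GET", "status": "200", "user_id": "u-2"}, 5),
    ({"method": "POST", "status": "500", "user_id": "u-1", "trace_id": "abc"}, 1),
]
result = drop_and_aggregate(samples)
print(len(samples), "->", len(result))  # 3 -> 2
```

Summing is only appropriate for counter-style increments; gauges and histograms need an aggregation that fits their semantics.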
When this advice breaks
- Short-lived batch jobs: high series churn may be acceptable if retention is 24h and the jobs are few.
- Debugging incidents: a temporary high-cardinality scrape is OK with a documented TTL and owner.
What to do this week
- Export the top 20 labels by series count from AMP or Prometheus `label_values` sampling.
- Add `drop`/`labelmap` processors in the ADOT config for forbidden labels (`user_id`, `trace_id`).
- Set a CloudWatch alarm on AMP `DiscardedSamples` or a workspace ingestion rate spike.
- Pair with the log sampling guide (part 3 of this track).
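For the first item, a minimal sketch of ranking labels offline. It assumes you have already exported label sets as a list of dictionaries (the shape the Prometheus HTTP API returns from its series endpoint) and ranks labels by distinct value count, which is a proxy for series count and not the same number. The helper `top_labels` and the sample data are hypothetical.

```python
from collections import defaultdict


def top_labels(series, n=20):
    """Rank label names by how many distinct values they take across the series."""
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    ranked = sorted(values.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(name, len(vals)) for name, vals in ranked[:n]]


# Hypothetical export; real input would hold thousands of label sets.
series = [
    {"__name__": "http_requests_total", "method": "GET", "user_id": "u-1"},
    {"__name__": "http_requests_total", "method": "GET", "user_id": "u-2"},
    {"__name__": "http_requests_total", "method": "POST", "user_id": "u-3"},
]
print(top_labels(series, n=2))  # [('user_id', 3), ('method', 2)]
```

Labels near the top of the ranking whose values look like IDs are the first candidates for the forbidden list.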
What this guide doesn’t cover
Distributed tracing propagation—see part 1 OTel guide in this track.
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.