# Cutting Amazon Managed Service for Prometheus (AMP) cost — the three levers that actually move the bill

AWS documents that **metric ingestion is the largest cost driver** for most AMP
customers. The three levers below, listed roughly by impact, are what move the
bill; the sequence in which to apply them is given under "Order of operations".
The fourth thing teams reach for, reducing retention, is explicitly called out
by AWS as unlikely to help much, because storage is a small slice of the cost.

> Confirm current per-sample ingestion/query/storage rates and free-tier limits
> on the Amazon Managed Service for Prometheus pricing page. Free tier (mid-2026
> model): 40M samples ingested, 200B query samples processed, 10 GB stored.

## Lever 1 — Increase the scrape interval (biggest, easiest win)

Samples ingested scale linearly with scrape frequency: halving the frequency
(doubling the interval) halves the samples, and so the ingestion cost, for
those series.

```yaml
global:
  scrape_interval: 15s       # default many teams ship with
scrape_configs:
  # Infra/host metrics that don't need second-level resolution.
  # "node" is an illustrative job name.
  - job_name: node
    scrape_interval: 60s     # 4x fewer samples for the same series
```

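Keep 15s only for the jobs whose series drive fast alerts (error rate,
saturation); `scrape_interval` is set per scrape job, not per series, so
fast-alerting targets may need a job of their own. Everything else can usually
live at 30–60s. After raising an interval, widen any `rate()` window that no
longer spans several scrapes: a `[1m]` window over a 60s interval rarely
contains the two samples `rate()` needs.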
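
The linear scaling is easy to check with arithmetic. The following is a minimal sketch, assuming a 30-day month and exactly one sample per series per scrape; the function names are illustrative, and the 40M figure is the free-tier ingestion allowance quoted above.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def samples_per_month(series: int, scrape_interval_s: int) -> int:
    """Samples ingested per month: one sample per series per scrape."""
    return series * SECONDS_PER_MONTH // scrape_interval_s


def series_within_allowance(sample_allowance: int, scrape_interval_s: int) -> int:
    """How many always-on series fit inside a monthly sample allowance."""
    return sample_allowance // (SECONDS_PER_MONTH // scrape_interval_s)


if __name__ == "__main__":
    for interval in (15, 30, 60):
        print(
            f"{interval}s:",
            samples_per_month(1_000, interval), "samples/month for 1,000 series;",
            series_within_allowance(40_000_000, interval), "series fit in 40M samples",
        )
```

Under these assumptions, 1,000 series produce 172,800,000 samples a month at 15s and 43,200,000 at 60s. Multiply the sample count by the current per-sample rate from the pricing page to turn it into a cost estimate.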

## Lever 2 — Filter metrics at the source (drop before ingest)

Most exporters emit far more series than you alert or dashboard on. Drop them in
the scrape config (or in the ADOT collector — see `adot-collector-config.yaml`)
**before** they are ingested:

```yaml
metric_relabel_configs:
  # Drop metric families you never alert or dashboard on.
  # Prometheus anchors the regex, so it must match the whole metric name.
  - source_labels: [__name__]
    regex: "go_gc_.*|process_.*|promhttp_.*"
    action: drop
  # Drop a high-cardinality label that explodes series count.
  # The remaining labels must still identify each series uniquely.
  - regex: "pod_template_hash|controller_revision_hash"
    action: labeldrop
```

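High-cardinality **labels** (user IDs, request IDs, full URLs) are the silent
killer: the series count for a metric is, at worst, the product of its labels'
distinct values, so one bad label can turn a single metric into millions of
series. Audit cardinality before it audits your invoice.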
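
The multiplication is worth seeing with numbers. This is a small sketch with hypothetical label names and value counts; it computes an upper bound, since real workloads rarely emit every combination of label values.

```python
from math import prod


def worst_case_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single HTTP counter.
with_user_id = {"method": 5, "code": 10, "user_id": 100_000}
without_user_id = {"method": 5, "code": 10}

print(worst_case_series(with_user_id))     # 5000000
print(worst_case_series(without_user_id))  # 50
```

Removing the one unbounded label takes the ceiling from five million series to fifty, which is why label hygiene tends to matter more than trimming whole metric families.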

## Lever 3 — Pre-aggregate with recording rules

If a dashboard or alert always queries the same aggregation, compute it once
with a recording rule instead of scanning raw series on every query (this also
cuts **query** sample cost). The rule's output is stored as new series, so
record only low-cardinality aggregates:

```yaml
groups:
  - name: http-slo-aggregates
    interval: 60s
    rules:
      - record: job:http_request_duration_seconds:p99_5m
        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le))
      - record: job:http_requests:error_rate_5m
        expr: sum(rate(http_requests_total{code=~"5.."}[5m])) by (job) / sum(rate(http_requests_total[5m])) by (job)
```
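
To see when a recording rule pays off, it helps to compare samples read with and without it. The sketch below is a deliberately simplified model, not AMP's billing formula: it assumes a `rate(...[window])` expression reads every raw sample in its window at each evaluation, that reading a recorded series costs one sample per series per evaluation, and that all function names and workload numbers are hypothetical.

```python
def raw_samples_per_eval(raw_series: int, window_s: int, scrape_interval_s: int) -> int:
    """Raw samples one evaluation of rate(...[window]) reads across all series."""
    return raw_series * (window_s // scrape_interval_s)


def hourly_samples_without_rule(per_eval: int, reader_evals_per_hour: int) -> int:
    """Every dashboard or alert evaluation rescans the raw series."""
    return per_eval * reader_evals_per_hour


def hourly_samples_with_rule(
    per_eval: int,
    rule_evals_per_hour: int,
    recorded_series: int,
    reader_evals_per_hour: int,
) -> int:
    """The rule scans raw series once per interval; readers touch only its output."""
    return per_eval * rule_evals_per_hour + recorded_series * reader_evals_per_hour


# Hypothetical workload: 2,000 bucket series scraped every 15s, a 5m window,
# a rule evaluated every 60s, 4 recorded series (one per job), and dashboards
# plus alerts that together evaluate the expression 600 times an hour.
per_eval = raw_samples_per_eval(2_000, 300, 15)
print(hourly_samples_without_rule(per_eval, 600))      # 24000000
print(hourly_samples_with_rule(per_eval, 60, 4, 600))  # 2402400
```

In this model the saving comes entirely from readers that evaluate more often than the rule does. If an expression is queried less often than its rule interval, the rule adds cost instead of removing it.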

## What does NOT help much

- **Cutting retention.** AWS notes storage is a minor cost component; dropping
  retention from 150 days rarely moves the bill meaningfully. Default retention
  is 150 days, configurable up to 3 years — set it to your compliance need, not
  as a cost lever.
- **Turning off alerting to "save queries."** Native AMP alerting is cheaper
  than scanning raw series from an external system on a tight interval; tune the
  alert lookback window instead of removing alerts.

## Order of operations

1. Audit series cardinality (find the exploding labels).
2. Apply Lever 2 (drop unused families + high-cardinality labels).
3. Apply Lever 1 (raise scrape interval on non-alerting series).
4. Apply Lever 3 (recording rules for repeated aggregations).
5. Only then look at retention — and usually leave it alone.
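
Step 1 can start from a single exporter's `/metrics` output. The following is a rough sketch, assuming the classic Prometheus text exposition format without exemplars; `audit_exposition` and the sample payload are illustrative, and a workspace-wide audit would need queries against the workspace itself.

```python
import re
from collections import Counter, defaultdict

METRIC_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:]*")
# One name="value" pair; the value may contain backslash-escaped characters.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def audit_exposition(text: str):
    """Count series per metric name and distinct values per label name."""
    series_per_metric = Counter()
    values_per_label = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        name = METRIC_NAME_RE.match(line)
        if name is None:
            continue
        series_per_metric[name.group(0)] += 1  # one sample line is one series
        rest = line[name.end():]
        if rest.startswith("{"):
            for label, value in LABEL_RE.findall(rest):
                values_per_label[label].add(value)
    distinct = {label: len(values) for label, values in values_per_label.items()}
    return series_per_metric, distinct


SAMPLE = """
# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",user_id="u1"} 3
http_requests_total{method="GET",user_id="u2"} 5
http_requests_total{method="POST",user_id="u3"} 1
up 1
"""

series, labels = audit_exposition(SAMPLE)
print(series.most_common(5))
print(sorted(labels.items(), key=lambda item: item[1], reverse=True))
```

Labels whose distinct-value count tracks traffic or deployments (here `user_id`) are the first candidates for the relabel rules in Lever 2.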
