---
title: Observability Beyond CloudWatch (2026): When to Add Application Signals, ADOT, Managed Prometheus, and Grafana — and When Not To
description: The reflex to bolt Amazon Managed Prometheus + Grafana onto every workload is how observability bills quietly double. CloudWatch Application Signals now gives you an auto-discovered service map, SLOs, and traces with near-zero setup; AMP only earns its keep when you are PromQL-native or drowning in high-cardinality metrics — where ingestion (not retention) is the cost driver. Here is the decision matrix, an ADOT dual-export config, and the three levers that actually cut the AMP bill.
url: https://www.factualminds.com/blog/aws-observability-beyond-cloudwatch-otel-prometheus-grafana-2026/
datePublished: 2026-06-09T00:00:00.000Z
dateModified: 2026-06-10T00:00:00.000Z
author: Palaniappan P
category: DevOps & CI/CD
tags: aws, observability, opentelemetry, devops, cost-optimization
---

# Observability Beyond CloudWatch (2026): When to Add Application Signals, ADOT, Managed Prometheus, and Grafana — and When Not To

> The reflex to bolt Amazon Managed Prometheus + Grafana onto every workload is how observability bills quietly double. CloudWatch Application Signals now gives you an auto-discovered service map, SLOs, and traces with near-zero setup; AMP only earns its keep when you are PromQL-native or drowning in high-cardinality metrics — where ingestion (not retention) is the cost driver. Here is the decision matrix, an ADOT dual-export config, and the three levers that actually cut the AMP bill.

**The fastest way to double an AWS observability bill in 2026 is to bolt Amazon Managed Prometheus and Grafana onto a workload that CloudWatch Application Signals would have covered.** As of mid-2026, CloudWatch Application Signals gives you an auto-discovered service map, SLOs, and correlated traces with near-zero instrumentation — features that landed and matured across 2024–2025 (dependency SLOs in April 2025, multi-account views via OAM in February 2025, EKS auto-monitor in the CloudWatch Observability add-on v4.0.0 in May 2025). Yet the reflex is still to stand up a second metrics backend "for real observability." Sometimes that's right. Often it's a second billing surface and a second query language for no measured gain. This post is the decision framework, not a tutorial on any one tool.

This is for platform and SRE teams, and the engineering leaders signing the observability invoice. We ship a [tier decision matrix](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/observability-tier-decision-matrix.md), an [ADOT dual-export collector config](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/adot-collector-config.yaml), the [three AMP cost-control levers](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/amp-cost-control.md), and an [AMG/AMP cost model CSV](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/observability-cost-model.csv).

> **Benchmark pattern (not a cited client)** — A composite Kubernetes-heavy platform: ~40 microservices on EKS, an existing Prometheus + Grafana habit, and a "scrape everything at 15s" default. Modeled in the [cost CSV](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/observability-cost-model.csv): moving non-alerting infra series from 15s to 60s cuts modeled ingestion ~60% (the moved series themselves drop 4x), and adding source-side metric filtering takes the relative ingestion index from 100 → ~22, roughly a 4–5x reduction in the dominant AMP cost driver, with no loss of alerting fidelity. Separately, switching 8 dashboard-only engineers from Grafana Editor ($9) to Viewer ($5) trims AMG user cost by ~44% on that slice. Neither change touches retention, because storage isn't where the money is.
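
The index figures in the benchmark are plain arithmetic, and a minimal sketch makes the mechanics visible. The inputs below are assumptions chosen to land on the quoted numbers (80% of series moved to 60s, 55% of the remaining samples kept after filtering); they are not values read from the cost CSV.

```python
def ingestion_index(share_slowed: float, old_interval_s: int, new_interval_s: int,
                    kept_after_filter: float, baseline: float = 100.0) -> float:
    """Relative ingestion index after slowing some series and filtering at source."""
    # Samples scale inversely with the scrape interval for the slowed share.
    slowed = share_slowed * (old_interval_s / new_interval_s)
    untouched = 1.0 - share_slowed
    after_interval = baseline * (slowed + untouched)
    # Source-side filtering drops whole series, so it scales whatever is left.
    return after_interval * kept_after_filter


# Hypothetical inputs: 80% of series moved 15s -> 60s, then 55% kept after filtering.
after_interval_only = ingestion_index(0.80, 15, 60, kept_after_filter=1.0)
after_both = ingestion_index(0.80, 15, 60, kept_after_filter=0.55)
print(round(after_interval_only), round(after_both))  # 40 22
```

The two levers multiply rather than add, which is why applying both gets the index well under half of what the interval change achieves alone.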

## Tier 1: CloudWatch core is the floor, not a placeholder

If you need logs, metrics, alarms, and Logs Insights over AWS services, CloudWatch core is the answer — don't add a second stack for it. The cost traps here are well-trodden: high-cardinality **custom metrics** and verbose log ingestion. Those are real, but they are CloudWatch _hygiene_ problems, not reasons to migrate to Prometheus. (We cover that hygiene in depth in [observability FinOps and cardinality cost control](/blog/aws-observability-finops-cardinality-cost-control/) and [CloudWatch logging costs](/blog/aws-cloudwatch-logging-costs-observability/).)

## Tier 2: Application Signals is the APM you probably already have

The moment you want **APM** — a service map, SLOs, "which dependency is breaking my latency" — the default should be **CloudWatch Application Signals**, not a new backend. It auto-discovers services and dependencies, draws the application map, tracks period- and request-based SLOs (including SLOs _on dependencies_ since April 2025), and correlates traces so you can drill from a fault-rate summary to the offending span.
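
To make the two SLO styles concrete, here is a minimal sketch of how attainment differs between them. It is illustrative arithmetic only, not the Application Signals API; the `Period` type, the 1% fault threshold, and the sample traffic are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """One evaluation period with request counts (hypothetical shape)."""
    total: int
    faults: int


def request_based_attainment(periods: list[Period]) -> float:
    """Good requests divided by all requests across the window."""
    total = sum(p.total for p in periods)
    good = sum(p.total - p.faults for p in periods)
    return good / total if total else 1.0


def period_based_attainment(periods: list[Period], max_fault_rate: float) -> float:
    """Share of periods whose fault rate stayed at or under the threshold."""
    if not periods:
        return 1.0
    good = sum(1 for p in periods if p.total == 0 or p.faults / p.total <= max_fault_rate)
    return good / len(periods)


# Three quiet periods and one busy period that carries all of the faults.
window = [Period(100, 0), Period(100, 0), Period(100, 0), Period(9700, 200)]
print(f"{request_based_attainment(window):.3f}")       # 0.980
print(f"{period_based_attainment(window, 0.01):.3f}")  # 0.750
```

The same traffic scores differently because a request-based SLO weights by volume while a period-based SLO counts every period equally. That difference is worth deciding on before you set a target.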

It auto-instruments across EKS, EC2, ECS, Kubernetes, Lambda, and on-prem, and ingests OpenTelemetry via ADOT and the CloudWatch agent. **Setup gotcha:** you must **enable Transaction Search** to unlock the full APM feature set under the unified Application Signals pricing that bundles X-Ray traces and transaction spans. On EKS, the CloudWatch Observability add-on (v4.0.0+) can auto-monitor workloads behind a single config flag.

**Opinionated take:** most teams reaching for "we need APM, let's deploy Grafana Tempo + a tracing backend" should enable Application Signals first and measure whether the gap is real. It usually isn't.

## Tier 3: ADOT + Managed Prometheus + Grafana — earn it

Step up to **ADOT + Amazon Managed Service for Prometheus (AMP) + Amazon Managed Grafana (AMG)** when you can name the reason:

- You're **PromQL/Prometheus-native** with existing exporters and dashboards.
- You have **high-cardinality** metrics where Prometheus's model beats CloudWatch custom metrics on both ergonomics and cost.
- You want **OpenTelemetry-native, vendor-portable** instrumentation.
- You need a **single Grafana pane** correlating AWS and third-party data.
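
The criteria above can be encoded as a small per-workload check. This is a hedged sketch, not the linked decision matrix: the `Workload` fields and the rule that any single named reason justifies Tier 3 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-workload answers to the tier questions."""
    name: str
    needs_apm: bool = False
    promql_native: bool = False
    high_cardinality_metrics: bool = False
    needs_vendor_portable_otel: bool = False
    needs_third_party_correlation: bool = False


def recommend_tier(w: Workload) -> tuple[int, list[str]]:
    """Return the lowest tier that covers the workload, with the reasons found."""
    tier3_reasons = [
        label for flag, label in [
            (w.promql_native, "PromQL/Prometheus-native"),
            (w.high_cardinality_metrics, "high-cardinality metrics"),
            (w.needs_vendor_portable_otel, "vendor-portable OpenTelemetry"),
            (w.needs_third_party_correlation, "single Grafana pane with third-party data"),
        ] if flag
    ]
    if tier3_reasons:
        return 3, tier3_reasons
    if w.needs_apm:
        return 2, ["APM: service map, SLOs, traces"]
    return 1, ["logs, metrics, alarms over AWS services"]


print(recommend_tier(Workload("checkout-api", needs_apm=True)))
print(recommend_tier(Workload("k8s-platform", promql_native=True, high_cardinality_metrics=True)))
```

Returning the reasons alongside the tier keeps the decision auditable: a Tier 3 result with an empty reason list cannot happen, which mirrors the "name the reason" rule.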

AMP is serverless and Prometheus-compatible (PromQL, Multi-AZ, EKS + self-managed K8s), with default 150-day retention configurable up to 3 years. AMG is fully managed Grafana over CloudWatch, X-Ray, Prometheus, and third-party sources.

The pragmatic shape is **instrument once with OpenTelemetry and dual-export** — traces to X-Ray (feeding Application Signals' service map and SLOs) and metrics to AMP (for PromQL + high cardinality). That's exactly what the [ADOT collector config](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/adot-collector-config.yaml) does. The cost: one more component — the collector — to run and keep upgraded.
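
As a rough picture of what a dual-export collector config contains, the sketch below builds one as a Python dict and checks that every pipeline references a defined component. It is not the linked config file. The component names (`otlp`, `batch`, `awsxray`, `prometheusremotewrite`, `sigv4auth`) are standard collector components, while the region, workspace ID, and listen endpoints are placeholders.

```python
import json


def build_dual_export_config(region: str, amp_workspace_id: str) -> dict:
    """Collector config: OTLP in, traces to X-Ray, metrics to AMP via remote write."""
    remote_write = (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{amp_workspace_id}/api/v1/remote_write"
    )
    return {
        "extensions": {"sigv4auth": {"region": region, "service": "aps"}},
        "receivers": {"otlp": {"protocols": {
            "grpc": {"endpoint": "0.0.0.0:4317"},
            "http": {"endpoint": "0.0.0.0:4318"},
        }}},
        "processors": {"batch": {}},
        "exporters": {
            "awsxray": {"region": region},
            "prometheusremotewrite": {
                "endpoint": remote_write,
                "auth": {"authenticator": "sigv4auth"},
            },
        },
        "service": {
            "extensions": ["sigv4auth"],
            "pipelines": {
                "traces": {"receivers": ["otlp"], "processors": ["batch"],
                           "exporters": ["awsxray"]},
                "metrics": {"receivers": ["otlp"], "processors": ["batch"],
                            "exporters": ["prometheusremotewrite"]},
            },
        },
    }


def undefined_components(config: dict) -> list[str]:
    """List pipeline references that have no matching top-level definition."""
    missing = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    missing.append(f"{pipeline_name}.{kind}.{component}")
    return missing


# Placeholder region and workspace ID, not real resources.
config = build_dual_export_config("us-east-1", "ws-EXAMPLE")
print(undefined_components(config))  # []
print(json.dumps(config, indent=2))
```

The output is JSON, which YAML parsers generally accept, so it can be compared against a hand-written YAML config. The point of the shape is that one receiver feeds two pipelines, so application code is instrumented once regardless of how many backends sit behind the collector.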

> **What broke** — A team adopted AMP + AMG on day one for a new EKS platform "to do observability properly," scraping every exporter at 15s and granting all 12 engineers Grafana Editor. The first month's bill was dominated by AMP **ingestion** (the scrape-everything default) and inflated by paying $9/Editor for engineers who only ever viewed dashboards. Nothing was _wrong_ — it just cost multiples of what it needed to. The fix was unglamorous: raise scrape intervals on non-alerting series, drop unused metric families at the source, and reassign 8 users to the $5 Viewer tier. The mistake wasn't the tools; it was adopting them before measuring whether CloudWatch + Application Signals already answered the questions, then running them with no cost discipline.

## The cost lever that surprises people: ingestion, not retention

AWS is explicit that **metric ingestion is the largest AMP cost driver**, and that **cutting retention rarely helps**. The [three levers](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/amp-cost-control.md), in order:

1. **Raise the scrape interval** on series that don't need 15s resolution (60s is ~4x fewer samples for those series).
2. **Filter metric families and high-cardinality labels at the source** — one runaway label (user ID, request ID, full URL) can multiply a series into millions.
3. **Pre-aggregate with recording rules** — compute the p99/error-rate once instead of scanning raw series on every dashboard load (also cuts query-sample cost).
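
The first two levers are easy to quantify before touching any config. A minimal sketch, assuming a 30-day month and a hypothetical request-latency metric whose label sets are invented for illustration:

```python
from math import prod

SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, an assumption for round numbers


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case active series for one metric: product of distinct label values."""
    return prod(label_cardinalities.values())


def samples_per_month(series: int, scrape_interval_s: int) -> int:
    """Samples ingested per month at a fixed scrape interval."""
    return series * (SECONDS_PER_MONTH // scrape_interval_s)


# Hypothetical label sets for a single metric.
bounded = {"service": 40, "status_class": 5, "method": 4}
runaway = {**bounded, "user_id": 10_000}

print(series_count(bounded))       # 800
print(series_count(runaway))       # 8000000
print(samples_per_month(800, 15))  # 138240000
print(samples_per_month(800, 60))  # 34560000
```

One unbounded label turns 800 series into 8 million, a far larger swing than the 4x gained from the interval change. Both levers matter, but a cardinality audit is usually the quickest way to find the runaway metric.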

Leave retention alone unless compliance demands a change. For AMG, default users to **Viewer ($5)** and reserve **Editor ($9)** for dashboard authors.
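
The seat arithmetic is simple enough to check by hand. The sketch below uses the per-user figures quoted in this post and the 12-engineer team from the "What broke" example; treat the prices as the post's model and confirm them on the AWS pricing page.

```python
EDITOR_USD = 9  # per active Editor per workspace per month (figure quoted in this post)
VIEWER_USD = 5  # per active Viewer per workspace per month (figure quoted in this post)


def amg_monthly_cost(editors: int, viewers: int) -> int:
    """Monthly AMG user cost for one workspace, ignoring Enterprise plugins."""
    return editors * EDITOR_USD + viewers * VIEWER_USD


before = amg_monthly_cost(editors=12, viewers=0)
after = amg_monthly_cost(editors=4, viewers=8)
print(before, after, before - after)  # 108 76 32
```

The absolute saving is small next to ingestion, which is why it sits last in the list of fixes. It is still worth making because it costs nothing to apply.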

## What to do this week

1. Inventory which workloads have **Application Signals** on. Enable it (with **Transaction Search**) on your top revenue services before you consider any new backend.
2. For each AMP/AMG workload, ask: _what specific question does CloudWatch + Application Signals fail to answer?_ If you can't name it, you've found a candidate to retire.
3. Run the [tier decision matrix](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/observability-tier-decision-matrix.md) per workload — don't apply one stack uniformly.
4. If you run AMP: apply the [cost-control levers](https://bitbucket.org/baymail/factualminds-astro/src/main/examples/architecture-blog-2026/observability-stack/amp-cost-control.md) — audit cardinality, raise scrape intervals, filter at source.
5. Audit Grafana licenses: reassign view-only engineers from Editor to Viewer.

## What this post doesn't cover

- **CloudWatch alarm and Logs Insights fundamentals** — see [CloudWatch metrics, logs, and alarms best practices](/blog/aws-cloudwatch-observability-metrics-logs-alarms-best-practices/).
- **Distributed-systems debugging workflow** (how to actually use traces in an incident) — see [debugging production distributed AWS systems](/blog/debug-production-distributed-aws-systems/).
- **A hands-on OpenTelemetry + chaos tutorial** — see [the OTel demo game post](/blog/otel-demo-game-aws-observability-chaos-engineering/).
- **Loki/log-analytics backends and Grafana OnCall** — out of scope here.
- **Exact current pricing** — confirm AMP per-sample rates and AMG per-user rates on the respective AWS pricing pages; figures here are the mid-2026 model.

---

**Related:** [CloudWatch observability best practices](/blog/aws-cloudwatch-observability-metrics-logs-alarms-best-practices/) · [Observability FinOps & cardinality cost control](/blog/aws-observability-finops-cardinality-cost-control/) · [CloudWatch logging costs](/blog/aws-cloudwatch-logging-costs-observability/) · [Debug production distributed systems](/blog/debug-production-distributed-aws-systems/) · [AWS managed services](/services/aws-managed-services/)

**If you only do one thing:** Before standing up any new metrics backend, enable CloudWatch Application Signals with Transaction Search on your top services and ask what question it _fails_ to answer. If you can't name the gap, you don't need the second stack — and you've just avoided doubling the bill.

## FAQ

### Do we need Amazon Managed Prometheus if we already use CloudWatch?
Usually not, unless you are already Prometheus/PromQL-native or have high-cardinality metrics that get expensive as CloudWatch custom metrics. For most teams, CloudWatch core (metrics, logs, alarms, Logs Insights) plus CloudWatch Application Signals for APM covers application and AWS-service observability without a second backend to operate. Reach for Amazon Managed Service for Prometheus (AMP) when you have existing Prometheus exporters and Grafana dashboards, a heavy Kubernetes estate, or label-rich metrics where Prometheus cardinality handling and PromQL are genuinely better tools. Adopting AMP "to be complete" adds a second billing surface and a second query language for no measured benefit — adopt it because you hit a specific gap, not because the diagram looked unfinished.

### What is CloudWatch Application Signals and what setup does it need?
Application Signals is the APM layer inside CloudWatch: it auto-discovers your services and dependencies, draws an application map, tracks Service Level Objectives (SLOs), and correlates traces, so you can go from a high-level fault/latency summary down to the offending span. It auto-instruments applications running on Amazon EKS, EC2, ECS, Kubernetes, Lambda, or on-premises, and it can ingest OpenTelemetry telemetry via AWS Distro for OpenTelemetry (ADOT) and the CloudWatch agent. One important setup note: you must enable Transaction Search to unlock the full APM feature set under the unified Application Signals pricing that includes X-Ray traces and transaction spans. For EKS specifically, recent versions of the CloudWatch Observability add-on can auto-monitor workloads with a single configuration flag.

### When should we NOT use Amazon Managed Grafana?
Skip Amazon Managed Grafana (AMG) when CloudWatch dashboards already answer your questions and your data lives entirely in AWS — you do not need a separate visualization layer and its per-active-user billing. AMG earns its place when you need to correlate AWS telemetry with third-party sources in one pane, when your team already lives in Grafana, or when you want PromQL/Loki-style dashboards over AMP. Watch the pricing model: AMG bills per active user per workspace per month (Editor/Admin at $9, Viewer at $5 in the mid-2026 model), with optional Enterprise plugins adding $45 per active user. The common over-spend is making every engineer an Editor when most only view dashboards — assign Viewer by default and Editor only to people who build dashboards.

### What actually drives Amazon Managed Service for Prometheus cost, and how do we cut it?
AWS documents that metric ingestion is the largest cost driver for most AMP customers — not storage. The three levers that move the bill, in order: (1) increase the scrape interval for series that do not need second-level resolution (moving infra metrics from 15s to 60s is roughly 4x fewer samples for those series); (2) filter unused metric families and high-cardinality labels at the source before they are ingested; and (3) pre-aggregate repeated queries with recording rules, which also cuts query-sample cost. Reducing the retention period is explicitly called out by AWS as unlikely to help much, because storage is a minor component — default retention is 150 days, configurable up to 3 years, so set it to your compliance need rather than treating it as a cost lever.

### Should we instrument with OpenTelemetry or use CloudWatch agent auto-instrumentation?
Use OpenTelemetry (via ADOT) when you want vendor-portable instrumentation — the same SDK and collector can export to CloudWatch/X-Ray today and to a different backend later without re-instrumenting your code, which matters if you run hybrid or multi-cloud or want to avoid lock-in. Use CloudWatch agent auto-instrumentation / Application Signals auto-monitor when you want the fastest path to a service map and SLOs with the least configuration and you are committed to staying on CloudWatch. A common, pragmatic pattern is to instrument once with OpenTelemetry and dual-export: traces to X-Ray (feeding Application Signals) and metrics to Amazon Managed Service for Prometheus — see the ADOT collector config linked in this post. The trade-off is one more component (the collector) to run and keep upgraded.

### Is a three-tool stack (Application Signals + AMP + AMG) a reasonable target?
It can be, but only when you have measured the need for each piece. The valid version is: Application Signals for application-level SLOs and traces, AMP for high-cardinality infrastructure metrics from a Prometheus-heavy Kubernetes estate, and AMG as the unified dashboard spanning AWS and third-party data. The cost of that completeness is two billing surfaces and two query languages (CloudWatch metrics + PromQL), plus the operational overhead of a collector fleet. The anti-pattern is adopting all three on day one because the reference architecture showed them. Start with CloudWatch + Application Signals, prove a specific gap (PromQL-native team, cardinality cost, third-party correlation), and add the next tier only to close that gap.

---

*Source: https://www.factualminds.com/blog/aws-observability-beyond-cloudwatch-otel-prometheus-grafana-2026/*
