Datadog with AWS

Deep visibility into AWS infrastructure, Bedrock/SageMaker workloads, and applications — with a single tagging taxonomy across CloudWatch and Datadog.

Last updated: April 29, 2026
Author: FactualMinds Cloud Integration Team
Reviewed by: FactualMinds AWS-certified architects (Solutions Architect – Professional)

Summary

Datadog on AWS in 2026: unified observability for CloudWatch, EKS, Lambda, Bedrock LLM workloads, and security posture across multi-cloud estates.

Datadog + AWS overview

Datadog is an enterprise observability and security platform. On AWS, it ingests CloudWatch metrics through Amazon Data Firehose-backed Metric Streams, collects host and container telemetry via Agent v7 and the Datadog Operator on EKS, and captures serverless telemetry through the Datadog Lambda Extension — all tied together by Unified Service Tagging so the same env/service/version tag flows from metric to trace to log to LLM call.

FactualMinds deploys Datadog on AWS for teams that have either outgrown CloudWatch’s cross-service correlation or need consolidated visibility across AWS, on-prem, and a second cloud. We keep CloudWatch as the AWS-native source of truth for service quotas, AWS Health, and alarm-driven auto-recovery — Datadog becomes the investigative and SLO layer on top.

What’s new for Datadog on AWS in 2026

How Datadog monitors AWS (implementation patterns)

CloudWatch Metric Streams (preferred for AWS-service metrics)

Datadog Agent v7 + Datadog Operator on EKS

Datadog Lambda Extension (serverless)

AWS PrivateLink endpoints

Key Datadog + AWS features

Infrastructure monitoring

LLM Observability (GA in 2024)

Database Monitoring

Application Performance Monitoring (APM)

Log Management + Flex Logs

Cloud SIEM

Cost Management

Datadog pricing for AWS (2026)

Pricing evolves — verify at datadoghq.com/pricing. Current ballparks:

Infrastructure monitoring

APM + Continuous Profiler

Log Management

LLM Observability, Cloud SIEM, Database Monitoring

Typical totals: small teams $400–$1,500/month, mid-market $3k–$15k/month, enterprise on annual contracts with significant discount.

Datadog vs CloudWatch vs open-source

Datadog

CloudWatch + Application Signals + AWS Managed Grafana

Open-source (Prometheus + Grafana + OpenTelemetry + Loki)

When Datadog is NOT the right call

Implementation: multi-account onboarding via CloudFormation StackSet

Datadog publishes a CloudFormation template that creates the IAM role, event subscriptions, and CloudWatch Metric Streams per account. Deploy via StackSet across the AWS Organization:

# Excerpt — Datadog provides the canonical template via the AWS Integration page
aws cloudformation create-stack-set \
  --stack-set-name datadog-aws-integration \
  --template-url https://datadog-cloudformation-template.s3.amazonaws.com/aws/main.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters \
      ParameterKey=DatadogApiKey,ParameterValue="<api-key-from-secrets-manager>" \
      ParameterKey=DatadogSite,ParameterValue=datadoghq.com \
      ParameterKey=ExternalId,ParameterValue="<datadog-supplied-external-id>" \
      ParameterKey=InstallDatadogPolicies,ParameterValue=true \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false

aws cloudformation create-stack-instances \
  --stack-set-name datadog-aws-integration \
  --deployment-targets OrganizationalUnitIds=ou-xxx-yyyy \
  --regions us-east-1

Always source the template URL and parameters from the Datadog Admin → Integrations → AWS page — Datadog publishes updated templates as the trust contract evolves.

Implementation: Datadog Operator with Pod Identity on EKS

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  global:
    clusterName: prod-eks-eu-west-1
    site: datadoghq.com
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
    tags:
      - 'team:platform'
      - 'env:prod'
      - 'cost-center:eng'
  features:
    apm:
      enabled: true
    logCollection:
      enabled: true
      containerCollectAll: true
    orchestratorExplorer:
      enabled: true
    liveContainerCollection:
      enabled: true
    eventCollection:
      collectKubernetesEvents: true
  override:
    nodeAgent:
      serviceAccountName: datadog-agent
      # Pod Identity association created out-of-band via:
      # aws eks create-pod-identity-association \
      #   --cluster-name prod-eks-eu-west-1 \
      #   --namespace datadog \
      #   --service-account datadog-agent \
      #   --role-arn arn:aws:iam::123456789012:role/datadog-agent-pod-identity

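Pod Identity is used here in place of IRSA: the cluster needs no IAM OIDC provider and the ServiceAccount needs no role annotation, because the role is bound through the Pod Identity association shown in the comment above. Cluster Agent and Admission Controller are managed by the Operator.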
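
For reference, the IAM role named in the association needs a trust policy that lets the EKS Pod Identity service principal assume it. The Python sketch below builds that document. The function name is hypothetical, and the permissions attached to the role are out of scope here.

```python
import json


def pod_identity_trust_policy():
    """Build the trust policy for a role assumed through EKS Pod Identity.

    The principal is the EKS Pod Identity service rather than an OIDC
    provider, which is why no per-cluster OIDC setup is needed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


if __name__ == "__main__":
    # Paste the output into the role's trust relationship or feed it to an IaC tool.
    print(json.dumps(pod_identity_trust_policy(), indent=2))
```

The same role can then be reused across clusters, since nothing in the trust policy names a specific cluster.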

Failure modes & resilience

1. CloudWatch Metric Streams Firehose backpressure. A spike in metric volume (sudden Lambda concurrency, EKS node-fleet replacement) can fill the Firehose buffer; Datadog ingest lags by minutes. Mitigation: monitor aws.firehose.delivery_to_http_endpoint.records_delivered_count against incoming records; raise Firehose buffer size and Datadog endpoint concurrency via the StackSet update.

2. Custom-metric cardinality runaway. A single new tag key with high cardinality (request_id, user_id, raw URL path) multiplies the number of distinct timeseries, and the Datadog bill with it. Datadog enforces a per-organization custom-metrics limit. Mitigation: review the tag schema at PR time; run a periodic query on datadog.estimated_usage.metrics.custom.by_metric filtered by metric_name; drop high-cardinality tags with Metrics without Limits or convert the metric to a log-based metric.

3. Exclusion-filter drift. Filters configured to drop noisy logs are easy to forget; cost climbs silently as new services emit similar logs without matching filters. Mitigation: quarterly review of top log-cost contributors; codify exclusion filters in the Datadog Terraform provider so changes go through PR review.

4. Lambda Extension cold-start. Each cold start of a Lambda function with the Datadog Extension layer adds 100–300 ms to init when the API key is fetched and decrypted from Secrets Manager (DD_API_KEY_SECRET_ARN). Mitigation: use DD_API_KEY_SECRET_ARN only in environments that justify the added latency; for latency-critical functions, set the key as the DD_API_KEY environment variable and add a Provisioned Concurrency configuration so the init cost is paid before traffic arrives.

5. Agent host-reporting drift on EKS Auto Mode. Auto Mode replaces nodes, and transient reporting gaps (~30 s) appear during each replacement. Mitigation: query dashboards over windows of at least 1 min, and have alarms fire only when 2 of 3 datapoints breach, so a single replacement does not produce a false positive.

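6. Datadog API rate limits. Roughly 300 requests/hour for most public APIs and 600 for Logs Search; limits vary by endpoint, and the X-RateLimit-* response headers report the one that applies. Bulk dashboard imports or programmatic monitor management can trip this. Mitigation: retry with exponential backoff and jitter; use the Terraform provider with parallelism capped.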

7. Datadog itself is down. Region incidents happen. Mitigation: keep CloudWatch alarms on the truly load-bearing AWS-service metrics (RDS CPU, Lambda errors, ALB 5xx) so on-call gets paged even if Datadog is unavailable. Don’t centralize EVERY alert in Datadog.
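
For failure mode 6 above, "backoff with jitter" can be sketched as follows. This is a minimal illustration in Python, not Datadog client code: `request` and `is_rate_limited` are hypothetical callables you would supply (for example, a wrapper around an HTTP call and a check for status 429), and the base and cap values are assumptions.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=60.0, rng=random.random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(request, is_rate_limited, max_retries=5, sleep=time.sleep):
    """Call request() again after each jittered delay while it is rate limited.

    Returns the first response that is not rate limited, or the last
    response once max_retries retries have been spent.
    """
    response = request()
    for delay in backoff_delays(max_retries):
        if not is_rate_limited(response):
            return response
        sleep(delay)
        response = request()
    return response
```

Randomizing the whole delay, rather than adding a small offset to a fixed schedule, keeps parallel workers from retrying in lockstep after they hit the limit together.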

Observability runbook (alerting on Datadog itself)

Meta-monitors we ship:

| Monitor | Threshold | First action |
|---|---|---|
| datadog.agent.up per host (no-data alert) | no data > 10 min | Confirm node still exists; check Agent status / logs |
| Custom-metric count by service | > 100k distinct timeseries | Cardinality review; drop tags or convert to log-based metric |
| Log ingestion volume by service | > 2× 7-day baseline | Sudden log explosion; identify and exclude or move to Flex Logs |
| Firehose delivery_to_http_endpoint.success ratio | < 99% for 15 min | Datadog endpoint health; AWS Firehose error logs |
| aws.integration.run_status by AWS account | failure | Datadog Admin → Integrations → AWS → check role assumption error |
| LLM Observability prompts failing eval | spike > baseline | Prompt regression; pair with Bedrock Guardrails findings |
| Custom-metric usage > 80% of contracted limit | monthly | Renegotiate or trim before hard cap |
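
To make the custom-metric count threshold in the table above concrete, the Python sketch below estimates how many distinct timeseries one metric can produce from the number of distinct values per tag key. The function names and the example counts are hypothetical, and the result is an upper bound, since not every tag combination occurs in practice.

```python
from math import prod


def estimated_timeseries(tag_value_counts):
    """Upper bound on distinct timeseries for one metric.

    tag_value_counts maps each tag key to its number of distinct values;
    the bound is the product of those counts.
    """
    return prod(tag_value_counts.values())


def needs_cardinality_review(tag_value_counts, limit=100_000):
    """True when the estimated timeseries count exceeds the review threshold."""
    return estimated_timeseries(tag_value_counts) > limit


if __name__ == "__main__":
    bounded = {"env": 3, "service": 40, "version": 10}
    with_user_id = {**bounded, "user_id": 50_000}
    print(estimated_timeseries(bounded), needs_cardinality_review(bounded))
    print(estimated_timeseries(with_user_id), needs_cardinality_review(with_user_id))
```

The second case shows why one unbounded tag key is enough to cross the threshold: the counts multiply rather than add.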

Debug path: “metric missing in Datadog”:

  1. Confirm the metric is being emitted: run the Agent status command (datadog-agent status on a host, agent status inside the Agent container) and check the list of integrations and their last collection.
  2. Datadog Admin → Integrations → AWS → check that the relevant service is enabled (CloudWatch namespaces are opt-in).
  3. Inspect Metric Streams: AWS console → CloudWatch → Metric Streams → status running; recent errors in the destination Firehose.
  4. Tag filter mismatch: Datadog filters at ingest may drop the metric — review include/exclude rules.
  5. Custom metric: confirm the host/container has DogStatsD enabled and the metric name is not collapsing to a quota-limited family.

Best practices

Tagging

Alerts

Cost control

Security review

  • 700+ AWS & SaaS integrations in Datadog
  • 4 telemetry types unified (metrics, logs, traces, LLM)
  • 30-60% typical log-bill reduction after a Flex Logs + exclusion-filter pass

Frequently Asked Questions

How does Datadog integrate with AWS in 2026?
The modern pattern uses AWS Integration via IAM role (no long-lived keys) plus CloudWatch Metric Streams over Amazon Data Firehose for near real-time metric ingest. On EC2 and EKS, install Datadog Agent v7 or the Datadog Operator; for Lambda use the Datadog Lambda Extension layer (no forwarder Lambda). For multi-account estates, onboard accounts through the AWS Integration page using the CloudFormation StackSet Datadog publishes — it creates the IAM role and event subscriptions consistently across Organizations.
What AWS metrics and logs does Datadog collect?
Out of the box: EC2 CPU, network, and disk I/O (memory and filesystem usage require the Agent), EBS IOPS, RDS Performance Insights, S3 object counts and sizes, Lambda duration and cold starts, ELB/ALB latency, DynamoDB throughput, EventBridge rule failures, and GuardDuty/Security Hub findings. Logs can arrive via the CloudWatch Logs subscription filter, S3 archive ingestion, or the Agent. Custom metrics land via DogStatsD, OTLP, or the API, and the same tags flow to metrics, traces, and logs when you use Unified Service Tagging.
Can Datadog replace CloudWatch entirely?
Usually no — and trying to is where most teams overspend. CloudWatch is billed per metric/log whether you look at it or not, but a handful of AWS features only emit to CloudWatch natively (most CloudWatch Alarms, AWS Health events, Lambda CloudWatch metrics used by service quotas). The pragmatic pattern: keep CloudWatch for AWS-service-native alarms and quota dashboards; use Datadog as the single pane of glass for application traces, custom metrics, LLM observability, and cross-account correlation. Datadog ingests CloudWatch via Metric Streams, so you still see everything in one place.
How do I correlate logs, metrics, and traces in Datadog?
Use Unified Service Tagging: `env`, `service`, and `version` tags on every piece of telemetry, propagated by the Agent/Tracer. For AWS resources, Datadog inherits tags from CloudWatch and Resource Groups so existing `cost-center`/`team` tags show up automatically. Enable Data Streams Monitoring for Kafka/Kinesis/SNS/SQS to get end-to-end flow tracking. Trace Explorer and Live Tail support identical query syntax across telemetry types.
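
As a rough illustration of Unified Service Tagging, the Python sketch below derives the three standard environment variables (DD_ENV, DD_SERVICE, DD_VERSION) and the matching Kubernetes labels from a single set of values, so both stay in sync. The helper name and the example values are hypothetical. How you inject the result (Helm values, a CI step, a Lambda config) is up to your pipeline.

```python
def unified_service_tags(env, service, version):
    """Return matching env vars and Kubernetes labels for the three reserved tags."""
    values = {"env": env, "service": service, "version": version}
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError(f"missing unified service tag(s): {', '.join(missing)}")
    return {
        "env_vars": {f"DD_{key.upper()}": value for key, value in values.items()},
        "k8s_labels": {f"tags.datadoghq.com/{key}": value for key, value in values.items()},
    }


if __name__ == "__main__":
    tags = unified_service_tags("prod", "checkout", "1.4.2")
    for name, value in tags["env_vars"].items():
        print(f"{name}={value}")
```

Failing fast on an empty value is deliberate: telemetry that is missing one of the three tags is the usual reason a trace cannot be joined to its logs.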
What does Datadog cost on a typical AWS estate in 2026?
Pricing changes — always confirm at datadoghq.com/pricing. Current ballparks: Infrastructure Pro starts ~$15/host/month, APM adds ~$31/host/month, Log Management is priced per GB ingested + retained (Flex Logs is the cheaper tier for audit/compliance logs queried infrequently), LLM Observability and Cloud SIEM are sold separately. For mid-market AWS estates we typically see $3k–$15k/month, with the biggest optimization levers being exclusion filters, Flex Logs for archive-pattern logs, log-based metrics, and dropping high-cardinality custom metrics.
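
The arithmetic behind such an estimate can be sketched in Python as below. The per-host figures are the ballparks quoted in the answer above, not list prices. The log price per GB is deliberately left as an input because no figure is given here, and the function name is hypothetical.

```python
INFRA_PRO_PER_HOST = 15.0  # ~USD per host per month, ballpark quoted above
APM_PER_HOST = 31.0        # ~USD per APM host per month, ballpark quoted above


def estimate_monthly_cost(infra_hosts, apm_hosts, log_gb=0.0, log_price_per_gb=0.0):
    """Rough monthly estimate in USD, broken down by product line.

    log_price_per_gb must come from your own contract or the pricing page.
    """
    if min(infra_hosts, apm_hosts, log_gb, log_price_per_gb) < 0:
        raise ValueError("inputs must be non-negative")
    infra = infra_hosts * INFRA_PRO_PER_HOST
    apm = apm_hosts * APM_PER_HOST
    logs = log_gb * log_price_per_gb
    return {"infrastructure": infra, "apm": apm, "logs": logs, "total": infra + apm + logs}


if __name__ == "__main__":
    print(estimate_monthly_cost(infra_hosts=40, apm_hosts=25))
```

For 40 hosts with APM on 25 of them, this gives 600 + 775 = 1,375 USD per month before logs, LLM Observability, or Cloud SIEM, which is why log volume tends to dominate the optimization work.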
Datadog LLM Observability on Bedrock vs CloudWatch GenAI observability — which do I use?
Both, for different questions. CloudWatch GenAI observability (CloudWatch Application Signals + Bedrock invocation metrics) is free-tier for the basics — token usage, invocation latency, and error rates across Bedrock models — and is sufficient if you only need operational alerts. Datadog LLM Observability (GA in 2024, matured through 2025) adds prompt/completion capture, hallucination scoring, quality evaluators, and correlation with APM traces, so a slow checkout trace can be tied to a specific Bedrock Claude call. Teams running Bedrock AgentCore or multi-step agents almost always need the prompt/trace view, which CloudWatch does not provide.
How do we audit the Datadog-to-AWS trust relationship for security review?
Three checks: (1) the Datadog IAM role must require the external ID Datadog assigns to your account — prevents confused-deputy; (2) the role should be scoped with the Datadog-published managed policy, no `*`-on-`*`; (3) for regulated workloads, enable Datadog AWS PrivateLink endpoints so metric/log traffic never transits the public internet. Pair with Datadog Cloud SIEM for detection on the AWS CloudTrail feed if you want Datadog to alert on the IAM role itself being modified.
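
Check (1) can be automated against the role's trust policy document. The Python sketch below works on the policy as a plain dictionary (for example, the JSON you export from IAM). The function name, the placeholder account ID, and the example external ID are hypothetical, and the check is intentionally strict: it only accepts an exact StringEquals match on sts:ExternalId.

```python
def requires_external_id(trust_policy, expected_external_id):
    """True only if every Allow statement granting sts:AssumeRole
    pins sts:ExternalId to the expected value via StringEquals."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    found_assume_role = False
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        found_assume_role = True
        pinned = stmt.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        if pinned != expected_external_id:
            return False
    return found_assume_role


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}},
        }],
    }
    print(requires_external_id(policy, "example-external-id"))
```

A stricter audit would also normalize condition-key casing and reject wildcard principals. Those are left out to keep the sketch short.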
