AWS Route 53: DNS and Traffic Management Patterns

Cloud Architecture 9 min read

Quick summary: A practical guide to AWS Route 53 — hosted zones, routing policies, health checks, DNS failover, domain registration, and the traffic management patterns that make applications highly available.

Route 53 is AWS’s DNS service — it translates domain names into IP addresses that computers use to connect. But Route 53 is more than a DNS host. It provides health checking, traffic routing, failover, and geolocation-based routing that make it a critical component of high-availability architectures.

DNS is also one of the services where mistakes are most visible and most disruptive. A misconfigured DNS record can take down your entire application: not one server, not one Region, everything. This guide covers the Route 53 patterns that keep production applications available and performant.

Core Concepts

Hosted Zones

A hosted zone is a container for DNS records for a domain:

Public hosted zone — Resolves domain names on the internet:

example.com (public hosted zone)
  ├── A     example.com          → ALB (dualstack.my-alb.us-east-1.elb.amazonaws.com)
  ├── CNAME www.example.com      → example.com
  ├── MX    example.com          → 10 mail.example.com
  └── TXT   example.com          → "v=spf1 include:amazonses.com ~all"

Private hosted zone — Resolves domain names within a VPC only:

internal.example.com (private hosted zone, attached to VPC)
  ├── A     api.internal.example.com      → 10.0.11.50
  ├── A     db.internal.example.com       → 10.0.21.30
  └── CNAME cache.internal.example.com    → my-redis.abc123.use1.cache.amazonaws.com

Private hosted zones enable human-readable names for internal services without exposing them to the internet. Attach the hosted zone to each VPC that needs to resolve these names.

Record Types

| Type | Purpose | Example |
| --- | --- | --- |
| A | Maps domain to IPv4 address | example.com → 192.0.2.1 |
| AAAA | Maps domain to IPv6 address | example.com → 2001:db8::1 |
| CNAME | Maps domain to another domain | www.example.com → example.com |
| Alias | Maps domain to AWS resource (Route 53 specific) | example.com → ALB DNS name |
| MX | Mail server routing | example.com → 10 mail.example.com |
| TXT | Text records (SPF, DKIM, verification) | "v=spf1 include:amazonses.com ~all" |
| NS | Name server delegation | example.com → ns-123.awsdns-45.com |
| SRV | Service location | _sip._tcp.example.com → 10 60 5060 sip.example.com |

Alias Records

Alias records are Route 53’s most important feature — they map your domain directly to AWS resources without CNAME limitations:

CNAME limitation: Cannot be used at the zone apex (example.com)
Alias advantage: Works at the zone apex AND is free (no query charges)

Use Alias records for:

  • ALB/NLB/CLB endpoints
  • CloudFront distributions
  • S3 website endpoints
  • API Gateway endpoints
  • Another Route 53 record in the same hosted zone

Alias queries to AWS resources are free. Standard queries, including CNAME lookups, are charged. Always use Alias when pointing to an AWS resource.
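As a minimal sketch, an Alias record at the zone apex can be written as the change batch that boto3's `change_resource_record_sets` accepts. The helper name `alias_a_record` and both zone IDs are placeholders invented for illustration, not values from this article.

```python
def alias_a_record(name, target_dns_name, target_zone_id, evaluate_target_health=True):
    """Build an UPSERT change for an Alias A record.

    Alias records carry no TTL and no ResourceRecords; Route 53 answers
    with the target's current IP addresses.
    """
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": target_zone_id,  # zone ID of the target (the ALB), not your own zone
                "DNSName": target_dns_name,
                "EvaluateTargetHealth": evaluate_target_health,
            },
        },
    }


# Placeholder values: substitute your ALB's DNS name and its hosted zone ID.
change_batch = {
    "Comment": "Point the zone apex at the ALB",
    "Changes": [
        alias_a_record(
            "example.com.",
            "dualstack.my-alb.us-east-1.elb.amazonaws.com.",
            "Z_ELB_ZONE_ID_PLACEHOLDER",
        )
    ],
}

# Applying it needs boto3 and AWS credentials, for example:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z_YOUR_ZONE_ID_PLACEHOLDER", ChangeBatch=change_batch)
```

The dictionary is built separately from the API call so it can be reviewed or logged before anything is changed.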

Routing Policies

Routing policies determine how Route 53 responds to DNS queries. Each policy serves a different traffic management pattern.

Simple Routing

One record, one or more values. Route 53 returns all values in random order:

example.com → [192.0.2.1, 192.0.2.2, 192.0.2.3]
Client receives all IPs, connects to one (typically the first)

Best for: Single resources or simple round-robin distribution. No health checks — if one IP is unhealthy, clients may still receive it.

Weighted Routing

Distribute traffic by percentage across multiple resources:

example.com:
  Record 1: ALB-primary   weight=90  (90% of traffic)
  Record 2: ALB-canary    weight=10  (10% of traffic)

Use cases:

  • Canary deployments — Send 5-10% of traffic to the new version, monitor, then shift 100%
  • A/B testing — Split traffic between application variants
  • Blue-green migrations — Gradually shift traffic from old to new infrastructure
  • Regional load distribution — Send traffic to Regions proportionally to capacity
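The percentages above follow from simple arithmetic: each record's share is its weight divided by the sum of all weights in the group. A small sketch (the function name `traffic_share` is made up for this example):

```python
def traffic_share(weights):
    """Return the expected fraction of DNS responses for each weighted record."""
    for name, weight in weights.items():
        if not (isinstance(weight, int) and 0 <= weight <= 255):
            raise ValueError(f"{name}: weight must be an integer from 0 to 255")
    total = sum(weights.values())
    if total == 0:
        # All-zero weights: Route 53 answers with every record at equal probability.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


shares = traffic_share({"ALB-primary": 90, "ALB-canary": 10})
print(shares)
```

Weights do not have to add up to 100. Weights of 3 and 1 give the same 75/25 split as 75 and 25.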

Latency-Based Routing

Route traffic to the Region with the lowest latency for the user:

example.com:
  Record 1: ALB-us-east-1  (Region: us-east-1)
  Record 2: ALB-eu-west-1  (Region: eu-west-1)
  Record 3: ALB-ap-southeast-1 (Region: ap-southeast-1)

A user in London gets routed to eu-west-1. A user in Tokyo gets routed to ap-southeast-1. Route 53 measures latency between the user’s DNS resolver location and each AWS Region.

Best for: Multi-Region applications where user experience depends on latency. Combine with health checks for automatic failover when a Region is unhealthy.
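The selection logic can be modelled in a few lines. This is a simplified simulation, not Route 53's implementation: the latency figures are invented, and `pick_region` is a hypothetical helper. It does include the documented fail-open behavior, where Route 53 treats all records as healthy if every one of them fails its check.

```python
def pick_region(latency_ms, healthy):
    """Pick the lowest-latency Region among healthy ones.

    If no Region is healthy, fall back to considering all of them,
    mirroring Route 53's fail-open behavior.
    """
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r, False)}
    if not candidates:
        candidates = dict(latency_ms)
    return min(candidates, key=candidates.get)


# Invented latencies as seen from a resolver near London.
london = {"us-east-1": 80, "eu-west-1": 12, "ap-southeast-1": 170}
all_up = {"us-east-1": True, "eu-west-1": True, "ap-southeast-1": True}
eu_down = dict(all_up, **{"eu-west-1": False})

print(pick_region(london, all_up))
print(pick_region(london, eu_down))
```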

Geolocation Routing

Route traffic based on the geographic location of the user:

example.com:
  Record 1: ALB-eu-west-1     (Location: Europe)
  Record 2: ALB-us-east-1     (Location: North America)
  Record 3: ALB-ap-northeast-1 (Location: Asia)
  Record 4: ALB-us-east-1     (Location: Default — catches all others)

Use cases:

  • Compliance — Keep EU user data in EU Regions (GDPR)
  • Content localization — Serve language-specific content by geography
  • License restrictions — Restrict service availability by country
  • Regulatory requirements — Different processing rules per jurisdiction

Always include a default record. Without it, users outside your defined geographies receive no response.
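A simplified model of the lookup shows why the default record matters. Route 53 prefers the most specific match; this sketch covers country, then continent, then default, and leaves out US state subdivisions. The `resolve_geo` helper and the dictionary layout are assumptions made for illustration.

```python
def resolve_geo(records, country, continent):
    """Return the most specific matching target: country, then continent, then default."""
    for key in (("country", country), ("continent", continent), ("default", "*")):
        if key in records:
            return records[key]
    return None  # no match and no default: the client gets no usable answer


records = {
    ("continent", "EU"): "ALB-eu-west-1",
    ("continent", "NA"): "ALB-us-east-1",
    ("continent", "AS"): "ALB-ap-northeast-1",
    ("default", "*"): "ALB-us-east-1",
}

print(resolve_geo(records, "FR", "EU"))  # matched by continent
print(resolve_geo(records, "BR", "SA"))  # falls through to the default

# Without a default record, South American users get nothing back.
no_default = {k: v for k, v in records.items() if k[0] != "default"}
print(resolve_geo(no_default, "BR", "SA"))
```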

Failover Routing

Active-passive failover between primary and secondary resources:

example.com:
  Primary:   ALB-us-east-1  (health check: /health → 200 OK)
  Secondary: ALB-us-west-2  (used only when primary fails health check)

Route 53 monitors the primary with health checks. When the primary fails, Route 53 automatically routes all traffic to the secondary. When the primary recovers, traffic routes back.

Best for: Disaster recovery with active-passive architecture. The secondary can be a full standby, a static S3 error page, or a different Region deployment.
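A sketch of the two records behind this pattern, in the shape boto3's `change_resource_record_sets` expects. The helper `failover_record`, the zone IDs, the ALB DNS names, and the health check ID are all placeholders, not values from this article.

```python
def failover_record(name, role, alias_dns_name, alias_zone_id, health_check_id=None):
    """Build an UPSERT change for one half of a failover pair."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    rrset = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-{alias_dns_name}",  # must be unique within the pair
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alias_zone_id,
            "DNSName": alias_dns_name,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id is not None:
        rrset["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": rrset}


changes = [
    failover_record("example.com.", "PRIMARY", "primary-alb.example.invalid.",
                    "Z_PRIMARY_PLACEHOLDER", health_check_id="hc-primary-placeholder"),
    failover_record("example.com.", "SECONDARY", "secondary-alb.example.invalid.",
                    "Z_SECONDARY_PLACEHOLDER"),
]
```

Only the primary carries an explicit health check here; the secondary is answered whenever the primary is considered unhealthy.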

Multivalue Answer Routing

Return multiple healthy values with health checks:

example.com:
  Record 1: 192.0.2.1  (health check: healthy)    ← returned
  Record 2: 192.0.2.2  (health check: healthy)    ← returned
  Record 3: 192.0.2.3  (health check: unhealthy)  ← NOT returned
  Record 4: 192.0.2.4  (health check: healthy)    ← returned

Similar to simple routing but with health checks — unhealthy resources are removed from responses. Returns up to 8 healthy records.

Best for: Simple health-checked round-robin without the complexity of an ALB.
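The filtering described above can be simulated as follows. This is an illustrative model only: `multivalue_answer` is a made-up name, and the random selection stands in for whatever Route 53 does internally.

```python
import random


def multivalue_answer(records, max_answers=8, rng=random):
    """Return up to max_answers healthy IPs in random order.

    records maps IP address -> True (healthy) or False (unhealthy).
    If nothing is healthy, all records are treated as candidates.
    """
    healthy = [ip for ip, ok in records.items() if ok]
    pool = healthy or list(records)
    return rng.sample(pool, min(max_answers, len(pool)))


records = {
    "192.0.2.1": True,
    "192.0.2.2": True,
    "192.0.2.3": False,  # unhealthy, never returned while others are healthy
    "192.0.2.4": True,
}
print(sorted(multivalue_answer(records)))
```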

Health Checks

Health checks are the mechanism that makes routing policies intelligent — without them, Route 53 routes to dead endpoints.

Health Check Types

| Type | Monitors | Best For |
| --- | --- | --- |
| Endpoint | HTTP/HTTPS/TCP response | ALBs, APIs, web applications |
| Calculated | Combination of other health checks | Complex health logic (3 of 5 healthy) |
| CloudWatch alarm | CloudWatch metric alarm state | Custom health based on any metric |

Endpoint Health Checks

Health check configuration:
  Protocol: HTTPS
  Endpoint: example.com
  Port: 443
  Path: /health
  Interval: 30 seconds (or 10 seconds for fast detection)
  Failure threshold: 3 (mark unhealthy after 3 consecutive failures)
  Regions: Route 53 checks from multiple Regions worldwide

Health check endpoint design:

  • Return 200 OK only when the application is fully operational
  • Check database connectivity, cache availability, and critical dependencies
  • Accept the connection within 4 seconds and return the status code within 2 seconds of connecting (Route 53’s limits for HTTP/HTTPS health checks)
  • Do not cache health check responses
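The configuration above maps onto the `HealthCheckConfig` structure that boto3's `create_health_check` takes. The sketch below builds that structure and gives a rough estimate of detection time as interval times failure threshold. The helper names are invented, and the estimate ignores checker scheduling and DNS TTL, so treat it as an approximation.

```python
def health_check_config(fqdn, path="/health", port=443, fast=False, failure_threshold=3):
    """Build a HealthCheckConfig dict for an HTTPS endpoint check."""
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": fqdn,
        "Port": port,
        "ResourcePath": path,
        "RequestInterval": 10 if fast else 30,  # Route 53 allows only 10 or 30 seconds
        "FailureThreshold": failure_threshold,
    }


def approx_detection_seconds(config):
    """Rough time for consecutive failures to mark the endpoint unhealthy."""
    return config["RequestInterval"] * config["FailureThreshold"]


standard = health_check_config("example.com")
fast = health_check_config("example.com", fast=True)
print(approx_detection_seconds(standard))  # 90
print(approx_detection_seconds(fast))      # 30

# Creating the check needs boto3 and AWS credentials, for example:
#   boto3.client("route53").create_health_check(
#       CallerReference="unique-string-per-request", HealthCheckConfig=standard)
```

Clients can still hold the old answer for up to the record TTL after the health check flips, so total failover time is detection time plus TTL.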

Calculated Health Checks

Combine multiple health checks with AND/OR logic:

Application health = API health check AND Database health check AND Cache health check
  → All three must be healthy for the application to be considered healthy
  → If any one fails, failover routing activates

OR variant:
Application health = API-AZ-a OR API-AZ-b OR API-AZ-c
  → Application is healthy if at least 1 of 3 AZs is healthy
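In the Route 53 API, both variants come down to one number: a calculated health check is healthy when at least `HealthThreshold` of its child checks are healthy. AND sets the threshold to the number of children; OR sets it to 1. A short sketch with hypothetical helper names and placeholder child IDs:

```python
def calculated_config(child_ids, mode="AND"):
    """Build a HealthCheckConfig for a calculated health check."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be AND or OR")
    threshold = len(child_ids) if mode == "AND" else 1
    return {
        "Type": "CALCULATED",
        "ChildHealthChecks": list(child_ids),
        "HealthThreshold": threshold,
    }


def is_healthy(config, child_states):
    """Evaluate the threshold rule against a list of child results (True = healthy)."""
    return sum(bool(s) for s in child_states) >= config["HealthThreshold"]


children = ["hc-api-placeholder", "hc-db-placeholder", "hc-cache-placeholder"]
and_check = calculated_config(children, "AND")
or_check = calculated_config(children, "OR")

print(is_healthy(and_check, [True, True, False]))   # False: one dependency is down
print(is_healthy(or_check, [False, False, True]))   # True: one AZ is still up
```

Any threshold in between expresses "N of M" rules such as 3 of 5 healthy.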

Health Check + Alarm Integration

Use CloudWatch alarms for health metrics that are not HTTP-based:

CloudWatch alarm: SQS queue depth > 10,000 (consumer falling behind)
  → Alarm state: ALARM
    → Health check: Unhealthy
      → Route 53: Failover to secondary Region

This enables failover based on any CloudWatch metric — not just HTTP response codes.
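A sketch of the corresponding health check configuration, again in the `HealthCheckConfig` shape used by boto3's `create_health_check`. The alarm name is a placeholder, and `health_from_alarm_state` is a simplified model of the chain above rather than Route 53's exact evaluation.

```python
VALID_INSUFFICIENT = ("Healthy", "Unhealthy", "LastKnownStatus")


def cloudwatch_health_check_config(alarm_name, alarm_region,
                                   insufficient_data="LastKnownStatus"):
    """Build a HealthCheckConfig that follows a CloudWatch alarm."""
    if insufficient_data not in VALID_INSUFFICIENT:
        raise ValueError(f"insufficient_data must be one of {VALID_INSUFFICIENT}")
    return {
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": alarm_region, "Name": alarm_name},
        "InsufficientDataHealthStatus": insufficient_data,
    }


def health_from_alarm_state(state, insufficient_data="LastKnownStatus", last_known=True):
    """Simplified mapping from alarm state to health check status (True = healthy)."""
    if state == "OK":
        return True
    if state == "ALARM":
        return False
    # INSUFFICIENT_DATA: follow the configured policy.
    return {"Healthy": True, "Unhealthy": False, "LastKnownStatus": last_known}[insufficient_data]


config = cloudwatch_health_check_config("sqs-queue-depth-high-placeholder", "us-east-1")
print(health_from_alarm_state("ALARM"))  # False, so failover routing would activate
```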

Architecture Patterns

Multi-Region Active-Active

Serve traffic from multiple Regions simultaneously:

example.com (latency-based routing)
  ├── us-east-1: ALB → ECS Service (health check: /health)
  ├── eu-west-1: ALB → ECS Service (health check: /health)
  └── ap-southeast-1: ALB → ECS Service (health check: /health)

Users are routed to the lowest-latency healthy Region, which is usually the nearest one. If a Region fails its health check, traffic automatically redistributes to the remaining Regions.

Requirements:

  • Stateless application tier (no local session state)
  • Global database (DynamoDB Global Tables or Aurora Global Database)
  • Cross-Region data replication

Multi-Region Active-Passive (DR)

Primary Region handles all traffic; secondary activates during failure:

example.com (failover routing)
  ├── Primary: us-east-1 ALB (health check: /health)
  └── Secondary: us-west-2 ALB (activated on primary failure)

Cost advantage: The secondary Region can run at reduced capacity (pilot light or warm standby) until failover, significantly reducing DR costs compared to active-active.

Blue-Green Deployment

Use weighted routing for zero-downtime deployments:

Step 1: example.com → Blue (weight=100), Green (weight=0)
Step 2: Deploy new version to Green
Step 3: example.com → Blue (weight=90), Green (weight=10)   ← canary
Step 4: Monitor Green for errors
Step 5: example.com → Blue (weight=0), Green (weight=100)   ← cutover

If Green shows errors at Step 4, revert: set Green weight back to 0. DNS propagation delay (TTL) means some traffic continues to the old weights for the TTL duration.

TTL for blue-green: Set record TTL to 60 seconds during deployments. Lower TTLs mean faster cutover but more DNS queries (and cost).
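The weight schedule from the steps above can be generated and reverted with a couple of small helpers. The names `shift_plan` and `rollback` are made up for this sketch, and the step values are just the ones used in this section.

```python
def shift_plan(green_steps=(0, 10, 100)):
    """Return the Blue/Green weights for each stage of the cutover."""
    plan = []
    for green in green_steps:
        if not 0 <= green <= 100:
            raise ValueError("green weight must be between 0 and 100")
        plan.append({"Blue": 100 - green, "Green": green})
    return plan


def rollback(step):
    """Send all traffic back to Blue, whatever the current split is."""
    return {"Blue": step["Blue"] + step["Green"], "Green": 0}


plan = shift_plan()
for stage in plan:
    print(stage)

# Errors during the canary stage: revert to Blue only.
print(rollback(plan[1]))
```

With a 60-second TTL, resolvers that cached an answer before the rollback can keep using it for up to a minute.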

CloudFront + Route 53

For global content delivery:

example.com (Alias → CloudFront distribution)
  → CloudFront edge locations worldwide
    → Origin: ALB in us-east-1

CloudFront handles geographic distribution at the CDN layer. Route 53 provides the DNS resolution. Combined with WAF for security and S3 for static assets, this is the standard architecture for global web applications.

Domain Management

Domain Registration

Route 53 is also a domain registrar. Registering domains through Route 53 simplifies management because a hosted zone for the domain is created automatically.

Domain transfer to Route 53:

  1. Unlock the domain at the current registrar
  2. Get the authorization code (EPP code)
  3. Initiate transfer in Route 53 console
  4. Confirm transfer via email
  5. Transfer typically completes within 5-7 days

DNSSEC

DNSSEC (DNS Security Extensions) protects against DNS spoofing by digitally signing DNS records:

Route 53 DNSSEC:
  1. Create a key-signing key (KSK) backed by an asymmetric customer managed KMS key in us-east-1
  2. Enable DNSSEC signing in the hosted zone
  3. Add the DS record to the parent zone (at the registrar)
  4. Route 53 signs all records automatically

Enable DNSSEC for: Financial applications, healthcare, any domain where DNS spoofing could redirect users to malicious sites.

Cost Optimization

Pricing

| Component | Cost |
| --- | --- |
| Hosted zone | $0.50/month |
| Standard queries | $0.40/million |
| Latency-based queries | $0.60/million |
| Geo queries | $0.70/million |
| Alias queries (to AWS resources) | Free |
| Health checks (basic) | $0.50/month |
| Health checks (with string matching) | $0.75/month |
| Health checks (fast interval, 10 sec) | $1.00/month |

Cost Reduction

  • Use Alias records for AWS resources — Alias queries are free, CNAME queries are not
  • Increase TTL for stable records — Higher TTL = fewer queries = lower cost. Production records: 300 seconds (5 min). Stable records: 3600 seconds (1 hour)
  • Consolidate hosted zones — Subdomains such as api.example.com and app.example.com can be records in the parent domain's hosted zone instead of each getting its own zone, since every hosted zone is billed monthly
  • Use basic health checks — Fast interval (10 sec) health checks cost 2x. Use standard interval (30 sec) unless your failover SLA requires faster detection
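To see how these choices add up, here is a back-of-the-envelope estimator that uses only the prices listed in this section. The query volumes in the example are invented, and `monthly_cost` is a hypothetical helper; it ignores health checks and volume tiers.

```python
# Prices as listed in the pricing table in this section (USD).
PRICES = {
    "hosted_zone": 0.50,          # per zone per month
    "standard_per_million": 0.40,
    "latency_per_million": 0.60,
    "geo_per_million": 0.70,
    "alias_per_million": 0.0,     # Alias queries to AWS resources are free
}


def monthly_cost(zones, queries_millions):
    """Estimate the monthly bill from zone count and query volume (in millions)."""
    total = zones * PRICES["hosted_zone"]
    for kind, millions in queries_millions.items():
        total += millions * PRICES[f"{kind}_per_million"]
    return round(total, 2)


# Invented workload: 2 zones, 50M standard, 10M latency-based, 200M Alias queries.
print(monthly_cost(2, {"standard": 50, "latency": 10, "alias": 200}))  # 27.0

# The same 200M queries served through CNAMEs (standard rate) instead of Alias.
print(monthly_cost(2, {"standard": 250, "latency": 10}))  # 107.0
```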

Monitoring

Set CloudWatch alarms for DNS health:

| Metric | Alarm Condition | Indicates |
| --- | --- | --- |
| HealthCheckStatus | 0 (unhealthy) | Endpoint failure, potential failover |
| HealthCheckPercentageHealthy | < 100% | Partial failure across health check Regions |
| DNSQueries | Sudden spike or drop | Traffic anomaly or DNS issue |
| ConnectionTime | > 4 seconds | Endpoint responding slowly |

Query Logging

Enable Route 53 query logging to CloudWatch Logs for:

  • Traffic analysis — Understand DNS query patterns and volume
  • Security monitoring — Detect unusual query patterns or domains
  • Debugging — Verify which records are being returned to clients
  • Compliance — Audit trail of DNS resolution

Common Mistakes

Mistake 1: No Health Checks on Failover Records

Failover routing without health checks does not fail over: Route 53 always returns the primary record because it has no way to know the primary is unhealthy. Give every primary failover record and every latency-based record a health check, or enable Evaluate Target Health on Alias records.

Mistake 2: TTL Too High During Deployments

A TTL of 3600 seconds (1 hour) means resolvers can keep serving the old answer for up to 1 hour after a change. Before a deployment or migration, reduce the TTL to 60 seconds at least 24 hours in advance so that every cached copy with the old, longer TTL has expired. After the change is stable, raise the TTL again.

Mistake 3: Missing Default Geolocation Record

Geolocation routing without a default record means users outside your defined geographies get no DNS response — effectively an outage for those users. Always include a default record.

Mistake 4: CNAME at Zone Apex

Using CNAME for the root domain (example.com) violates the DNS specification and breaks other records (MX, TXT). Use Alias records instead — they work at the zone apex and are free.

Mistake 5: No Monitoring on Health Checks

Health checks transition to unhealthy and trigger failover, but no one is notified. Set CloudWatch alarms on HealthCheckStatus for every health check so the team knows when failover occurs.
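One way to wire this up is a CloudWatch alarm per health check that notifies an SNS topic. The sketch below builds the parameters for boto3's `put_metric_alarm`; the helper name, health check ID, and topic ARN are placeholders. Route 53 health check metrics are published in us-east-1, so the alarm is created there.

```python
def health_check_alarm_params(health_check_id, sns_topic_arn):
    """Build put_metric_alarm parameters that fire when a health check is unhealthy."""
    return {
        "AlarmName": f"route53-healthcheck-{health_check_id}",
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",  # 1 = healthy, 0 = unhealthy
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


params = health_check_alarm_params(
    "hc-primary-placeholder",
    "arn:aws:sns:us-east-1:123456789012:dns-alerts-placeholder",
)

# Creating the alarm needs boto3 and AWS credentials, for example:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```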

Getting Started

Route 53 is the entry point for all traffic to your application. Combined with CloudFront for content delivery, ALBs for load balancing, and multi-Region architectures for disaster recovery, it provides the DNS and traffic management layer that production applications require.

For DNS architecture design, multi-Region traffic management, and infrastructure review, talk to our team.
