---
title: AWS VPC Networking Best Practices for Production
description: A VPC misdesign at month two becomes a multi-quarter migration at year two. CIDR planning, subnet strategies, NAT gateways, VPC endpoints, Transit Gateway, and the network architecture patterns that scale without forcing a re-IP.
url: https://www.factualminds.com/blog/aws-vpc-networking-best-practices-for-production/
datePublished: 2026-02-17T00:00:00.000Z
dateModified: 2026-05-14T00:00:00.000Z
author: Palaniappan P
category: Cloud Architecture
tags: vpc, networking, aws, architecture, security
---

# AWS VPC Networking Best Practices for Production

> A VPC misdesign at month two becomes a multi-quarter migration at year two. CIDR planning, subnet strategies, NAT gateways, VPC endpoints, Transit Gateway, and the network architecture patterns that scale without forcing a re-IP.

Networking is the foundation that every other AWS service runs on. A well-designed VPC provides security isolation, predictable routing, and the flexibility to grow without re-architecting. A poorly designed VPC leads to overlapping IP ranges that prevent connectivity, public-facing resources that should be private, and networking costs that grow faster than the workloads they support.

Most networking mistakes are made during initial setup and are expensive to fix later. This guide covers the decisions that matter — CIDR planning, subnet strategy, connectivity patterns, and cost optimization — so you get the network right the first time.

**May 2026 refresh:** AWS publishes a maintained **VPC security best practices** checklist (multi-AZ subnets, layered security groups and NACLs, IAM access, Flow Logs, and optional AWS Network Firewall / GuardDuty integration). Use it as the service-truth companion to this architecture-focused walkthrough: [Security best practices for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-best-practices.html).

## VPC Design

### CIDR Planning

CIDR (Classless Inter-Domain Routing) defines your VPC's IP address range. Getting this right is critical because VPCs with overlapping CIDRs cannot be connected to each other, and a VPC's primary CIDR cannot be changed after creation: you can only add secondary CIDR blocks, subject to restrictions.

**Recommended CIDR ranges:**

| Environment     | CIDR         | Total IPs  | Rationale                      |
| --------------- | ------------ | ---------- | ------------------------------ |
| Production      | 10.0.0.0/16  | 65,536     | Room for growth, many subnets  |
| Staging         | 10.1.0.0/16  | 65,536     | Mirrors production for testing |
| Development     | 10.2.0.0/16  | 65,536     | Developer workloads            |
| Shared Services | 10.10.0.0/16 | 65,536     | CI/CD, DNS, shared tools       |

**Rules:**

- Use /16 for production VPCs — smaller ranges limit future growth
- Never overlap CIDRs across VPCs that might need to communicate
- Reserve ranges for on-premises connectivity (avoid 10.0.0.0/8 if your data center uses it)
- Document your CIDR allocation in a central registry

In a [multi-account organization](/blog/aws-multi-account-strategy-landing-zone-best-practices/), plan CIDRs centrally to prevent overlaps across accounts.
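
A central registry is easiest to trust when overlaps are checked automatically. The following is a minimal sketch using Python's standard `ipaddress` module; the registry contents mirror the table above, and the names `CIDR_REGISTRY`, `find_overlaps`, and `check_candidate` are illustrative rather than part of any AWS tooling.

```python
import ipaddress
from itertools import combinations

# Hypothetical registry: environment names and ranges mirror the table above.
CIDR_REGISTRY = {
    "production": "10.0.0.0/16",
    "staging": "10.1.0.0/16",
    "development": "10.2.0.0/16",
    "shared-services": "10.10.0.0/16",
}


def find_overlaps(registry):
    """Return every pair of registry entries whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in registry.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


def check_candidate(registry, cidr):
    """Return the names of existing entries a proposed CIDR would collide with."""
    candidate = ipaddress.ip_network(cidr)
    return sorted(
        name
        for name, existing in registry.items()
        if candidate.overlaps(ipaddress.ip_network(existing))
    )
```

Running `check_candidate` before allocating a new VPC range (for example in a pull-request check on the registry file) catches a collision before anything is deployed.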

### Subnet Strategy

**Three-tier architecture:**

```
VPC: 10.0.0.0/16
├── Public Subnets (internet-facing)
│   ├── 10.0.1.0/24 (AZ-a) — ALB, NAT Gateway, bastion hosts
│   ├── 10.0.2.0/24 (AZ-b)
│   └── 10.0.3.0/24 (AZ-c)
├── Private Subnets (application tier)
│   ├── 10.0.11.0/24 (AZ-a) — ECS tasks, Lambda, EC2 instances
│   ├── 10.0.12.0/24 (AZ-b)
│   └── 10.0.13.0/24 (AZ-c)
└── Data Subnets (database tier)
    ├── 10.0.21.0/24 (AZ-a) — RDS, ElastiCache, OpenSearch
    ├── 10.0.22.0/24 (AZ-b)
    └── 10.0.23.0/24 (AZ-c)
```

**Three Availability Zones** — Always deploy across at least 2 AZs for high availability. Three AZs provide better fault tolerance and are required for some services (e.g., Amazon MSK, Aurora Multi-AZ with 2 readers).

**Public subnets** have a route to the Internet Gateway. Only resources that must receive inbound traffic from the internet should be here — ALBs, NAT Gateways, and (rarely) bastion hosts.

**Private subnets** have a route to a NAT Gateway (for outbound internet access) but no inbound internet route. Application workloads live here.

**Data subnets** have no internet access at all — no NAT Gateway route. Databases should never initiate outbound internet connections. AWS service access uses VPC endpoints.
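
The three-tier layout can be derived from the VPC range rather than typed by hand. This sketch assumes the same third-octet offsets as the diagram above (1, 11, 21) and uses only the standard library; `plan_subnets` and `usable_hosts` are illustrative names. It also accounts for the five addresses AWS reserves in every subnet.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["a", "b", "c"]
# Third-octet starting offset per tier, matching the layout above.
TIER_OFFSETS = {"public": 1, "private": 11, "data": 21}
# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc=VPC_CIDR, azs=AZS, tiers=TIER_OFFSETS):
    """Map (tier, az) to a /24 carved out of the VPC range."""
    slash24s = list(vpc.subnets(new_prefix=24))  # 256 blocks for a /16
    return {
        (tier, az): slash24s[offset + i]
        for tier, offset in tiers.items()
        for i, az in enumerate(azs)
    }


def usable_hosts(subnet):
    """Addresses left for workloads after the AWS-reserved ones."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


for (tier, az), subnet in plan_subnets().items():
    print(f"{tier:8} AZ-{az}  {subnet}  usable={usable_hosts(subnet)}")
```

Leaving gaps between the tier offsets, as the layout above does, keeps room to add more AZs or widen a tier later without renumbering.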

### Security Groups vs NACLs

**Security Groups** — Stateful, instance-level firewall. The primary access control mechanism:

| Rule                   | Source                     | Port | Purpose             |
| ---------------------- | -------------------------- | ---- | ------------------- |
| ALB → Application      | ALB security group         | 8080 | Application traffic |
| Application → Database | Application security group | 5432 | Database queries    |
| Application → Redis    | Application security group | 6379 | Cache access        |

**Best practice:** Reference security groups by ID (not CIDR) whenever possible. `sg-abc123` is self-documenting and adapts automatically when instances are added or removed.
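
As a sketch of what group-to-group references look like in practice, the snippet below builds the rule table above as `IpPermissions` entries, the structure the EC2 API uses for ingress rules, with `UserIdGroupPairs` carrying the source group instead of a CIDR. The group IDs and the `sg_ingress_rule` helper are hypothetical, and nothing here calls AWS.

```python
# Hypothetical security group IDs, for illustration only.
ALB_SG = "sg-0a1b2c3d4e5f60001"
APP_SG = "sg-0a1b2c3d4e5f60002"
DB_SG = "sg-0a1b2c3d4e5f60003"
REDIS_SG = "sg-0a1b2c3d4e5f60004"


def sg_ingress_rule(source_group_id, port, description):
    """Build one IpPermissions entry allowing TCP on `port` from another group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # A group reference instead of IpRanges: membership changes are
        # picked up automatically as instances join or leave the source group.
        "UserIdGroupPairs": [
            {"GroupId": source_group_id, "Description": description}
        ],
    }


# Ingress rules keyed by the group that receives the traffic.
INGRESS_BY_GROUP = {
    APP_SG: [sg_ingress_rule(ALB_SG, 8080, "ALB to application")],
    DB_SG: [sg_ingress_rule(APP_SG, 5432, "Application to database")],
    REDIS_SG: [sg_ingress_rule(APP_SG, 6379, "Application to Redis")],
}
```

With boto3, each list could be passed as the `IpPermissions` argument of `authorize_security_group_ingress` for the corresponding `GroupId`; in most teams the same rules would live in infrastructure-as-code instead.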

**NACLs (Network ACLs)** — Stateless, subnet-level firewall. Use as a secondary defense layer:

- Default NACLs allow all traffic — do not rely on them for security
- Custom NACLs block known malicious IP ranges or restrict traffic between subnet tiers
- NACLs require explicit allow rules for both inbound and outbound (stateless)

**Recommendation:** Use security groups as the primary control. Add NACLs only for specific requirements (blocking IP ranges, enforcing subnet-level restrictions for compliance).

## Internet Connectivity

### NAT Gateways

NAT Gateways provide outbound internet access for private subnet resources (package updates, API calls, SaaS integrations):

**Cost:**

- $0.045/hour per NAT Gateway = $32.40/month
- $0.045/GB data processed

**High availability:** Deploy one NAT Gateway per AZ and route each private subnet to the gateway in its own AZ. If AZ-a's NAT Gateway fails, resources in AZ-a lose internet access, but resources in AZ-b and AZ-c continue normally.

**Cost optimization:**

- A single NAT Gateway in one AZ works for development environments ($32/month vs $97/month for three)
- Use VPC endpoints for AWS service traffic to reduce NAT Gateway data processing charges
- S3 and DynamoDB Gateway endpoints are free — always deploy them
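
The monthly figures above follow from simple arithmetic. This sketch assumes the prices quoted in this section and a 720-hour (30-day) month; actual rates vary by Region, and `nat_monthly_cost` is an illustrative helper.

```python
HOURS_PER_MONTH = 720      # 30-day month, matching the figures above
NAT_HOURLY_USD = 0.045     # per gateway, as quoted in this section
NAT_PER_GB_USD = 0.045     # data processing, as quoted in this section


def nat_monthly_cost(gateways, gb_processed):
    """Hourly charge for each gateway plus per-GB data processing."""
    fixed = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    variable = gb_processed * NAT_PER_GB_USD
    return round(fixed + variable, 2)


print(nat_monthly_cost(1, 0))     # one gateway, no traffic
print(nat_monthly_cost(3, 0))     # one gateway per AZ, no traffic
print(nat_monthly_cost(3, 1000))  # three gateways, 1 TB processed
```

At moderate traffic volumes the per-GB term overtakes the hourly term, which is why moving AWS service traffic to VPC endpoints matters more than trimming the gateway count.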

### VPC Endpoints

VPC endpoints provide private connectivity to AWS services without traversing the internet or NAT Gateway:

**Gateway endpoints (free):**

- S3
- DynamoDB

Always deploy these. They are free and reduce NAT Gateway costs.

**Interface endpoints ($0.01/hour per AZ + $0.01/GB):**

- ECR (for container image pulls)
- CloudWatch Logs (for log shipping)
- STS (for IAM role assumption)
- Secrets Manager / SSM Parameter Store
- KMS
- SQS, SNS, EventBridge
- Lambda, Step Functions

**Cost analysis:** An interface endpoint in 3 AZs costs ~$21.60/month in hourly charges. Each GB moved from the NAT Gateway ($0.045/GB) to the endpoint ($0.01/GB) saves $0.035, so the endpoint is cheaper once a workload sends more than roughly 620 GB/month to that service ($21.60 / $0.035 ≈ 617 GB). For high-traffic services (ECR, CloudWatch Logs), endpoints almost always save money.
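
The break-even point is worth computing explicitly, because it shifts depending on whether the endpoint's own per-GB fee is counted. This is a sketch under the prices quoted in this section and a 720-hour month; the function names are illustrative. For three AZs it gives roughly 480 GB when the endpoint data fee is ignored and roughly 617 GB when it is included.

```python
HOURS_PER_MONTH = 720
ENDPOINT_HOURLY_PER_AZ_USD = 0.01   # as quoted in this section
ENDPOINT_PER_GB_USD = 0.01          # as quoted in this section
NAT_PER_GB_USD = 0.045              # as quoted in this section


def endpoint_fixed_monthly(azs):
    """Hourly charges for one interface endpoint deployed in `azs` zones."""
    return azs * ENDPOINT_HOURLY_PER_AZ_USD * HOURS_PER_MONTH


def break_even_gb(azs, include_endpoint_data_fee=True):
    """Monthly GB above which the endpoint is cheaper than NAT for that traffic."""
    endpoint_fee = ENDPOINT_PER_GB_USD if include_endpoint_data_fee else 0.0
    saving_per_gb = NAT_PER_GB_USD - endpoint_fee
    return endpoint_fixed_monthly(azs) / saving_per_gb


print(round(break_even_gb(3, include_endpoint_data_fee=False)))
print(round(break_even_gb(3)))
```

Deploying the endpoint in fewer AZs lowers the threshold proportionally, at the price of cross-AZ hops for clients in the uncovered zones.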

**Security benefit:** VPC endpoints keep traffic within the AWS network. Data never traverses the public internet, reducing the attack surface.

## Multi-VPC Connectivity

### VPC Peering

Point-to-point connectivity between two VPCs:

```
VPC A ←→ VPC B (peering connection)
```

**Advantages:** Simple, no additional cost (data transfer charges only), low latency.

**Limitations:** Not transitive (VPC A ↔ B and VPC B ↔ C does not mean VPC A ↔ C). For more than 3-4 VPCs, peering creates an unmanageable mesh.

**Best for:** Connecting 2-3 VPCs in simple architectures.
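
The mesh problem is easy to quantify: a full mesh of n VPCs needs n(n-1)/2 peering connections, each with route table entries on both sides. A small sketch (the function name is illustrative):

```python
def peering_connections(vpc_count):
    """Peering connections needed for a full mesh of `vpc_count` VPCs."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count * (vpc_count - 1) // 2


for n in (2, 3, 5, 10):
    print(f"{n} VPCs -> {peering_connections(n)} peering connections")
```

Three VPCs need three connections, but ten VPCs need forty-five, which is where a hub-and-spoke design starts to pay off.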

### Transit Gateway

Hub-and-spoke connectivity for multiple VPCs and on-premises networks:

```
VPC A ───┐
VPC B ───┤
VPC C ───┼─── Transit Gateway ─── On-premises (VPN / Direct Connect)
VPC D ───┤
VPC E ───┘
```

**Advantages:**

- Centralized routing — one hub connects all VPCs
- Transitive routing — any VPC can reach any other VPC through the hub
- VPN and Direct Connect integration
- Route tables for segmentation (production VPCs cannot reach development VPCs)
- Cross-Region peering for multi-Region architectures

**Cost:** $0.05/hour per attachment + $0.02/GB data processed. A Transit Gateway with 5 VPC attachments costs ~$180/month before data transfer.

**Best for:** [Multi-account organizations](/blog/aws-multi-account-strategy-landing-zone-best-practices/) with 4+ VPCs, hybrid connectivity, or network segmentation requirements.
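
To budget for a hub, the attachment and data processing charges can be estimated together. The sketch below assumes the rates quoted in this section and a 720-hour month; `tgw_monthly_cost` is an illustrative helper, and real pricing varies by Region.

```python
HOURS_PER_MONTH = 720
TGW_ATTACHMENT_HOURLY_USD = 0.05   # per attachment, as quoted in this section
TGW_PER_GB_USD = 0.02              # data processing, as quoted in this section


def tgw_monthly_cost(attachments, gb_processed=0):
    """Attachment-hours plus per-GB processing for one Transit Gateway."""
    fixed = attachments * TGW_ATTACHMENT_HOURLY_USD * HOURS_PER_MONTH
    variable = gb_processed * TGW_PER_GB_USD
    return round(fixed + variable, 2)


print(tgw_monthly_cost(5))        # five VPC attachments, no traffic
print(tgw_monthly_cost(5, 1000))  # same hub with 1 TB processed
```

VPN and Direct Connect attachments count toward the attachment total in the same way as VPC attachments in this estimate.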

### PrivateLink

Expose a service from one VPC to another without VPC peering:

```
Consumer VPC → VPC Endpoint (Interface) → PrivateLink → NLB → Provider VPC
```

**Best for:** Sharing specific services (APIs, databases) across accounts without full network connectivity. The consumer only accesses the specific service endpoint — not the provider's entire VPC.

## Hybrid Connectivity

### AWS VPN

Encrypted tunnels over the public internet:

| Option           | Bandwidth                  | Latency             | Cost                               |
| ---------------- | -------------------------- | ------------------- | ---------------------------------- |
| Site-to-Site VPN | Up to 1.25 Gbps per tunnel | Variable (internet) | $0.05/hour + data transfer         |
| Client VPN       | Per-connection             | Variable            | $0.10/hour + $0.05/connection-hour |

**Best for:** Quick connectivity setup, backup for Direct Connect, remote developer access.

### AWS Direct Connect

Dedicated network connection from your data center to AWS:

| Option                                | Bandwidth | Latency         | Cost                            |
| ------------------------------------- | --------- | --------------- | ------------------------------- |
| Dedicated (1 Gbps, 10 Gbps, 100 Gbps) | Dedicated | Consistent, low | Port fee + data transfer        |
| Hosted (50 Mbps - 10 Gbps)            | Shared    | Consistent, low | Partner pricing + data transfer |

**Best for:** Production hybrid workloads requiring consistent latency and high throughput. Financial services, healthcare, and any workload with data residency requirements.

**High availability:** Deploy Direct Connect connections in two different Direct Connect locations. Use VPN as a backup for Direct Connect.

## Network Cost Optimization

### Data Transfer Costs

Data transfer is often the largest hidden cost in AWS networking:

| Transfer Type            | Cost                    |
| ------------------------ | ----------------------- |
| Inbound (internet → AWS) | Free                    |
| Same AZ                  | Free                    |
| Cross-AZ (same Region)   | $0.01/GB each direction |
| Cross-Region             | $0.02/GB                |
| Internet outbound        | $0.09/GB (first 10 TB)  |
| NAT Gateway processing   | $0.045/GB               |
| VPC endpoint processing  | $0.01/GB                |

**Cost reduction strategies:**

- Keep communicating services in the same AZ when possible (free vs $0.02/GB cross-AZ, counting the $0.01/GB charged in each direction)
- Use [CloudFront](/services/aws-cloudfront-consultant/) for content delivery (cheaper outbound rates: $0.085/GB vs $0.09/GB, and cached content eliminates origin transfer)
- Deploy S3 and DynamoDB gateway endpoints (free, eliminates NAT Gateway charges)
- Use VPC endpoints for high-traffic AWS services
- Compress data in transit to reduce GB transferred
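
A rough estimator makes it easier to see which transfer type dominates a bill. This sketch hard-codes the rates from the table above (only the first internet-outbound tier, and cross-AZ counted in both directions); the category keys and `monthly_transfer_cost` are illustrative names, not AWS billing identifiers.

```python
# USD per GB, taken from the table above.
RATES_USD_PER_GB = {
    "inbound": 0.0,
    "same_az": 0.0,
    "cross_az": 0.02,            # $0.01 charged in each direction
    "cross_region": 0.02,
    "internet_out": 0.09,        # first 10 TB tier only
    "nat_processing": 0.045,
    "endpoint_processing": 0.01,
}


def monthly_transfer_cost(gb_by_type):
    """Return (per-type breakdown, total) in USD for monthly GB volumes."""
    unknown = set(gb_by_type) - set(RATES_USD_PER_GB)
    if unknown:
        raise KeyError(f"unknown transfer types: {sorted(unknown)}")
    breakdown = {
        kind: round(gb * RATES_USD_PER_GB[kind], 2)
        for kind, gb in gb_by_type.items()
    }
    return breakdown, round(sum(breakdown.values()), 2)


breakdown, total = monthly_transfer_cost(
    {"same_az": 10000, "cross_az": 5000, "nat_processing": 2000}
)
print(breakdown, total)
```

In this example the 5 TB of cross-AZ chatter costs more than the 2 TB sent through NAT, which is the kind of result that justifies AZ-aware routing between services.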

### NAT Gateway Cost Reduction

NAT Gateways charge $0.045/GB for data processing. For workloads making heavy use of AWS services:

1. **Deploy S3 and DynamoDB gateway endpoints** — Free, eliminates NAT charges for the two highest-volume services
2. **Deploy interface endpoints for ECR** — Container image pulls from ECR through NAT are expensive; endpoint is cheaper for most workloads
3. **Deploy CloudWatch Logs endpoint** — Log shipping volume can be significant
4. **Consolidate internet access** — If multiple VPCs need internet access, route through a centralized NAT in a shared VPC via Transit Gateway

## Monitoring

### VPC Flow Logs

Enable VPC Flow Logs in **every VPC, every region, every account**, not only the production VPCs you remember to configure. Unmonitored regions and VPCs are exactly where unauthorized activity goes unnoticed.

- **Accepted traffic** — Useful for understanding communication patterns and traffic volume
- **Rejected traffic** — Security monitoring (port scans, unauthorized access attempts)
- **All traffic** — Complete visibility (highest cost)

Send flow logs to S3 for long-term analysis or CloudWatch Logs for real-time alerting. Use [Athena](/services/aws-data-analytics/) to query flow logs in S3 for network forensics. For cost-sensitive environments, reduce volume on dev/staging VPCs (capture only rejected traffic, or use the 10-minute maximum aggregation interval) while keeping full logging on production. Enforce coverage organization-wide with the AWS Config rule `vpc-flow-logs-enabled` plus an EventBridge rule that auto-remediates any VPC created without flow logs.
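
For quick triage outside Athena, flow log records are plain space-separated text. The sketch below parses the default (version 2) record format and counts rejected connections by destination port. The sample records are made up for illustration, the helper names are hypothetical, and custom log formats would need a different field list. Files delivered to S3 start with a header line, which should be skipped before parsing.

```python
from collections import Counter

# Field order of the default (version 2) flow log record format.
DEFAULT_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)

# Illustrative records only; addresses and IDs are placeholders.
SAMPLE_RECORDS = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.70 172.31.9.12 49762 3389 6 1 40 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.71 172.31.9.12 49763 23 6 1 40 1418530010 1418530070 REJECT OK",
]


def parse_record(line):
    """Split one default-format record into a field-name -> string dict."""
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(parts)}")
    return dict(zip(DEFAULT_FIELDS, parts))


def top_rejected_ports(lines, n=5):
    """Most frequently rejected destination ports, as (port, count) pairs."""
    counts = Counter()
    for line in lines:
        record = parse_record(line)
        if record["action"] == "REJECT":
            counts[record["dstport"]] += 1
    return counts.most_common(n)


print(top_rejected_ports(SAMPLE_RECORDS))
```

The same grouping translates directly into an Athena `GROUP BY dstport` query once the logs are in S3.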

### Network Monitoring

Set [CloudWatch alarms](/blog/aws-cloudwatch-observability-metrics-logs-alarms-best-practices/) for:

- NAT Gateway `ErrorPortAllocation` — NAT Gateway running out of ports (scale or split traffic)
- NAT Gateway `BytesOutToDestination` — Unexpected data transfer volume
- Transit Gateway `BytesIn/BytesOut` — Traffic volume anomalies
- VPN `TunnelState` — VPN tunnel down
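
As one way to express the first alarm in code, the sketch below builds the keyword arguments for CloudWatch's `PutMetricAlarm` call for the `ErrorPortAllocation` metric. The threshold, period, alarm name, and the NAT Gateway ID and SNS topic ARN used in the example are assumptions to adjust, and the helper itself makes no AWS call.

```python
def nat_port_allocation_alarm(nat_gateway_id, sns_topic_arn):
    """Keyword arguments for a CloudWatch alarm on NAT port allocation errors."""
    return {
        "AlarmName": f"nat-port-allocation-errors-{nat_gateway_id}",
        "Namespace": "AWS/NATGateway",
        "MetricName": "ErrorPortAllocation",
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "Statistic": "Sum",
        "Period": 300,                 # assumed 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0,                # assumed: any allocation error alarms
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical identifiers, for illustration only.
params = nat_port_allocation_alarm(
    "nat-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:network-alerts",
)
print(params["AlarmName"])
```

With boto3, the dictionary could be passed as `cloudwatch.put_metric_alarm(**params)`; the other metrics in the list follow the same shape with their own namespace and dimensions.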

## Common Mistakes

### Mistake 1: Insufficient CIDR Planning

Starting with a /24 VPC (256 IPs) and discovering you need more IPs after deploying 50 services. While you can add secondary CIDRs, the expanded range may overlap with other VPCs. Plan for growth with /16 VPCs from the start.

### Mistake 2: Everything in Public Subnets

Placing application servers and databases in public subnets because "it is easier to access." Every resource that does not need inbound internet traffic should be in a private subnet. Use ALBs for inbound traffic and NAT Gateways for outbound.

### Mistake 3: No VPC Endpoints

Routing all AWS service traffic through NAT Gateways when gateway endpoints (S3, DynamoDB) are free. Deploy gateway endpoints in every VPC — they cost nothing and save significant NAT Gateway data processing fees.

### Mistake 4: Single-AZ Deployment

Deploying all resources in a single Availability Zone. When that AZ has an incident (hardware failure, network issue), your entire application goes down. Always deploy across at least 2 AZs for production workloads.

### Mistake 5: Leaving the Default Security Group Permissive

Every VPC ships with a `default` security group that cannot be deleted. Out of the box it allows all traffic between resources that share it — and any resource launched without an explicit security group association inherits it. The result is an accidental flat network where forgotten EC2 instances, Lambda functions, and RDS endpoints can all talk to each other.

**Harden the default security group on every VPC:**

- Remove every inbound rule and every outbound rule, leaving the security group with zero rules. The group still exists (it must), but it grants no access.
- Always launch resources with an explicit, purpose-built security group — never rely on the default.
- Enforce the closed state with the AWS Config rule `vpc-default-security-group-closed` so any drift triggers an alert.
- Add an SCP that denies `ec2:AuthorizeSecurityGroupIngress` and `ec2:AuthorizeSecurityGroupEgress` against any security group named `default`, so even an admin cannot accidentally open it.
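
A lightweight audit can complement the Config rule. This sketch takes a list shaped like the `SecurityGroups` field of an EC2 `DescribeSecurityGroups` response and reports default groups that still carry rules. The sample data and the `open_default_groups` name are hypothetical, and fetching the real list (for every Region and account) is left out.

```python
def open_default_groups(security_groups):
    """Return IDs of `default` security groups that still have any rule."""
    return [
        sg["GroupId"]
        for sg in security_groups
        if sg.get("GroupName") == "default"
        and (sg.get("IpPermissions") or sg.get("IpPermissionsEgress"))
    ]


# Hypothetical data in the shape of DescribeSecurityGroups output.
SAMPLE_GROUPS = [
    {   # hardened: no inbound and no outbound rules
        "GroupId": "sg-0000000000000001a", "GroupName": "default",
        "VpcId": "vpc-0aaaaaaaaaaaaaaaa",
        "IpPermissions": [], "IpPermissionsEgress": [],
    },
    {   # still has the stock allow-all egress rule
        "GroupId": "sg-0000000000000002b", "GroupName": "default",
        "VpcId": "vpc-0bbbbbbbbbbbbbbbb",
        "IpPermissions": [],
        "IpPermissionsEgress": [
            {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        ],
    },
    {   # purpose-built group: ignored by the check
        "GroupId": "sg-0000000000000003c", "GroupName": "app",
        "VpcId": "vpc-0bbbbbbbbbbbbbbbb",
        "IpPermissions": [{"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080}],
        "IpPermissionsEgress": [],
    },
]

print(open_default_groups(SAMPLE_GROUPS))
```

Any ID the function returns is a default group that needs its remaining rules revoked.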

## Getting Started

VPC networking is the foundation that determines the security, connectivity, and cost characteristics of everything you build on AWS. Getting the network right during initial setup prevents expensive re-architecture later.

For network architecture design as part of your [AWS architecture review](/services/aws-architecture-review/), multi-account networking with Transit Gateway, or hybrid connectivity planning, talk to our team.

[Contact us to design your network architecture →](/contact-us/)

## FAQ

### What CIDR range should I use for an AWS VPC?
Use a /16 from RFC 1918 space for production VPCs (10.0.0.0/16 gives 65,536 addresses with room for many subnets). Plan ranges centrally across accounts so VPCs that may need to peer or route through Transit Gateway never overlap; overlapping CIDRs are among the most common networking mistakes and among the most expensive to fix later. Reserve a contiguous block within 10.0.0.0/8 or 172.16.0.0/12 for AWS, separate from on-prem and partner ranges.

### How many subnets should a production VPC have per Availability Zone?
Three per AZ across three AZs (nine subnets total): public (ALB, NAT Gateway, bastion), private (ECS, Lambda, EC2 application tier), and data (RDS, ElastiCache, OpenSearch with no internet route). Three AZs are required for highest availability and several services (Aurora Multi-AZ with 2 readers, MSK). Use /24 subnets unless you need IP density beyond 251 hosts.

### When should I use security groups vs NACLs?
Security groups are stateful instance-level firewalls and the primary access control mechanism — reference them by group ID rather than CIDR (`sg-abc123`) so they self-update as instances scale. NACLs are stateless subnet-level firewalls; use them as a coarse secondary defense (block specific known-bad IPs, enforce strict subnet boundaries) but never as the primary control. Default-allow NACLs are fine in most production VPCs.

### When should I use Transit Gateway vs VPC peering?
Use VPC peering for 1-to-1 connections among 2-3 VPCs (peering is point-to-point and not transitive). Use Transit Gateway when you have 4+ VPCs, multiple accounts, hybrid connectivity (Direct Connect / Site-to-Site VPN), or need centralized inspection. Transit Gateway costs more (per attachment + per GB processed) but eliminates the n² peering mesh and integrates with AWS Network Firewall for inspection.

### How do VPC endpoints reduce NAT Gateway costs?
NAT Gateway charges per GB processed, on top of its hourly fee. Routing AWS-service API calls (S3, DynamoDB, KMS, Secrets Manager, ECR, CloudWatch, etc.) through Gateway VPC endpoints (free for S3 and DynamoDB) or Interface VPC endpoints ($0.01/hour per AZ + $0.01/GB) takes that traffic off the NAT Gateway and keeps it on the AWS network. For a workload pulling container images or writing CloudWatch Logs at volume, the per-GB savings can cover the endpoint's hourly charges quickly.

---

*Source: https://www.factualminds.com/blog/aws-vpc-networking-best-practices-for-production/*
