---
title: HashiCorp Vault on AWS
description: HashiCorp Vault on AWS: dynamic DB credentials, transit-engine encryption, HCP Vault Secrets, and EKS Secrets Operator vs AWS Secrets Manager guidance.
url: https://www.factualminds.com/integrations/hashicorp-vault-aws/
category: secrets
updated: 2026-04-29
---

# HashiCorp Vault on AWS

> Centralised secret management, dynamic credentials, and envelope encryption with Vault — sitting alongside AWS KMS, Secrets Manager, and IAM.

## HashiCorp Vault on AWS

Vault is an enterprise secret management and encryption-as-a-service platform. On AWS it sits alongside IAM, KMS, and Secrets Manager — owning the domains those services do not cover as well: dynamic database credentials with sub-hour TTLs, transit-engine encryption for many small items, multi-cloud policy consistency, and centralised PKI for both AWS and on-prem certificates.

> **Licensing note (2026)**: IBM closed its acquisition of HashiCorp in early 2025. Vault remains under the Business Source License 1.1 adopted in August 2023 — free for all non-competing production use, with enterprise features behind a commercial licence. OpenBao (community fork) exists as a Linux Foundation project but lacks DR replication, namespaces, and FIPS transit for regulated workloads. Always verify current terms at hashicorp.com.

## Why Vault on AWS

**Centralised secret storage**

- Passwords, API keys, TLS certificates, and tokens in one audited store.
- Encryption at rest (AES-256-GCM) with automatic key rotation; audit device captures every read, write, and token operation.
- KMS auto-unseal removes the operator burden of handling unseal keys in person.

**Dynamic credentials**

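- Generate temporary RDS/Aurora/Redshift/MongoDB/Snowflake passwords on demand, valid for a short role-level TTL (1 hour is a common setting; Vault's system default lease TTL is far longer, so set `default_ttl` explicitly), auto-revoked at expiry.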
- Shrinks blast radius from "replay leaked creds forever" to "replay for under an hour".
- Per-app credentials and per-session audit means you can answer "which microservice or pipeline run caused this DB lock?" without guesswork.
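
To make the TTL mechanics concrete, here is a minimal Python sketch that parses the JSON returned by a `database/creds/<role>` read and works out when the credential stops being valid. The response shape (`lease_id`, `lease_duration`, `data.username`, `data.password`) follows Vault's standard lease response; the `DynamicCredential` class, the sample payload, and the role name `orders-app` are illustrative assumptions, not part of any Vault SDK.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DynamicCredential:
    """Hypothetical holder for one leased database credential."""
    username: str
    password: str
    lease_id: str
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        # The credential is usable strictly before its expiry instant.
        return now < self.expires_at


def parse_db_creds(raw: str, issued_at: datetime) -> DynamicCredential:
    """Turn a Vault lease response into a credential with an absolute expiry."""
    body = json.loads(raw)
    return DynamicCredential(
        username=body["data"]["username"],
        password=body["data"]["password"],
        lease_id=body["lease_id"],
        expires_at=issued_at + timedelta(seconds=body["lease_duration"]),
    )


# Sample payload for illustration only; every value here is made up.
SAMPLE = json.dumps({
    "lease_id": "database/creds/orders-app/abc123",
    "lease_duration": 3600,
    "renewable": True,
    "data": {"username": "v-orders-app-xyz", "password": "example-only"},
})

if __name__ == "__main__":
    issued = datetime(2026, 4, 29, 12, 0, tzinfo=timezone.utc)
    cred = parse_db_creds(SAMPLE, issued)
    print(cred.username, "expires", cred.expires_at.isoformat())
```

Keeping the `lease_id` alongside the credential is what lets an operator revoke one application's access early without touching the others.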

**Transit engine (encryption-as-a-service)**

- Send plaintext, receive ciphertext without Vault ever storing the plaintext.
- Convergent encryption, derived keys, and key rotation without re-encrypting every record.
- Throughput and cost profile that outperforms raw KMS calls for many small items.
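
As a rough illustration of the "send plaintext, receive ciphertext" contract, the sketch below builds the request for the transit encrypt endpoint (`/v1/<mount>/encrypt/<key>`), which expects base64-encoded plaintext, and reads the key version out of a returned `vault:vN:...` ciphertext. The Vault address and the key name `orders` are hypothetical, and the sketch only prepares the request; sending it with a token is left to whichever HTTP client you already use.

```python
import base64
import json


def build_transit_encrypt_request(vault_addr: str, key_name: str,
                                  plaintext: bytes, mount: str = "transit"):
    """Return (url, json_body) for a transit encrypt call.

    Transit requires the plaintext to be base64-encoded in the request body.
    """
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/encrypt/{key_name}"
    body = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return url, json.dumps(body)


def key_version(ciphertext: str) -> int:
    """Extract N from a 'vault:vN:<payload>' transit ciphertext."""
    prefix, version, _payload = ciphertext.split(":", 2)
    if prefix != "vault" or not version.startswith("v"):
        raise ValueError("not a transit ciphertext")
    return int(version[1:])


if __name__ == "__main__":
    # Hypothetical address and key name, for illustration only.
    url, body = build_transit_encrypt_request(
        "https://vault.example.internal:8200", "orders", b"4111-1111")
    print(url)
    print(body)
    print(key_version("vault:v2:c29tZS1jaXBoZXJ0ZXh0"))
```

The version prefix is what makes rotation cheap: records encrypted under older versions stay decryptable, so re-encryption can happen lazily rather than in one batch.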

**Multi-cloud & on-prem**

- Same policy model across AWS, Azure, GCP, and on-prem workloads — important for M&A, hybrid, and regulated environments that cannot put all secrets in one cloud.

## Vault vs AWS Secrets Manager — decision matrix

| Question                                             | Secrets Manager          | Vault                                |
| ---------------------------------------------------- | ------------------------ | ------------------------------------ |
| Single-cloud AWS workload?                           | ✅ Preferred             | Overkill for most                    |
| Need dynamic DB creds under 60 min TTL?              | ❌                       | ✅                                   |
| Need transit engine / envelope encryption at volume? | ❌ (use KMS directly)    | ✅                                   |
| Multi-cloud or hybrid consistency required?          | ❌                       | ✅                                   |
| Need centralised PKI for AWS + on-prem?              | Partial (ACM Private CA) | ✅                                   |
| Need SSH CA for ephemeral server access?             | ❌                       | ✅                                   |
| AWS-native rotation + Lambda rotators is enough?     | ✅                       | Overkill                             |
| Existing Vault footprint across org?                 | —                        | ✅                                   |
| Simplest audit via CloudTrail?                       | ✅                       | Vault audit device (fine, but extra) |
| Cost for small AWS-only team?                        | Lower                    | Higher (infra or HCP)                |

**Default recommendation**: start with Secrets Manager for AWS-only workloads; add Vault when a specific driver above applies. Many regulated customers run both — Secrets Manager for AWS-service consumers, Vault for dynamic DB creds and transit, with Vault Secrets Sync keeping a one-way mirror to Secrets Manager for ECS/Lambda ergonomics.

## Vault architecture on AWS

**Self-hosted (control-plane-sensitive workloads)**

- 3–5 node cluster on EC2 in an Auto Scaling Group across AZs.
- **Integrated Storage (Raft)** is now the HashiCorp-recommended backend — DynamoDB/S3 backends are still supported but Raft is simpler, faster, and enables performance replication to DR regions.
- Network Load Balancer in front for TLS termination via ACM.
- **KMS auto-unseal** — Vault uses an AWS KMS key to unseal itself after restart; rotate the KMS key annually.
- VPC endpoints for KMS, STS, and CloudWatch to keep traffic off the internet.

**HCP Vault Dedicated** (managed cluster)

- HashiCorp runs the cluster; you consume via AWS PrivateLink.
- Dev tier starts ~$200/month; production tiers scale by node count and replication.
- Best when you want a full Vault feature set without running the cluster yourself.

**HCP Vault Secrets** (lightweight SaaS) — GA 2024

- REST API for static secrets; free tier up to 25 secrets; paid from ~$0.03/secret/month.
- Best starting point for teams that need a managed key-value store with better audit than Parameter Store but do not yet need dynamic or transit.

## Authentication methods we deploy

- **AWS auth method** — EC2 instances and Lambda functions authenticate to Vault using their instance identity document or IAM role; Vault verifies via AWS STS.
- **Kubernetes auth** — pods authenticate with their projected ServiceAccount token; Vault verifies against the cluster's TokenReview API. On EKS, pair with **Pod Identity** for outbound calls.
- **OIDC / JWT** — authenticate GitHub Actions, GitLab CI, and human SSO via an OIDC trust relationship; pairs with OIDC subject-claim filtering similar to the pattern we use for AWS IAM + GitHub Actions.
- **AppRole** — service-to-service authentication for on-prem or legacy workloads that cannot use IAM/OIDC.
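
For the Kubernetes auth flow above, a minimal sketch of what the client side amounts to: read the pod's ServiceAccount token, post it with a role name to `auth/<mount>/login`, and pull the Vault token and its TTL out of the response. The role name `orders-api` and the helper names are assumptions for illustration; in practice VSO, Vault Agent, or an SDK does this for you.

```python
import json
from pathlib import Path

# Conventional location of the ServiceAccount token inside a pod.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def read_sa_token(path: Path = SA_TOKEN_PATH) -> str:
    """Read the ServiceAccount JWT that Vault will validate via TokenReview."""
    return path.read_text().strip()


def build_k8s_login(role: str, jwt: str, mount: str = "kubernetes"):
    """Return (api_path, json_body) for a Kubernetes auth login request."""
    return f"/v1/auth/{mount}/login", json.dumps({"role": role, "jwt": jwt})


def parse_login(raw: str):
    """Return (client_token, lease_duration_seconds) from a login response."""
    auth = json.loads(raw)["auth"]
    return auth["client_token"], auth["lease_duration"]


if __name__ == "__main__":
    # Hypothetical role name and a placeholder JWT, for illustration only.
    path, body = build_k8s_login("orders-api", "header.payload.signature")
    print(path, body)
```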

## Secret engines we deploy on AWS

- **Database** — dynamic RDS/Aurora (Postgres/MySQL), Redshift, MongoDB Atlas, and Snowflake credentials with configurable TTL and max-TTL.
- **AWS** — generate temporary IAM access keys or assume-role credentials; useful for short-lived CLI sessions or third-party tools that cannot use IAM directly.
- **Transit** — encryption-as-a-service with convergent encryption, key rotation, and datakey generation.
- **PKI** — issue X.509 TLS certs for services running on AWS and on-prem; ACME server (Vault 1.14+) means cert-manager and traditional ACME clients can pull from Vault directly.
- **SSH CA** — sign short-lived SSH certs for engineer access to EC2 bastion hosts or on-prem Linux fleets.

## Vault Secrets Operator (VSO) for EKS

The 2026 default pattern for Kubernetes workloads on EKS:

1. Install VSO via Helm; configure a `VaultConnection` and `VaultAuth` pointing at your Vault cluster with Kubernetes or JWT auth.
2. App teams declare `VaultStaticSecret` or `VaultDynamicSecret` CRDs in their namespace; VSO reconciles them into native Kubernetes Secrets that the app consumes as normal.
3. VSO handles renewal and rotation automatically; dynamic secrets flow into a rolling Deployment restart when TTL approaches expiry.
4. Pair with EKS Pod Identity for the outbound AWS calls VSO makes during auth verification.

This replaces the legacy Vault Agent sidecar + init-container pattern for most workloads. Fall back to Agent-sidecar + tmpfs when secrets must never sit in etcd.

## Vault 1.17 / 1.18 features worth enabling

- **Adaptive overload protection** — targeted request-type throttling so a misbehaving client cannot take the whole cluster down.
- **Multi-issuer PKI with ACME** — Vault can be the ACME server for cert-manager on EKS and for external workloads.
- **KV v2 transformations** — patch subkeys atomically; useful for complex configuration documents.
- **Secrets Sync GA** — one-way sync Vault → AWS Secrets Manager / GitHub / GCP / Vercel / HCP Terraform; keeps AWS-native consumers working while Vault stays the source of truth.
- **Workload identity federation** — authenticate Vault to other systems without stored IAM credentials.

## Implementation: VaultDynamicSecret with VSO

```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: orders-db-creds
  namespace: orders
spec:
  vaultAuthRef: orders-eks-auth
  mount: database
  path: creds/orders-app
  destination:
    create: true
    name: orders-db
  refreshAfter: 30m # sync interval used when the engine returns no lease TTL; leased creds follow renewalPercent
  rolloutRestartTargets:
    - kind: Deployment
      name: orders-api
```

The `rolloutRestartTargets` field restarts the Deployment on rotation so applications pick up new credentials cleanly. For applications that hot-reload secrets without a restart, omit the field and have the app re-read the mounted Secret when the database rejects its current credentials.
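
For the hot-reload case, a minimal sketch of the retry logic, assuming the Kubernetes Secret is mounted as a volume with one file per key and that the keys are named `username` and `password` (the names Vault's database engine returns). `AuthRejected` is a hypothetical stand-in for whatever authentication error your database driver raises, and `connect` is whatever function opens a connection.

```python
from pathlib import Path
from typing import Callable, Tuple

Credentials = Tuple[str, str]


class AuthRejected(Exception):
    """Hypothetical stand-in for the driver's authentication error."""


def load_mounted_credentials(directory: Path) -> Credentials:
    """Read username/password from a Secret mounted as a volume."""
    return (
        (directory / "username").read_text().strip(),
        (directory / "password").read_text().strip(),
    )


def connect_with_reload(connect: Callable[[str, str], object],
                        load: Callable[[], Credentials],
                        cached: Credentials):
    """Try the cached credentials; on rejection re-read the Secret once and retry.

    Returns (connection, credentials_that_worked). A second rejection is
    allowed to propagate so the caller can alert instead of looping.
    """
    try:
        return connect(*cached), cached
    except AuthRejected:
        fresh = load()
        return connect(*fresh), fresh
```

Retrying exactly once keeps a genuinely revoked role from turning into a tight reconnect loop against the database.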

## Failure modes & resilience

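**1. Leader election timeout / quorum loss.** Raft requires a majority of voters, so losing 2 of 3 nodes makes the cluster unavailable. Recovery: while quorum still holds, run `vault operator raft remove-peer` for a dead node and `vault operator raft join` on its replacement. Once quorum is already lost, `remove-peer` has no leader to commit against; recover the surviving node with a `peers.json` file under its Raft storage path, restart it, then join the replacements. Critical: never remove a healthy peer by mistake. Practice this in a non-prod cluster every quarter.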
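
The majority rule is simple arithmetic, sketched here so the 3-node versus 5-node trade-off is easy to check. The function names are illustrative only.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def failures_tolerated(voters: int) -> int:
    """How many voters can be lost while the cluster keeps quorum."""
    return voters - quorum_size(voters)


def has_quorum(voters: int, healthy: int) -> bool:
    """True if the healthy voters still form a majority."""
    return healthy >= quorum_size(voters)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} voters: quorum {quorum_size(n)}, tolerates {failures_tolerated(n)} failures")
```

A 3-node cluster survives one failure and a 5-node cluster survives two, which is why the 5-node layout is worth its cost when a node can be down for maintenance while an AZ fails.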

**2. KMS auto-unseal key revocation.** If the auto-unseal KMS key is deleted, scheduled for deletion, or has its key policy revoked, restarted Vault nodes cannot unseal. Mitigation: set the key's deletion window to the 30-day maximum (`deletion_window_in_days = 30` in Terraform); alert on key policy changes via CloudTrail; create the key as multi-Region and replicate it to a DR region with `aws_kms_replica_key`. Test annual key rotation in staging; the auto-unseal seal handles rotation transparently if both old and new versions are accessible.

**3. Transit-engine throughput ceiling.** A single Vault node typically tops out around 2,000–5,000 transit ops/s depending on instance type and key type (AES-GCM is faster than RSA). Symptom: `429 Too Many Requests` on transit endpoints. Mitigation: scale horizontally with performance standby nodes or performance replication secondaries (both Enterprise features) serving transit traffic; use `derived = true` keys to amortize key derivation; consider client-side envelope encryption with KMS-wrapped DEKs for very high throughput.

**4. Dynamic-DB credential renewal collisions.** A burst of new pods all requesting credentials simultaneously can hit per-database connection limits while Vault creates short-lived users. Mitigation: cap connections per generated user in the role's `creation_statements` (for example PostgreSQL's `CONNECTION LIMIT`) and bound Vault's own connection pool with `max_open_connections`; use `default_ttl ≥ 1h` so steady-state pods don't churn; alarm on the CloudWatch `DatabaseConnections` metric for the upstream RDS/Aurora.

**5. Audit device backpressure.** Vault blocks all requests if audit logging fails. Symptom: cluster appears healthy but every request 500s. Mitigation: configure two audit devices (file + syslog or file + socket); the request succeeds if at least one writes. Monitor audit-device disk usage and rotate logs aggressively.

**6. Token-policy drift.** Long-lived service tokens accumulate over years; orphaned tokens from departed engineers persist. Mitigation: enforce token max-TTL via `sys/auth/token/tune` (or `vault auth tune -max-lease-ttl=<ttl> token/`); run `vault list auth/token/accessors` quarterly and revoke orphans; prefer auth methods (AWS, K8s, OIDC) over raw tokens.

**7. Performance replication lag.** Read-heavy secondaries can lag the primary during heavy write traffic. Stale reads on a secondary may serve outdated dynamic credentials. Mitigation: route writes and reads-after-writes to the primary; alert on `vault.replication.wal_lag` exceeding the SLA.

## Observability runbook

**Metrics to scrape (Vault Prometheus endpoint or Datadog Vault integration):**

| Metric                                    | Alarm threshold       | First action                                                          |
| ----------------------------------------- | --------------------- | --------------------------------------------------------------------- |
| `vault.core.unsealed`                     | `= 0`                 | Page on-call; check KMS auto-unseal key health                        |
| `vault.core.leader_election_count`        | `> 1` per hour        | Investigate node health, network partitions; review Raft quorum       |
| `vault.audit.log_request_failure`         | any                   | Rotate to backup audit device; investigate disk / sink health         |
| `vault.runtime.alloc_bytes`               | rising trend          | Memory leak; check for client misuse (many ephemeral tokens)          |
| `vault.token.creation`                    | spike `> 5×` baseline | Likely runaway client or auth-method abuse                            |
| `vault.replication.wal_lag` (HCP/PR)      | `> 1 min`             | Network or primary write storm; investigate                           |
| `vault.adaptive_overload.throttled_count` | sustained `> 0`       | Client overload protection kicking in; identify and rate-limit caller |
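
The "spike `> 5×` baseline" threshold needs a baseline definition before it can be alarmed on. One possible reading, sketched below, uses the median of recent per-interval samples so a single earlier burst does not inflate the baseline. The choice of median, the handling of an empty or zero baseline, and the function name are assumptions, not something Vault or your monitoring tool prescribes.

```python
from statistics import median
from typing import Sequence


def is_spike(history: Sequence[float], current: float, factor: float = 5.0) -> bool:
    """Return True when `current` exceeds `factor` times the median of `history`.

    With no history, or a zero baseline, there is not enough signal to call
    a spike, so this sketch returns False rather than alarming.
    """
    if not history:
        return False
    baseline = median(history)
    if baseline <= 0:
        return False
    return current > factor * baseline


if __name__ == "__main__":
    # Made-up token creations per 5-minute interval, for illustration.
    recent = [100, 110, 90, 105]
    print(is_spike(recent, 600))
    print(is_spike(recent, 500))
```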

**Raft snapshot + restore runbook:**

```bash
# Daily snapshot; schedule via systemd timer or EventBridge-triggered runner
snap_date="$(date +%F)"   # compute once so the file name and S3 prefix always match
snap_file="/tmp/vault-${snap_date}.snap"
vault operator raft snapshot save "${snap_file}"
aws s3 cp "${snap_file}" \
  "s3://acme-vault-snapshots/${snap_date}/" \
  --sse aws:kms --sse-kms-key-id alias/vault-snapshot

# Restore (DR; only into a fresh, initialised and unsealed cluster)
vault operator raft snapshot restore -force /tmp/vault-2026-04-29.snap
# After the restore the data is protected by the source cluster's seal:
# unseal with the original key shares or the same KMS auto-unseal key
```

Test restore quarterly to a staging cluster. Tabletop drill: "primary AZ is gone — recover in under 30 minutes". Document the actual time and improve.

**Unseal-key rotation cadence:** Rotate the KMS auto-unseal key annually by adding a new alias and updating the seal stanza:

```hcl
seal "awskms" {
  region     = "eu-west-1"
  kms_key_id = "alias/vault-unseal-2026"
}
```

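Pointing Vault at a different KMS key is a seal migration, performed with `vault operator unseal -migrate`: keep the old `seal` stanza marked `disabled = "true"` next to the new one, restart each node, and supply the recovery keys. The old key must remain usable for decrypt during the transition; do not delete it until the migration is verified across all nodes.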

**Debug path: "client getting 403":**

1. `vault token lookup` (with the client token) — check expiry, policies, accessor.
2. `vault read sys/policies/acl/<policy>` — confirm path and capability match the failed request.
3. Audit log on the Vault node: search for the request ID; look at `error` field.
4. If using AWS or K8s auth: confirm the bound entity (instance role, ServiceAccount) still matches the auth role, and check whether the trust source was rotated.
5. Adaptive overload: check `vault.adaptive_overload.throttled_count` — the client may be throttled, not denied.
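
Step 3 is usually the slow one, so here is a small sketch for pulling every audit entry for one request ID out of a file audit log. It assumes the log is JSON Lines with the request ID at `request.id` and a top-level `error` field; confirm that layout against a sample entry from your own audit device. The request IDs and paths in the example are made up.

```python
import json
from typing import Iterable, Iterator


def find_request(lines: Iterable[str], request_id: str) -> Iterator[dict]:
    """Yield audit entries whose request.id matches `request_id`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial lines, e.g. one cut off by log rotation
        if not isinstance(entry, dict):
            continue
        request = entry.get("request")
        if isinstance(request, dict) and request.get("id") == request_id:
            yield entry


def summarize(entry: dict) -> str:
    """One-line summary: entry type, operation, path, and error text."""
    req = entry.get("request", {})
    return (f'{entry.get("type")} {req.get("operation")} '
            f'{req.get("path")} error={entry.get("error", "")!r}')


if __name__ == "__main__":
    sample = [
        '{"type":"request","request":{"id":"r-1","operation":"read","path":"secret/data/a"}}',
        '{"type":"response","request":{"id":"r-1","operation":"read","path":"secret/data/a"},"error":"permission denied"}',
    ]
    for hit in find_request(sample, "r-1"):
        print(summarize(hit))
```

In real use, pass an open file handle for the audit log as `lines`; the function streams, so it copes with large files.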

## When Vault is NOT the right call

- Single-cloud AWS workload with simple static/rotated secrets — use **AWS Secrets Manager** + KMS CMKs; audit via CloudTrail; done.
- No need for dynamic credentials and no multi-cloud ambition — Vault's operational cost outweighs the benefit.
- Tiny team with no dedicated platform engineer — HCP Vault Secrets is the lightweight option, but even that is extra surface vs Secrets Manager for most use cases.
- Hard requirement for OSI-approved open source — OpenBao exists, but at the cost of enterprise features you probably need if you were considering Vault in the first place.

## Best practices

**Security**

- Enable audit device to a forward-only CloudWatch Logs stream and S3 Object Lock bucket; treat audit log integrity as part of the SOC 2 / PCI evidence pipeline.
- Human access to Vault admin functions requires MFA and a separate `admin` namespace — never the root token.
- Rotate KMS auto-unseal keys annually; tie the rotation into the control-plane pipeline.
- IP allow-list the Vault API via VPC endpoint policies, or via AWS Network Firewall rules where a load balancer is public-facing.

**Operations**

- Back up Raft snapshots to an S3 bucket with Object Lock; test restore quarterly.
- Performance replication or DR replication to a secondary region for RPO-sensitive workloads.
- Monitor with Datadog, CloudWatch, or the native Prometheus endpoint; alert on leader changes, sealed state, and adaptive-overload throttling.

**Application integration**

- Prefer Vault Secrets Operator over sidecars for EKS workloads.
- Cache secrets in-process with a TTL slightly shorter than Vault's TTL; handle 403 gracefully by re-fetching.
- Never log secret values. Vault audit devices HMAC sensitive request and response values by default; leave `log_raw` disabled so secret material never reaches the logs.
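
The in-process caching advice can be sketched as a small wrapper that refetches once a fraction of the lease has elapsed and can be invalidated when Vault answers 403. The class name, the 0.9 margin, and the `fetch` callable (which would wrap your real Vault read and return the value plus its lease duration in seconds) are assumptions for illustration.

```python
import time
from typing import Callable, Optional, Tuple


class SecretCache:
    """Cache one secret for slightly less than its Vault TTL."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 margin: float = 0.9,
                 clock: Callable[[], float] = time.monotonic):
        if not 0 < margin <= 1:
            raise ValueError("margin must be in (0, 1]")
        self._fetch = fetch      # returns (secret_value, ttl_seconds)
        self._margin = margin    # fraction of the TTL to trust the cached value
        self._clock = clock      # injectable so the cache can be tested
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached value, refetching if it is missing or stale."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            value, ttl = self._fetch()
            self._value = value
            self._expires_at = now + ttl * self._margin
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value, e.g. after a 403, so the next get() refetches."""
        self._value = None
```

A typical call site catches the 403, calls `invalidate()`, and retries `get()` once before surfacing the error.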

## Related reading

- [AWS Secrets Manager vs Parameter Store: when to use which](/blog/aws-secrets-manager-vs-parameter-store-when-to-use-which/)
- [PCI DSS compliance on AWS: architecture guide for fintech](/blog/pci-dss-compliance-aws-architecture-guide-fintech/)
- [How to achieve SOC 2 compliance on AWS in 2026](/blog/how-to-achieve-soc2-compliance-aws-2026/)

## Related services

- [AWS Cloud Security](/services/aws-cloud-security/)
- [Cloud Compliance Services](/services/cloud-compliance-services/)
- [DevOps Pipeline Setup](/services/devops-pipeline-setup/)

## FAQ

### Should I use Vault or AWS Secrets Manager?
For single-cloud AWS workloads with straightforward static or RDS-rotated secrets, AWS Secrets Manager wins on simplicity, cost, and audit (CloudTrail events out of the box). Use Vault when you need: (a) dynamic database credentials with sub-hour TTLs; (b) the transit engine for encrypting many small items without a KMS call for each one; (c) multi-cloud consistency (AWS + Azure + GCP with one policy model); (d) centralised PKI issuing TLS certs to both AWS and on-prem; or (e) an existing Vault footprint with SSH CA, AppRole, and OIDC auth methods already in place.

### Do I need dynamic database credentials, or will rotation in Secrets Manager do?
Ask three questions. (1) Does a leaked credential need to be useless in under 24 hours? If yes, dynamic — Secrets Manager rotation windows are usually longer. (2) Do you have more than a handful of apps/microservices connecting to the same database? Dynamic gives each app its own credential and its own audit trail. (3) Do you need per-request audit of which human or CI run pulled a credential? Vault's audit device captures that natively; Secrets Manager captures IAM principal on GetSecretValue but not the downstream DB session. For PCI DSS 4.0.1 fintech workloads we default to Vault dynamic credentials for production databases.

### How do I deploy Vault on AWS in 2026?
Three tiers. (1) HCP Vault Secrets: lightweight SaaS API for static secrets, free tier up to 25 secrets, reached GA in 2024. It is the right starting point if you just need a managed key-value store with better audit than environment variables. (2) HCP Vault Dedicated: fully managed Vault cluster operated by HashiCorp, AWS PrivateLink available, dev tier from ~$200/month. (3) Self-hosted on EC2 with Integrated Storage (Raft): three nodes in an ASG across AZs, NLB, KMS auto-unseal, and Raft snapshots in S3 as the DR copy if needed. Self-host only if you need features HCP does not ship, or if your compliance posture forbids external control planes.

### How do applications on EKS get secrets from Vault without sidecars?
The modern pattern is the Vault Secrets Operator (VSO). Install VSO in the cluster; apps reference secrets via a VaultStaticSecret or VaultDynamicSecret CRD that VSO reconciles into a native Kubernetes Secret. The app reads the k8s Secret as normal — no Vault Agent sidecar, no init container. Pair with EKS Pod Identity so VSO authenticates to Vault using short-lived AWS STS tokens. For secrets that must never sit in etcd, fall back to the Vault Agent sidecar with a tmpfs mount.

### What is the transit engine and when should I use it over KMS?
Vault's transit engine is encryption-as-a-service: you send plaintext, and Vault returns ciphertext using a named key that Vault manages, rotates, and audits. Use it when (a) you need to encrypt many small data items and a KMS call per item is cost-prohibitive, (b) you need convergent encryption, which KMS does not offer (format-preserving encryption is also available in Vault, but through the Enterprise Transform engine rather than transit), or (c) you want the same encryption contract across AWS, on-prem, and another cloud. Use AWS KMS (with customer-managed keys) when the workload is AWS-only and envelope encryption can sit behind S3/DynamoDB/RDS integrations.

### What changed with Vault 1.17 and 1.18 that I should know about?
Headline items: (a) **Adaptive overload protection** — Vault 1.17+ throttles specific request types before the whole cluster degrades, which solves the "one greedy client takes Vault down" failure mode. (b) **PKI scalability** — multi-issuer improvements and ACME server support. (c) **KV v2 transformations** and subkey-level patching. (d) **Workload identity federation** — authenticate Vault to AWS/GCP/Azure/GitHub without a stored IAM user. (e) **Secrets Sync GA** — sync Vault secrets one-way to AWS Secrets Manager so native AWS consumers (Lambda, ECS) keep working while Vault remains the source of truth.

### What is the IBM + HashiCorp status and does it affect Vault on AWS?
IBM closed its acquisition of HashiCorp in early 2025. The practical implications for AWS users: the Business Source License (BUSL 1.1) remains — Vault is not "open source" in the OSI sense but is free for all non-competing production use. HashiCorp continues to ship the AWS integrations (AWS auth method, AWS secret engine, KMS auto-unseal, HCP Vault on AWS) and the roadmap for HCP Vault Secrets and Vault Secrets Operator has accelerated. For licensing-sensitive enterprises, OpenBao (community fork of pre-BUSL Vault) is a tracked alternative but lacks the enterprise features (namespaces, DR replication, transit FIPS) most regulated AWS customers require.

---

*Source: https://www.factualminds.com/integrations/hashicorp-vault-aws/*
