How to Use Redis and Valkey as a Cost-Saving Layer (Not Just Cache)
Quick summary: Redis and its fork Valkey reduce AWS costs beyond caching: rate limiting, session storage, and distributed coordination all have cheaper implementations via in-memory data structures than the AWS-managed alternatives. Here is how to use them.

Most teams deploy Redis as a cache and nothing else. They add it to reduce database reads, see a performance improvement, and leave it there. What they miss is that Redis — or Valkey, its BSD-licensed fork under the Linux Foundation — can replace half a dozen other AWS services at a fraction of the cost, often with better performance.
This guide covers the cost math for replacing DynamoDB sessions, SQS for simple queues, API Gateway throttling for rate limiting, and coordination via distributed locks. Then it covers the operational details that prevent those cost savings from disappearing in incidents.
Redis vs Valkey in 2026: What Changed and What Matters
The License Fork
In March 2024, Redis Ltd. relicensed Redis 7.4+ under two non-open-source licenses: RSALv2 (Redis Source Available License v2) and SSPLv1 (Server Side Public License v1). Neither license is approved by the Open Source Initiative. The practical implication: cloud providers cannot offer Redis 7.4+ as a managed service under a standard arrangement, and organizations with open source compliance requirements cannot use it.
The Linux Foundation and former Redis contributors immediately forked Redis 7.2 as Valkey. The first stable release, Valkey 7.2.5, was available within weeks. Valkey 8.0 followed in late 2024 with performance improvements and new data structure enhancements. Valkey is licensed under the BSD 3-Clause license, the same license Redis carried before the relicense.
AWS launched ElastiCache for Valkey in October 2024. AWS also maintains ElastiCache for Redis OSS (capped at 7.1, still under the original BSD license) and Amazon MemoryDB (also on Redis OSS 7.x). All three are available today.
Migration Path: Redis → Valkey
Valkey 8.0 is wire-protocol compatible with Redis 7.2. The RESP3 protocol works identically. All standard Redis commands (GET, SET, HSET, ZADD, LPUSH, XADD, etc.) work unchanged. Lua scripting, the module API, and pub/sub all work.
Client libraries do not require changes:
- Node.js: ioredis and node-redis work with Valkey without modification
- Go: go-redis/redis works with Valkey without modification
- Python: redis-py works with Valkey without modification
- PHP: predis and phpredis work with Valkey without modification
For ElastiCache migration in Terraform, change the engine parameter:
# Before (Redis)
resource "aws_elasticache_replication_group" "cache" {
engine = "redis"
engine_version = "7.1"
# ...
}
# After (Valkey — drop-in replacement)
resource "aws_elasticache_replication_group" "cache" {
engine = "valkey"
engine_version = "8.0"
# ...
}
In-place upgrade from ElastiCache Redis to Valkey is available via the AWS console or CLI (modify-replication-group --engine valkey). The upgrade involves a rolling restart with no downtime on multi-AZ clusters.
Cache Patterns: Understanding the Cost of Each
Cache-Aside (Lazy Loading)
The most common pattern. Application checks cache first, fetches from database on miss, writes to cache.
Cache HIT: 1 Redis GET → return data (sub-millisecond)
Cache MISS: 1 Redis GET + 1 DB read + 1 Redis SET → return data (~5-50ms)
Cost: On a cache miss you pay for two round trips (Redis + DB) instead of one (DB direct). At a 90% hit rate, average latency is: 0.9 × 0.5ms + 0.1 × (0.5ms + 20ms) = 0.45 + 2.05 = 2.5ms. The cost-saving mechanism is that DB reads are more expensive than Redis reads: RDS read I/O and DynamoDB read units are billed per operation, while Redis reads are covered by a flat monthly ElastiCache fee.
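The hit-rate arithmetic above is easy to sanity-check in a few lines; the 0.5 ms Redis and 20 ms DB figures are the same illustrative assumptions, not measurements:

```javascript
// Expected cache-aside latency as a function of hit rate.
// Assumptions: redisMs per Redis round trip, dbMs per database read.
function avgLatencyMs(hitRate, redisMs = 0.5, dbMs = 20) {
  // hit: one Redis round trip; miss: Redis GET plus the DB read
  return hitRate * redisMs + (1 - hitRate) * (redisMs + dbMs)
}

console.log(avgLatencyMs(0.9))  // ≈ 2.5 ms, matching the worked example
console.log(avgLatencyMs(0.7))  // ≈ 6.5 ms, near the break-even zone
```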
When to use: Read-heavy workloads where cache hit rate exceeds ~70%, and where serving slightly stale data is acceptable. Node.js implementation:
// cache-aside.js
const Redis = require('ioredis')
const redis = new Redis({
host: process.env.ELASTICACHE_ENDPOINT,
port: 6379,
retryStrategy: (times) => Math.min(times * 50, 2000),
enableOfflineQueue: false // Fail fast if Redis is down — don't queue requests
})
async function getUserProfile(userId) {
const cacheKey = `user:profile:${userId}`
const ttlSeconds = 300 + Math.floor(Math.random() * 60) // 300-360s TTL jitter
// 1. Check cache
const cached = await redis.get(cacheKey)
if (cached !== null) {
return JSON.parse(cached)
}
// 2. Cache miss: fetch from DB
const user = await db.users.findById(userId)
if (!user) {
return null
}
// 3. Write to cache with TTL
await redis.set(cacheKey, JSON.stringify(user), 'EX', ttlSeconds)
return user
}
Write-Through
Application writes to cache and database simultaneously. Cache is always consistent with the database.
Cost: Every write pays for both a DB write AND a Redis write. For write-heavy workloads, this doubles write I/O. Write-through is appropriate when reads are much more frequent than writes and cache consistency is critical.
Write amplification cost example: A user updates their profile (1 DB write). With write-through, you also write to Redis (1 Redis write). If DynamoDB charges $1.25/million writes and ElastiCache is flat-rate, this write amplification is nearly free for low-write workloads. But for write-heavy workloads (>1 million writes/day), the duplicate work adds CPU overhead on ElastiCache.
Write-Behind (Write-Back)
Application writes to cache first, then asynchronously to the database. Lowest write latency, highest data loss risk.
Redis does not natively support write-behind — you implement it by writing to Redis, then using a background process to flush to the database. This pattern is only appropriate when:
- Acknowledgment latency matters (gaming leaderboards, real-time counters)
- Data loss of the last few seconds is acceptable
- You have a reliable background process with dead-letter handling for flush failures
For most AWS workloads, the DynamoDB write cost savings from write-behind do not justify the data loss risk. Stick with cache-aside or write-through.
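If you do adopt write-behind despite the caveats, the background flusher is the critical piece. Here is a minimal sketch of its control flow, with the dirty buffer and database writer injected so the batch/flush/dead-letter logic is visible; all names are illustrative, not from any library (in production the buffer would be a Redis list or stream):

```javascript
// Write-behind flusher sketch: pop a batch of pending writes, flush each to
// the database, and dead-letter failures instead of dropping them silently —
// silent drops are exactly the data-loss risk described above.
async function flushDirtyWrites(popBatch, writeToDb, deadLetter, batchSize = 100) {
  const batch = await popBatch(batchSize)
  for (const write of batch) {
    try {
      await writeToDb(write)
    } catch (err) {
      // Park the failed write for inspection and retry.
      await deadLetter(write, err)
    }
  }
  return batch.length // number of writes attempted this cycle
}
```

In production, `popBatch` would wrap something like `LPOP key COUNT n` against a Redis list, and the function would run on an interval or a blocking loop.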
Rate Limiting: Three Implementations with Cost Comparison
Rate limiting with Redis is cheaper than API Gateway usage plans when you have more than ~100 unique rate-limit subjects (users, IPs, API keys) or when you need rate limiting outside the HTTP layer.
Fixed Window Counter (Simplest, Lowest Latency)
One Redis command per request (plus an EXPIRE on the first request of each window). Fast, but permits a burst of up to 2× the limit in a short span straddling a window boundary.
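The boundary burst is easy to demonstrate with an in-memory model of the counter (no Redis required):

```javascript
// In-memory fixed-window counter: same windowing math as the Redis version,
// with a Map standing in for INCR on per-window keys.
function makeFixedWindow(limit, windowMs) {
  const counters = new Map()
  return function allow(nowMs) {
    const windowId = Math.floor(nowMs / windowMs)
    const n = (counters.get(windowId) || 0) + 1
    counters.set(windowId, n)
    return n <= limit
  }
}

// 100 requests at t=59s (end of window 0), 100 more at t=61s (start of
// window 1): every one is allowed, despite a nominal 100 req/min limit.
const allow = makeFixedWindow(100, 60_000)
let allowed = 0
for (let i = 0; i < 100; i++) if (allow(59_000)) allowed++
for (let i = 0; i < 100; i++) if (allow(61_000)) allowed++
console.log(allowed) // 200 requests accepted within a 2-second span
```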
// Node.js - fixed window rate limiter with ioredis
const Redis = require('ioredis')
const redis = new Redis({ host: process.env.ELASTICACHE_ENDPOINT })
async function fixedWindowRateLimit(identifier, limit, windowSeconds) {
const key = `ratelimit:fixed:${identifier}:${Math.floor(Date.now() / (windowSeconds * 1000))}`
const current = await redis.incr(key)
if (current === 1) {
// First request in this window: set expiry
await redis.expire(key, windowSeconds)
}
return {
allowed: current <= limit,
current,
limit,
resetAt: (Math.floor(Date.now() / (windowSeconds * 1000)) + 1) * windowSeconds * 1000
}
}
// Usage
app.use(async (req, res, next) => {
const result = await fixedWindowRateLimit(`user:${req.user.id}`, 100, 60) // 100 req/minute
res.set('X-RateLimit-Limit', result.limit)
res.set('X-RateLimit-Remaining', Math.max(0, result.limit - result.current))
res.set('X-RateLimit-Reset', result.resetAt)
if (!result.allowed) {
return res.status(429).json({ error: 'Rate limit exceeded' })
}
next()
})
Sliding Window with Sorted Set (Most Accurate)
Uses a sorted set where score = timestamp. Accurately counts requests in the last N seconds without boundary burst issues. Four Redis commands per request, sent as a single pipeline (ZREMRANGEBYSCORE + ZADD + ZCARD + PEXPIRE).
// Node.js - sliding window rate limiter
async function slidingWindowRateLimit(identifier, limit, windowMs) {
const now = Date.now()
const windowStart = now - windowMs
const key = `ratelimit:sliding:${identifier}`
const pipeline = redis.pipeline()
pipeline.zremrangebyscore(key, '-inf', windowStart) // Remove old entries
pipeline.zadd(key, now, `${now}-${Math.random()}`) // Add current request
pipeline.zcard(key) // Count requests in window
pipeline.pexpire(key, windowMs) // Reset TTL
const results = await pipeline.exec()
const count = results[2][1] // Result of ZCARD
return {
allowed: count <= limit,
current: count,
limit,
retryAfter: count > limit ? Math.ceil(windowMs / 1000) : 0
}
}
Token Bucket with Lua Script (Atomic, Smoothest Rate Control)
Token bucket allows short bursts while enforcing average rate. Implemented as an atomic Lua script — no race conditions between check and update.
// Node.js - token bucket rate limiter (Lua for atomicity)
const tokenBucketScript = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3]) -- current timestamp in ms
local requested = tonumber(ARGV[4]) -- tokens requested (usually 1)
-- Get current state or initialize
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now
-- Refill tokens based on elapsed time
local elapsed = (now - last_refill) / 1000 -- convert to seconds
local new_tokens = math.min(capacity, tokens + (elapsed * refill_rate))
-- Check if request can be fulfilled
if new_tokens >= requested then
new_tokens = new_tokens - requested
redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate) * 1000)
return {1, math.floor(new_tokens)} -- allowed, remaining tokens
else
redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate) * 1000)
return {0, math.floor(new_tokens)} -- denied, remaining tokens
end
`
async function tokenBucketRateLimit(identifier, capacity, refillRate) {
const key = `ratelimit:bucket:${identifier}`
const result = await redis.eval(
tokenBucketScript,
1, key, // 1 key, the key value
capacity, // bucket capacity
refillRate, // tokens per second
Date.now(), // current timestamp
1 // tokens requested
)
return {
allowed: result[0] === 1,
tokensRemaining: result[1],
capacity
}
}
Go Fixed Window Rate Limiter
// Go - fixed window rate limiter with go-redis
package ratelimit
import (
"context"
"fmt"
"time"
"github.com/redis/go-redis/v9"
)
type FixedWindowLimiter struct {
client *redis.Client
limit int64
windowSeconds int64
}
func NewFixedWindowLimiter(client *redis.Client, limit int64, window time.Duration) *FixedWindowLimiter {
return &FixedWindowLimiter{
client: client,
limit: limit,
windowSeconds: int64(window.Seconds()),
}
}
type LimitResult struct {
Allowed bool
Current int64
Limit int64
ResetAt time.Time
}
func (l *FixedWindowLimiter) Allow(ctx context.Context, identifier string) (LimitResult, error) {
windowID := time.Now().Unix() / l.windowSeconds
key := fmt.Sprintf("ratelimit:fixed:%s:%d", identifier, windowID)
pipe := l.client.Pipeline()
incrCmd := pipe.Incr(ctx, key)
pipe.Expire(ctx, key, time.Duration(l.windowSeconds)*time.Second)
if _, err := pipe.Exec(ctx); err != nil {
// Fail open: if Redis is unavailable, allow the request
// (prevents Redis outage from taking down your API)
return LimitResult{Allowed: true, Limit: l.limit}, nil
}
current := incrCmd.Val()
resetAt := time.Unix((windowID+1)*l.windowSeconds, 0)
return LimitResult{
Allowed: current <= l.limit,
Current: current,
Limit: l.limit,
ResetAt: resetAt,
}, nil
}
PHP Sliding Window with Predis
<?php
use Predis\Client;
class SlidingWindowRateLimiter
{
public function __construct(
private Client $redis,
private int $limit,
private int $windowMs
) {}
public function isAllowed(string $identifier): array
{
$now = (int)(microtime(true) * 1000);
$windowStart = $now - $this->windowMs;
$key = "ratelimit:sliding:{$identifier}";
$pipe = $this->redis->pipeline();
$pipe->zremrangebyscore($key, '-inf', $windowStart);
$pipe->zadd($key, [$now . '-' . uniqid() => $now]);
$pipe->zcard($key);
$pipe->pexpire($key, $this->windowMs);
$results = $pipe->execute();
$count = $results[2];
return [
'allowed' => $count <= $this->limit,
'current' => $count,
'limit' => $this->limit,
'retry_after' => $count > $this->limit ? ceil($this->windowMs / 1000) : 0,
];
}
}
Python Sliding Window with redis-py
import time
import uuid
import redis
class SlidingWindowRateLimiter:
def __init__(self, client: redis.Redis, limit: int, window_seconds: int):
self.client = client
self.limit = limit
self.window_ms = window_seconds * 1000
def is_allowed(self, identifier: str) -> dict:
now_ms = int(time.time() * 1000)
window_start_ms = now_ms - self.window_ms
key = f"ratelimit:sliding:{identifier}"
pipe = self.client.pipeline()
pipe.zremrangebyscore(key, '-inf', window_start_ms)
pipe.zadd(key, {f"{now_ms}-{uuid.uuid4().hex}": now_ms})
pipe.zcard(key)
pipe.pexpire(key, self.window_ms)
results = pipe.execute()
count = results[2]
return {
"allowed": count <= self.limit,
"current": count,
"limit": self.limit,
"retry_after": max(0, (self.window_ms // 1000)) if count > self.limit else 0,
}
Session Storage: DynamoDB vs ElastiCache Cost Analysis
The cost model is simple: DynamoDB charges per operation, ElastiCache charges per hour regardless of operations.
Cost Calculation at Scale
Assumptions for a SaaS application:
- 50,000 daily active users (DAU)
- Average 40 requests per session
- Each request reads session data (~2 reads per request on average); writes happen on auth events (~0.05 writes per request)
- Session TTL: 24 hours, JSON blob ~2 KB
DynamoDB On-Demand session costs:
Daily reads: 50,000 DAU × 40 requests × 2 reads = 4,000,000 reads/day
Daily writes: 50,000 DAU × 40 requests × 0.05 writes = 100,000 writes/day
DynamoDB cost:
Reads: 4,000,000 / 1,000,000 × $0.25 = $1.00/day
Writes: 100,000 / 1,000,000 × $1.25 = $0.125/day
Storage: 50,000 new sessions/day × 2KB × 30 days ≈ 3 GB × $0.25/GB = $0.75/month
Total: ~$33.75/month (reads and writes; storage adds ~$0.75)
ElastiCache t4g.small session costs:
t4g.small: 2 vCPU, 1.37 GB RAM, $0.016/hr
Monthly: $0.016 × 730 = $11.68/month
Capacity check: 50,000 sessions × 2KB = 100 MB — easily fits in 1.37 GB
Savings at 50,000 DAU: $33.75 - $11.68 = $22.07/month
At 500,000 DAU: DynamoDB ≈ $337/month vs ElastiCache t4g.medium ($0.032/hr = $23.36/month) = $314/month savings.
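The arithmetic above generalizes into a small calculator. The per-million prices baked in here are the article's assumed on-demand rates ($0.25/M reads, $1.25/M writes), so substitute current regional pricing before relying on the output:

```javascript
// Session-storage cost comparison: DynamoDB on-demand vs a flat-rate
// ElastiCache node. Storage cost is omitted (it is small at this scale).
function sessionCosts({ dau, requestsPerSession, readsPerRequest, writesPerRequest, nodeHourly }) {
  const dailyReads = dau * requestsPerSession * readsPerRequest
  const dailyWrites = dau * requestsPerSession * writesPerRequest
  const dynamoMonthly =
    (dailyReads / 1e6) * 0.25 * 30 +  // assumed $0.25 per million reads
    (dailyWrites / 1e6) * 1.25 * 30   // assumed $1.25 per million writes
  const elastiCacheMonthly = nodeHourly * 730 // flat rate, usage-independent
  return { dynamoMonthly, elastiCacheMonthly, savings: dynamoMonthly - elastiCacheMonthly }
}

const r = sessionCosts({
  dau: 50_000, requestsPerSession: 40,
  readsPerRequest: 2, writesPerRequest: 0.05,
  nodeHourly: 0.016, // the article's t4g.small figure
})
// r.dynamoMonthly ≈ 33.75, r.elastiCacheMonthly ≈ 11.68, r.savings ≈ 22.07
```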
Session Implementation
A Node.js session with ElastiCache:
// express-session with ioredis store
const session = require('express-session')
const RedisStore = require('connect-redis').default
const { createClient } = require('redis')
const redisClient = createClient({
socket: {
host: process.env.ELASTICACHE_ENDPOINT,
port: 6379,
tls: true, // ElastiCache encryption in transit
rejectUnauthorized: false // Only if cert validation fails in your setup; prefer keeping verification on
}
})
await redisClient.connect()
app.use(session({
store: new RedisStore({
client: redisClient,
prefix: 'session:',
ttl: 86400 // 24 hours in seconds
}),
secret: process.env.SESSION_SECRET,
resave: false,
saveUninitialized: false,
cookie: {
secure: true, // HTTPS only
httpOnly: true, // No JS access
maxAge: 86400 * 1000, // 24 hours in ms
sameSite: 'strict'
}
}))
Redis as a Queue: When to Use vs SQS
Redis queues are appropriate for workloads requiring sub-millisecond enqueue/dequeue latency where SQS’s eventual consistency model and ~20ms minimum latency are too slow.
List-Based Simple Queue
// Producer: push to queue
await redis.lpush('jobs:email-send', JSON.stringify({
to: 'user@example.com',
template: 'welcome',
userId: '12345',
enqueuedAt: Date.now()
}))
// Consumer: blocking pop (waits up to 30s for a message)
async function processEmailQueue() {
while (true) {
const result = await redis.brpop('jobs:email-send', 30) // 30s timeout
if (result) {
const [_queue, message] = result
const job = JSON.parse(message)
await sendEmail(job)
}
}
}
Limitation: BRPOP/LPUSH provides no message acknowledgment. If the consumer crashes after popping but before processing, the message is lost. For jobs where loss is unacceptable, use Redis Streams or SQS.
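Short of moving to Streams, the classic mitigation is the reliable-queue pattern: atomically move each message to a per-consumer processing list (LMOVE/BLMOVE in Redis 6.2+) and remove it only after successful handling, so a crashed consumer's in-flight messages remain recoverable. An in-memory model of that control flow:

```javascript
// Reliable-queue pattern modeled with arrays. With Redis this would be
// LMOVE queue processing RIGHT LEFT, then LREM processing 1 msg after success.
function reliablePop(queue, processing) {
  const msg = queue.pop()                      // the RPOP side of LMOVE
  if (msg !== undefined) processing.push(msg)  // the LPUSH side of LMOVE
  return msg
}

function ack(processing, msg) {
  const i = processing.indexOf(msg)
  if (i !== -1) processing.splice(i, 1)        // LREM after successful handling
}

const queue = ['job-a', 'job-b']
const processing = []
const msg = reliablePop(queue, processing)
// If the consumer crashes here, msg still sits in `processing` — a reaper
// job can re-queue entries that linger there too long instead of losing them.
```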
Redis Streams for Durable Queuing
Redis Streams (XADD/XREADGROUP) provide consumer groups, message acknowledgment, and pending message tracking — much closer to SQS’s semantics at Redis speed.
// Producer: append to stream
await redis.xadd(
'stream:orders',
'*', // Auto-generate message ID
'order_id', '9876',
'user_id', '12345',
'total', '99.99',
'status', 'pending'
)
// Consumer group setup (run once)
await redis.xgroup('CREATE', 'stream:orders', 'order-processors', '0', 'MKSTREAM')
// Consumer: read with acknowledgment
async function processOrderStream(consumerId) {
while (true) {
// Read up to 10 messages, block for 5 seconds if empty
const messages = await redis.xreadgroup(
'GROUP', 'order-processors', consumerId,
'COUNT', 10,
'BLOCK', 5000,
'STREAMS', 'stream:orders', '>' // '>' means undelivered messages only
)
if (!messages) {
continue // Timeout, loop again
}
for (const [_stream, entries] of messages) {
for (const [messageId, fields] of entries) {
const message = {}
for (let i = 0; i < fields.length; i += 2) {
message[fields[i]] = fields[i + 1]
}
try {
await processOrder(message)
// Acknowledge: removes from pending entries
await redis.xack('stream:orders', 'order-processors', messageId)
} catch (error) {
// Message stays in pending — will be redelivered on next XREADGROUP
console.error(`Failed to process ${messageId}:`, error)
}
}
}
}
}
// Check pending messages (unacknowledged, possibly stuck)
const pending = await redis.xpending(
'stream:orders',
'order-processors',
'-', '+', // min/max message IDs
10 // count
)
Redis Queue vs SQS: When Each Wins
| Factor | Redis Streams | SQS |
|---|---|---|
| Latency | <1ms | ~20ms minimum |
| Durability | Memory + AOF/RDB persistence | Multi-AZ, 4-day retention default |
| Cost (at scale) | Flat ElastiCache rate | $0.40/million messages |
| Visibility timeout | Manual (TTL on claim) | Built-in, configurable |
| Dead letter queue | Manual implementation | Native DLQ support |
| FIFO ordering | Yes (stream ID order) | SQS FIFO (higher cost) |
| Ops burden | Managed (ElastiCache) | Fully managed (SQS) |
Use Redis Streams when: Your application already has Redis, latency matters (real-time notifications, gaming, live chat), and message volume is moderate (<1 million/day per stream).
Use SQS when: You need guaranteed durability, long message retention (up to 14 days), native DLQ support, or you are processing asynchronous background jobs where 20ms latency is irrelevant.
Distributed Locks: Preventing Duplicate Processing
Distributed locks prevent multiple instances of a service from processing the same resource concurrently. This avoids duplicate charges, double-sends, and data inconsistency.
SETNX Lock (Simple, Single Instance)
// Simple lock with SETNX (SET if Not eXists)
async function acquireLock(resourceId, ttlMs = 5000) {
const lockKey = `lock:${resourceId}`
const lockToken = `${Date.now()}-${Math.random()}` // Unique token to identify this lock
const acquired = await redis.set(
lockKey,
lockToken,
'PX', ttlMs, // Expiry in milliseconds
'NX' // Only set if key does not exist
)
return acquired ? lockToken : null // Return token if acquired, null if already locked
}
async function releaseLock(resourceId, lockToken) {
// Lua script: only release if we own the lock
// Prevents releasing a lock acquired by another process
const releaseLockScript = `
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('DEL', KEYS[1])
else
return 0
end
`
const lockKey = `lock:${resourceId}`
return redis.eval(releaseLockScript, 1, lockKey, lockToken)
}
// Usage: prevent duplicate invoice processing
async function processInvoice(invoiceId) {
const lockToken = await acquireLock(`invoice:${invoiceId}`, 30000)
if (!lockToken) {
console.log(`Invoice ${invoiceId} is being processed by another instance`)
return
}
try {
await chargeInvoice(invoiceId)
} finally {
await releaseLock(`invoice:${invoiceId}`, lockToken)
}
}
Redlock for Multi-Node Safety
For systems where a single Redis node failure must not cause two processes to hold the lock simultaneously, use Redlock — acquire lock on majority of N Redis nodes.
// Redlock with 3 ElastiCache nodes (separate primary nodes)
const Redlock = require('redlock')
const Redis = require('ioredis')
const nodes = [
new Redis({ host: process.env.ELASTICACHE_NODE_1 }),
new Redis({ host: process.env.ELASTICACHE_NODE_2 }),
new Redis({ host: process.env.ELASTICACHE_NODE_3 })
]
const redlock = new Redlock(nodes, {
driftFactor: 0.01, // Assume 1% clock drift
retryCount: 3,
retryDelay: 200, // ms between retries
retryJitter: 100, // Random jitter on retry delay
automaticExtensionThreshold: 500 // Extend if lock held > (TTL - 500ms)
})
async function processWithRedlock(resourceId) {
const lock = await redlock.acquire([`lock:${resourceId}`], 10000) // 10s TTL
try {
await processResource(resourceId)
} finally {
await redlock.release(lock)
}
}
Cost note: Redlock requires N independent Redis nodes (not replicas of the same primary). This means N ElastiCache clusters. For most applications, the simple SETNX approach with a single multi-AZ ElastiCache cluster is sufficient. Use Redlock only when split-brain lock safety is a strict requirement.
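The reason the SETNX lock stores a unique token, rather than any fixed value, is worth seeing concretely. An in-memory stand-in for Redis shows what the Lua ownership check prevents when a lock expires mid-work:

```javascript
// Two clients, one lock. Client A's TTL expires while A is still working,
// and B acquires the lock. An unconditional DEL by A would release B's lock;
// the token comparison (the Lua check-then-delete above) refuses.
const store = new Map() // key → token, standing in for Redis

function setNX(key, token) {
  if (store.has(key)) return false
  store.set(key, token)
  return true
}

function releaseIfOwner(key, token) {
  if (store.get(key) === token) { store.delete(key); return 1 }
  return 0 // not the owner: leave the lock alone
}

setNX('lock:invoice:1', 'token-A')   // A acquires the lock
store.delete('lock:invoice:1')       // A's TTL expires while A is still working
setNX('lock:invoice:1', 'token-B')   // B acquires the now-free lock
const released = releaseIfOwner('lock:invoice:1', 'token-A') // A finishes late
// released === 0: A cannot release B's lock; B's critical section is intact
```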
Memory Optimization: Getting More from Each ElastiCache Dollar
ElastiCache is billed by instance size. The difference between a t4g.medium ($0.032/hr = $23/month) and a r7g.large ($0.166/hr = $121/month) is $98/month. Optimizing memory usage keeps you on smaller instances longer.
Eviction Policy Selection
The eviction policy determines what happens when Redis reaches maxmemory:
allkeys-lru — Evict least recently used keys regardless of TTL
Best for: cache-only Redis where all data is expendable
Risk: important keys with far-future TTL can be evicted
volatile-lru — Evict LRU keys among those with TTL set
Best for: mixed cache + persistent data (sessions with TTL,
config without TTL — config is never evicted)
Risk: if all keys have TTL, behaves like allkeys-lru
allkeys-lfu — Evict least frequently used (Redis 4.0+)
Best for: workloads with irregular access patterns
noeviction — Return error when memory full
Best for: queues/streams where data loss is unacceptable
For a mixed Redis deployment (cache + sessions + rate limit counters), volatile-lru is usually the right choice:
- Sessions have TTL → can be evicted if memory pressure requires
- Rate limit counters have short TTL → can be evicted
- Permanent configuration keys have no TTL → never evicted
Set the eviction policy in the ElastiCache parameter group (see Terraform below).
HASH vs STRING for Object Storage
Storing an object as a Redis HASH rather than a JSON string can save 40–70% memory for small objects, because Redis uses a compact ziplist encoding (listpack in Redis 7+ and Valkey) for hashes with fewer than 128 fields and values under 64 bytes.
// STRING: stores full JSON blob
await redis.set('user:123', JSON.stringify({
id: 123,
name: 'Alice',
email: 'alice@example.com',
plan: 'pro',
created_at: '2026-01-01'
}))
// Memory: ~100 bytes (JSON overhead + Redis key overhead + string encoding)
// HASH: Redis uses compact ziplist encoding for small hashes
await redis.hset('user:123', {
id: '123',
name: 'Alice',
email: 'alice@example.com',
plan: 'pro',
created_at: '2026-01-01'
})
// Memory: ~60 bytes (ziplist encoding, 40% savings)
// Read specific field without fetching full object
const plan = await redis.hget('user:123', 'plan')
// Read multiple fields
const [name, email] = await redis.hmget('user:123', 'name', 'email')
// Read all fields
const user = await redis.hgetall('user:123')
Check encoding to verify ziplist is being used:
# Redis CLI memory analysis
redis-cli -h $ELASTICACHE_ENDPOINT
# Check encoding of a specific key
OBJECT ENCODING user:123
# Should return: "ziplist" or "listpack" (Redis 7.0+) for small hashes
# Returns: "hashtable" if hash exceeds hash-max-listpack-entries (default 128)
# Detailed memory usage (note: DEBUG commands are restricted on ElastiCache;
# use MEMORY USAGE there instead)
DEBUG OBJECT user:123
# Returns: serializedlength, encoding, type
# Memory usage of a key (in bytes)
MEMORY USAGE user:123
# Overall memory statistics
INFO memory
# Key metrics:
# used_memory: total allocated memory
# used_memory_rss: RSS from OS perspective (includes fragmentation)
# mem_fragmentation_ratio: used_memory_rss / used_memory (should be 1.0-1.5)
# maxmemory: configured maximum
# maxmemory_human: human-readable maximum
Preventing Eviction Storms
When Redis hits maxmemory, it evicts keys according to the policy. If eviction is slow (many keys to scan), request latency spikes. To prevent eviction storms:
- Set maxmemory to 80% of instance RAM, leaving 20% headroom for overhead and fragmentation (on ElastiCache, tune the reserved-memory-percent parameter).
- Monitor the evicted_keys rate in CloudWatch — a sudden spike indicates memory pressure.
- Use MEMORY USAGE to identify oversized keys consuming disproportionate memory.
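A related mitigation is TTL jitter, already used in the cache-aside example earlier: spreading expirations prevents a batch of keys written together from expiring, and being recomputed, at the same moment. As a small helper (the function name is illustrative):

```javascript
// Jittered TTL: base TTL plus up to jitterFraction extra, so keys cached in
// the same burst expire spread across a window instead of simultaneously.
// The random source is injectable to keep the function testable.
function jitteredTtl(baseSeconds, jitterFraction = 0.2, rand = Math.random) {
  return Math.floor(baseSeconds * (1 + jitterFraction * rand()))
}

// e.g. redis.set(key, value, 'EX', jitteredTtl(300)) → a TTL in 300–360 s
```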
# Find the 10 largest keys (expensive on large datasets — run during maintenance)
redis-cli -h $ENDPOINT --bigkeys
# Better for production: scan with MEMORY USAGE sampling
redis-cli -h $ENDPOINT --scan --pattern '*' | head -1000 | while read key; do
size=$(redis-cli -h $ENDPOINT MEMORY USAGE "$key" 2>/dev/null || echo 0)
echo "$size $key"
done | sort -rn | head -20
Cache Stampede Prevention in Detail
Cache stampede is the most dangerous failure mode in a Redis-backed system. It can cascade: cache expires → 500 simultaneous DB queries → DB CPU spikes to 100% → query timeout → all 500 requests return error → retry storm → DB crash.
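The fan-out is easy to reproduce in-process: with an empty cache and no coordination, every concurrent request pays a database query for the same key.

```javascript
// Stampede simulation: an empty Map stands in for a key that just expired,
// and fetchFromDb stands in for the expensive query. Every request checks
// the cache, misses, and issues its own DB call before any SET lands.
async function stampede(nRequests, cache, fetchFromDb) {
  return Promise.all(
    Array.from({ length: nRequests }, async () => {
      const hit = cache.get('key')
      if (hit !== undefined) return hit
      const value = await fetchFromDb() // every concurrent miss pays a DB query
      cache.set('key', value)
      return value
    })
  )
}

let dbQueries = 0
const cache = new Map()
stampede(500, cache, async () => { dbQueries++; return 'value' })
  .then(() => console.log(dbQueries)) // 500: one DB query per request, for one key
```

The mutex and PER approaches below both collapse those 500 queries to (roughly) one.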
Mutex Lock Approach
// Mutex-based cache: only one request regenerates, others wait
const LOCK_TTL = 5000 // 5 seconds max for cache rebuild
const STALE_TTL = 30 // Serve stale for 30 seconds while regenerating
async function getWithMutex(cacheKey, fetchFn, ttl) {
// 1. Check cache
const cached = await redis.get(cacheKey)
if (cached !== null) {
return JSON.parse(cached)
}
// 2. Cache miss: try to acquire rebuild lock
const lockKey = `lock:rebuild:${cacheKey}`
const lockToken = `${Date.now()}-${Math.random()}`
const acquired = await redis.set(lockKey, lockToken, 'PX', LOCK_TTL, 'NX')
if (acquired) {
// 3. We hold the lock: rebuild cache
try {
const data = await fetchFn()
await redis.set(cacheKey, JSON.stringify(data), 'EX', ttl)
return data
} finally {
// Release lock
const releaseScript = `
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('DEL', KEYS[1])
end
return 0
`
await redis.eval(releaseScript, 1, lockKey, lockToken)
}
} else {
// 4. Another process is rebuilding: wait briefly, then retry
await new Promise(resolve => setTimeout(resolve, 100))
const retried = await redis.get(cacheKey)
if (retried !== null) {
return JSON.parse(retried)
}
// If still missing after wait, fall through to DB
return fetchFn()
}
}
Probabilistic Early Recomputation (PER)
PER proactively refreshes a cache entry before it expires, with probability increasing as expiry approaches. No locking required.
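The refresh decision itself (the XFetch formula) can be checked in isolation before wiring it into a cache; the random source is injectable here so the behavior is deterministic under test:

```javascript
// XFetch decision: refresh when now - delta * beta * log(rand()) >= expiry.
// log(rand()) is negative, so the term pulls the effective expiry earlier by
// an exponentially distributed amount scaled by delta (recompute time) × beta.
function shouldRefresh(nowSec, expirySec, deltaSec, beta = 1.0, rand = Math.random) {
  return nowSec - deltaSec * beta * Math.log(rand()) >= expirySec
}

// Far from expiry a refresh is vanishingly rare; at the expiry instant it is
// certain; just before expiry it fires with useful probability.
shouldRefresh(0, 1000, 2)    // essentially always false, 1000 s out
shouldRefresh(1000, 1000, 2) // always true at expiry
```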
// Probabilistic Early Recomputation
// beta controls how aggressively to early-refresh (higher = more eager, default 1.0)
async function getWithPER(cacheKey, fetchFn, ttl, beta = 1.0) {
const cacheData = await redis.get(cacheKey)
if (cacheData !== null) {
const { value, delta, expiry } = JSON.parse(cacheData)
const now = Date.now() / 1000
// XFetch formula: refresh early when now - delta * beta * log(rand()) >= expiry
// (log(rand()) is negative, so this effectively moves the expiry earlier)
// delta = measured time to recompute the value
const shouldRefresh = now - delta * beta * Math.log(Math.random()) >= expiry
if (!shouldRefresh) {
return value
}
// Fall through to refresh (early recomputation)
}
// Cache miss or PER triggered: recompute
const startTime = Date.now()
const value = await fetchFn()
const delta = (Date.now() - startTime) / 1000 // Computation time in seconds
const expiry = Date.now() / 1000 + ttl
await redis.set(
cacheKey,
JSON.stringify({ value, delta, expiry }),
'EX',
ttl + Math.floor(delta * beta * 2) // Extend TTL slightly for PER window
)
return value
}
ElastiCache Terraform Configuration
A production-grade ElastiCache Valkey replication group with Multi-AZ automatic failover (cluster mode disabled here; add num_node_groups and replicas_per_node_group to shard horizontally):
# elasticache.tf
resource "aws_elasticache_parameter_group" "valkey8_production" {
family = "valkey8"
name = "valkey8-production"
# Eviction policy: evict LRU keys with TTL set (volatile-lru)
# Protects permanent keys (config, feature flags) from eviction
parameter {
name = "maxmemory-policy"
value = "volatile-lru"
}
# Lazy freeing: delete expired keys asynchronously (lower latency)
parameter {
name = "lazyfree-lazy-expire"
value = "yes"
}
parameter {
name = "lazyfree-lazy-eviction"
value = "yes"
}
# Enable keyspace notifications for expiry events
# Useful for TTL-based workflows (e.g., session expiry cleanup hooks)
# K = keyspace events, E = keyevent events, x = expired events
parameter {
name = "notify-keyspace-events"
value = "Ex"
}
# Slowlog: log commands slower than 100ms
parameter {
name = "slowlog-log-slower-than"
value = "100000" # microseconds
}
parameter {
name = "slowlog-max-len"
value = "128"
}
# Hash optimization: use ziplist (listpack in Valkey 8) for small hashes
parameter {
name = "hash-max-listpack-entries"
value = "128"
}
parameter {
name = "hash-max-listpack-value"
value = "64"
}
}
resource "aws_elasticache_replication_group" "valkey_cache" {
replication_group_id = "myapp-valkey-cache"
description = "Valkey cluster for caching, sessions, and rate limiting"
engine = "valkey"
engine_version = "8.0"
node_type = "cache.t4g.medium" # 2 vCPU, 3.09 GB RAM, $0.032/hr
# Multi-AZ with automatic failover
multi_az_enabled = true
automatic_failover_enabled = true
# Total nodes in the group (primary + replicas; cluster mode disabled)
num_cache_clusters = 2 # 1 primary + 1 replica
parameter_group_name = aws_elasticache_parameter_group.valkey8_production.name
# Encryption
at_rest_encryption_enabled = true
transit_encryption_enabled = true
kms_key_id = aws_kms_key.elasticache.arn
# Maintenance and backup
maintenance_window = "sun:03:00-sun:04:00"
snapshot_window = "02:00-03:00"
snapshot_retention_limit = 7
# Auth token (password) for access control
auth_token = var.elasticache_auth_token
subnet_group_name = aws_elasticache_subnet_group.private.name
security_group_ids = [aws_security_group.elasticache.id]
# Apply changes immediately in non-production; use false in production
apply_immediately = false
log_delivery_configuration {
destination = aws_cloudwatch_log_group.elasticache_slow_logs.name
destination_type = "cloudwatch-logs"
log_format = "json"
log_type = "slow-log"
}
log_delivery_configuration {
destination = aws_cloudwatch_log_group.elasticache_engine_logs.name
destination_type = "cloudwatch-logs"
log_format = "json"
log_type = "engine-log"
}
tags = {
Environment = "production"
Team = "platform"
CostCenter = "infrastructure"
}
}
resource "aws_elasticache_subnet_group" "private" {
name = "myapp-elasticache-private"
subnet_ids = var.private_subnet_ids
}
resource "aws_security_group" "elasticache" {
name_prefix = "elasticache-"
vpc_id = var.vpc_id
ingress {
from_port = 6379
to_port = 6379
protocol = "tcp"
security_groups = [var.app_security_group_id] # Only allow app tier
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "elasticache-sg"
}
}
# CloudWatch alarms for cache health
resource "aws_cloudwatch_metric_alarm" "cache_evictions" {
alarm_name = "elasticache-high-evictions"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "5"
metric_name = "Evictions"
namespace = "AWS/ElastiCache"
period = "60"
statistic = "Sum"
threshold = "100" # Alert if >100 evictions/minute sustained
alarm_description = "Cache is evicting keys — memory pressure or TTL storm"
dimensions = {
ReplicationGroupId = aws_elasticache_replication_group.valkey_cache.id
}
alarm_actions = [var.sns_alert_topic_arn]
}
resource "aws_cloudwatch_metric_alarm" "cache_memory_high" {
alarm_name = "elasticache-memory-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "3"
metric_name = "DatabaseMemoryUsagePercentage"
namespace = "AWS/ElastiCache"
period = "300"
statistic = "Average"
threshold = "80" # Alert at 80% memory to allow time to resize
alarm_description = "ElastiCache memory usage above 80% — consider scaling up"
dimensions = {
ReplicationGroupId = aws_elasticache_replication_group.valkey_cache.id
}
alarm_actions = [var.sns_alert_topic_arn]
}
output "cache_primary_endpoint" {
value = aws_elasticache_replication_group.valkey_cache.primary_endpoint_address
}
output "cache_reader_endpoint" {
value = aws_elasticache_replication_group.valkey_cache.reader_endpoint_address
}
Putting It Together: Total Cost Impact
For a SaaS application at 100,000 DAU with ElastiCache t4g.medium ($23/month):
| Use Case | Alternative | Alternative Cost | Redis Cost | Monthly Savings |
|---|---|---|---|---|
| Session storage | DynamoDB | ~$135/month | Shared ElastiCache | ~$135 |
| Rate limiting (10k users) | API Gateway usage plans | ~$900/month | Shared ElastiCache | ~$900 |
| Simple queue | SQS (1M msgs/day) | ~$12/month | Shared ElastiCache | ~$12 |
| Cache (DB read offset 60%) | Additional RDS reads | ~$45/month | Shared ElastiCache | ~$45 |
| Distributed locks | DynamoDB conditional writes | ~$8/month | Shared ElastiCache | ~$8 |
| ElastiCache cost | | | $23/month | |
| Net savings | | | | ~$1,077/month |
A single ElastiCache t4g.medium serving all these workloads simultaneously delivers over $1,000/month in savings over managed-service alternatives at 100,000 DAU scale.
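The table's bottom line can be tallied directly (all figures are the article's estimates, not quotes):

```javascript
// Net monthly savings from consolidating the five workloads onto one node.
const monthlySavings = {
  sessions: 135,       // vs DynamoDB
  rateLimiting: 900,   // vs API Gateway usage plans
  queue: 12,           // vs SQS at 1M msgs/day
  cache: 45,           // vs additional RDS reads
  locks: 8,            // vs DynamoDB conditional writes
}
const elastiCacheCost = 23 // t4g.medium, flat rate

const gross = Object.values(monthlySavings).reduce((a, b) => a + b, 0)
console.log(gross - elastiCacheCost) // 1077, matching the ~$1,077/month net
```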
For a deeper look at caching patterns specifically for production environments including TTL strategies and invalidation, see our ElastiCache Redis caching strategies guide. For workloads where SQS is the right choice over Redis Streams, our SQS reliable messaging patterns guide covers dead letter queues, visibility timeouts, and FIFO ordering in depth. The full cross-service cost optimization framework is in the AWS cost control architecture playbook.