How to Use Redis and Valkey as a Cost-Saving Layer (Not Just Cache)

Cloud Architecture · Palaniappan P · 23 min read

Quick summary: Redis and its fork Valkey reduce AWS costs beyond caching: rate limiting, session storage, and distributed coordination all have cheaper implementations via in-memory data structures than the AWS-managed alternatives. Here is how to use them.


Most teams deploy Redis as a cache and nothing else. They add it to reduce database reads, see a performance improvement, and leave it there. What they miss is that Redis — or Valkey, its Apache-licensed successor — can replace half a dozen other AWS services at a fraction of the cost, often with better performance.

This guide covers the cost math for replacing DynamoDB sessions, SQS for simple queues, API Gateway throttling for rate limiting, and coordination via distributed locks. Then it covers the operational details that prevent those cost savings from disappearing in incidents.


Redis vs Valkey in 2026: What Changed and What Matters

The License Fork

In March 2024, Redis Ltd. relicensed Redis 7.4+ under two non-open-source licenses: RSALv2 (Redis Source Available License v2) and SSPLv1 (Server Side Public License v1). Neither license is approved by the Open Source Initiative. The practical implication: cloud providers cannot offer Redis 7.4+ as a managed service under a standard arrangement, and organizations with open source compliance requirements cannot use it.

The Linux Foundation and former Redis contributors immediately forked Redis 7.2 as Valkey. The first stable release, Valkey 7.2.5, was available within weeks. Valkey 8.0 followed in late 2024 with performance improvements and new data structure enhancements. Valkey is licensed under Apache 2.0.

AWS launched ElastiCache for Valkey in November 2024. AWS also maintains ElastiCache for Redis OSS (capped at engine version 7.1, the open-source BSD-licensed line; the relicensed 7.4+ is not offered) and Amazon MemoryDB for Redis (also on 7.x). All three are available today.

Migration Path: Redis → Valkey

Valkey 8.0 is wire-protocol compatible with Redis 7.2. The RESP3 protocol works identically, and all standard Redis commands (GET, SET, HSET, ZADD, LPUSH, XADD, etc.) work unchanged, as do Lua scripting, pub/sub, and Redis modules whose licenses permit use with Valkey.

Client libraries do not require changes:

  • Node.js: ioredis and node-redis work with Valkey without modification
  • Go: go-redis/redis works with Valkey without modification
  • Python: redis-py works with Valkey without modification
  • PHP: predis and phpredis work with Valkey without modification

For ElastiCache migration in Terraform, change the engine parameter:

# Before (Redis)
resource "aws_elasticache_replication_group" "cache" {
  engine         = "redis"
  engine_version = "7.1"
  # ...
}

# After (Valkey — drop-in replacement)
resource "aws_elasticache_replication_group" "cache" {
  engine         = "valkey"
  engine_version = "8.0"
  # ...
}

In-place upgrade from ElastiCache Redis to Valkey is available via the AWS console or CLI (modify-replication-group --engine valkey). The upgrade involves a rolling restart with no downtime on multi-AZ clusters.


Cache Patterns: Understanding the Cost of Each

Cache-Aside (Lazy Loading)

The most common pattern. Application checks cache first, fetches from database on miss, writes to cache.

Cache HIT:  1 Redis GET → return data (sub-millisecond)
Cache MISS: 1 Redis GET + 1 DB read + 1 Redis SET → return data (~5-50ms)

Cost: On cache miss, you pay for 2 round trips (Redis + DB) vs 1 (DB direct). For a cache hit rate of 90%, average latency is: 0.9 × 0.5ms + 0.1 × (0.5ms + 20ms) = 0.45 + 2.05 = 2.5ms. The cost-saving mechanism is that DB reads are more expensive than Redis reads — RDS read I/O or DynamoDB read units add up; Redis reads are included in a flat monthly ElastiCache fee.
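The same arithmetic as a small helper, with the 0.5ms and 20ms latencies treated as illustrative assumptions rather than measurements:

```javascript
// Expected cache-aside latency for a given hit rate.
// cacheMs and dbMs are illustrative assumptions, not measured values.
function cacheAsideAvgLatencyMs(hitRate, cacheMs = 0.5, dbMs = 20) {
  // Hit: one Redis round trip. Miss: the Redis miss plus the DB read.
  return hitRate * cacheMs + (1 - hitRate) * (cacheMs + dbMs)
}

// A 90% hit rate reproduces the 2.5ms figure above
console.log(cacheAsideAvgLatencyMs(0.9).toFixed(2))  // "2.50"
```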

When to use: Read-heavy workloads where cache hit rate exceeds ~70%, and where serving slightly stale data is acceptable. Node.js implementation:

// cache-aside.js
const Redis = require('ioredis')
const redis = new Redis({
  host: process.env.ELASTICACHE_ENDPOINT,
  port: 6379,
  retryStrategy: (times) => Math.min(times * 50, 2000),
  enableOfflineQueue: false  // Fail fast if Redis is down — don't queue requests
})

async function getUserProfile(userId) {
  const cacheKey = `user:profile:${userId}`
  const ttlSeconds = 300 + Math.floor(Math.random() * 60)  // 300-360s TTL jitter

  // 1. Check cache
  const cached = await redis.get(cacheKey)
  if (cached !== null) {
    return JSON.parse(cached)
  }

  // 2. Cache miss: fetch from DB
  const user = await db.users.findById(userId)
  if (!user) {
    return null
  }

  // 3. Write to cache with TTL
  await redis.set(cacheKey, JSON.stringify(user), 'EX', ttlSeconds)

  return user
}

Write-Through

Application writes to cache and database simultaneously. Cache is always consistent with the database.

Cost: Every write pays for both a DB write AND a Redis write. For write-heavy workloads, this doubles write I/O. Write-through is appropriate when reads are much more frequent than writes and cache consistency is critical.

Write amplification cost example: A user updates their profile (1 DB write). With write-through, you also write to Redis (1 Redis write). If DynamoDB charges $1.25/million writes and ElastiCache is flat-rate, this write amplification is nearly free for low-write workloads. But for write-heavy workloads (>1 million writes/day), the duplicate work adds CPU overhead on ElastiCache.
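In code, write-through is just the two writes in one path. A minimal sketch with `db` and `cache` injected (both hypothetical interfaces here; in production `cache` would be an ioredis client and `db` your data access layer):

```javascript
// Write-through: every profile update hits the database and the cache together.
// `db` and `cache` are injected, hypothetical interfaces for illustration.
async function writeThroughUpdate(db, cache, userId, profile, ttlSeconds = 300) {
  // 1. Write the source of truth first; if this throws, the cache is untouched
  await db.updateUser(userId, profile)

  // 2. Mirror the write into the cache so the next read is a guaranteed hit
  await cache.set(`user:profile:${userId}`, JSON.stringify(profile), 'EX', ttlSeconds)

  return profile
}
```

Ordering matters: writing the database first means a failed cache write leaves a stale-but-recoverable cache entry, rather than a cache claiming a write the database never saw.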

Write-Behind (Write-Back)

Application writes to cache first, then asynchronously to the database. Lowest write latency, highest data loss risk.

Redis does not natively support write-behind — you implement it by writing to Redis, then using a background process to flush to the database. This pattern is only appropriate when:

  • Acknowledgment latency matters (gaming leaderboards, real-time counters)
  • Data loss of the last few seconds is acceptable
  • You have a reliable background process with dead-letter handling for flush failures

For most AWS workloads, the DynamoDB write cost savings from write-behind do not justify the data loss risk. Stick with cache-aside or write-through.
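For the rare case where write-behind is justified, the shape looks roughly like this (a sketch under assumptions: `cache` exposes ioredis-style set/get/lpush/rpop, `db.updateUser` is the hypothetical flush target, and the dead-letter handling the bullet list calls for is omitted):

```javascript
// Write-behind sketch: acknowledge after the cache write, flush to the DB later.
// `cache` and `db` are injected, hypothetical interfaces for illustration.
async function writeBehind(cache, userId, profile) {
  await cache.set(`user:profile:${userId}`, JSON.stringify(profile))
  await cache.lpush('dirty:user-profiles', userId)  // mark as pending DB flush
}

// Background flusher: drain one pending write; returns false when queue is empty.
// Note the data-loss window: a crash between rpop and updateUser loses the flush.
async function flushOne(cache, db) {
  const userId = await cache.rpop('dirty:user-profiles')
  if (userId === null) return false

  const raw = await cache.get(`user:profile:${userId}`)
  if (raw !== null) {
    await db.updateUser(userId, JSON.parse(raw))
  }
  return true
}
```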


Rate Limiting: Three Implementations with Cost Comparison

Rate limiting with Redis is cheaper than API Gateway usage plans when you have more than ~100 unique rate-limit subjects (users, IPs, API keys) or when you need rate limiting outside the HTTP layer.

Fixed Window Counter (Simplest, Lowest Latency)

One Redis command round trip per request. Fast, but permits bursts at the window boundary: a client can spend up to 2x the limit in a short interval straddling two adjacent windows.

// Node.js - fixed window rate limiter with ioredis
const Redis = require('ioredis')
const redis = new Redis({ host: process.env.ELASTICACHE_ENDPOINT })

async function fixedWindowRateLimit(identifier, limit, windowSeconds) {
  const key = `ratelimit:fixed:${identifier}:${Math.floor(Date.now() / (windowSeconds * 1000))}`

  // INCR and EXPIRE in one pipeline; resetting the TTL on every call is safe
  // because the key embeds the window ID, and it avoids leaking a TTL-less
  // key if the process dies between INCR and EXPIRE
  const results = await redis.pipeline()
    .incr(key)
    .expire(key, windowSeconds)
    .exec()
  const current = results[0][1]  // [err, value] pair from INCR

  return {
    allowed: current <= limit,
    current,
    limit,
    resetAt: (Math.floor(Date.now() / (windowSeconds * 1000)) + 1) * windowSeconds * 1000
  }
}

// Usage
app.use(async (req, res, next) => {
  const result = await fixedWindowRateLimit(`user:${req.user.id}`, 100, 60) // 100 req/minute
  res.set('X-RateLimit-Limit', result.limit)
  res.set('X-RateLimit-Remaining', Math.max(0, result.limit - result.current))
  res.set('X-RateLimit-Reset', result.resetAt)

  if (!result.allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' })
  }
  next()
})
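The boundary burst is easy to see with the same window arithmetic the limiter uses:

```javascript
// Same window-ID computation as the fixed-window limiter above
function windowId(timestampMs, windowSeconds) {
  return Math.floor(timestampMs / (windowSeconds * 1000))
}

// Requests at t=59.9s and t=60.1s land in different 60-second windows, so each
// side gets a fresh counter: up to 2x the limit can pass within 200ms.
console.log(windowId(59_900, 60), windowId(60_100, 60))  // 0 1
```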

Sliding Window with Sorted Set (Most Accurate)

Uses a sorted set where score = timestamp. Accurately counts requests in the last N seconds without boundary burst issues. Four Redis commands per request (ZREMRANGEBYSCORE + ZADD + ZCARD + PEXPIRE), sent as a single pipeline.

// Node.js - sliding window rate limiter
async function slidingWindowRateLimit(identifier, limit, windowMs) {
  const now = Date.now()
  const windowStart = now - windowMs
  const key = `ratelimit:sliding:${identifier}`

  const pipeline = redis.pipeline()
  pipeline.zremrangebyscore(key, '-inf', windowStart)   // Remove old entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`)    // Add current request
  pipeline.zcard(key)                                    // Count requests in window
  pipeline.pexpire(key, windowMs)                        // Reset TTL

  const results = await pipeline.exec()
  const count = results[2][1]  // Result of ZCARD

  return {
    allowed: count <= limit,
    current: count,
    limit,
    retryAfter: count > limit ? Math.ceil(windowMs / 1000) : 0
  }
}

Token Bucket with Lua Script (Atomic, Smoothest Rate Control)

Token bucket allows short bursts while enforcing average rate. Implemented as an atomic Lua script — no race conditions between check and update.

// Node.js - token bucket rate limiter (Lua for atomicity)
const tokenBucketScript = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])   -- tokens per second
local now = tonumber(ARGV[3])           -- current timestamp in ms
local requested = tonumber(ARGV[4])    -- tokens requested (usually 1)

-- Get current state or initialize
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now

-- Refill tokens based on elapsed time
local elapsed = (now - last_refill) / 1000  -- convert to seconds
local new_tokens = math.min(capacity, tokens + (elapsed * refill_rate))

-- Check if request can be fulfilled
if new_tokens >= requested then
  new_tokens = new_tokens - requested
  redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
  redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate) * 1000)
  return {1, math.floor(new_tokens)}  -- allowed, remaining tokens
else
  redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
  redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate) * 1000)
  return {0, math.floor(new_tokens)}  -- denied, remaining tokens
end
`

async function tokenBucketRateLimit(identifier, capacity, refillRate) {
  const key = `ratelimit:bucket:${identifier}`
  const result = await redis.eval(
    tokenBucketScript,
    1, key,           // 1 key, the key value
    capacity,         // bucket capacity
    refillRate,       // tokens per second
    Date.now(),       // current timestamp
    1                 // tokens requested
  )
  return {
    allowed: result[0] === 1,
    tokensRemaining: result[1],
    capacity
  }
}

Go Fixed Window Rate Limiter

// Go - fixed window rate limiter with go-redis
package ratelimit

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

type FixedWindowLimiter struct {
    client        *redis.Client
    limit         int64
    windowSeconds int64
}

func NewFixedWindowLimiter(client *redis.Client, limit int64, window time.Duration) *FixedWindowLimiter {
    return &FixedWindowLimiter{
        client:        client,
        limit:         limit,
        windowSeconds: int64(window.Seconds()),
    }
}

type LimitResult struct {
    Allowed   bool
    Current   int64
    Limit     int64
    ResetAt   time.Time
}

func (l *FixedWindowLimiter) Allow(ctx context.Context, identifier string) (LimitResult, error) {
    windowID := time.Now().Unix() / l.windowSeconds
    key := fmt.Sprintf("ratelimit:fixed:%s:%d", identifier, windowID)

    pipe := l.client.Pipeline()
    incrCmd := pipe.Incr(ctx, key)
    pipe.Expire(ctx, key, time.Duration(l.windowSeconds)*time.Second)

    if _, err := pipe.Exec(ctx); err != nil {
        // Fail open: if Redis is unavailable, allow the request
        // (prevents Redis outage from taking down your API)
        return LimitResult{Allowed: true, Limit: l.limit}, nil
    }

    current := incrCmd.Val()
    resetAt := time.Unix((windowID+1)*l.windowSeconds, 0)

    return LimitResult{
        Allowed: current <= l.limit,
        Current: current,
        Limit:   l.limit,
        ResetAt: resetAt,
    }, nil
}

PHP Sliding Window with Predis

<?php

use Predis\Client;

class SlidingWindowRateLimiter
{
    public function __construct(
        private Client $redis,
        private int $limit,
        private int $windowMs
    ) {}

    public function isAllowed(string $identifier): array
    {
        $now = (int)(microtime(true) * 1000);
        $windowStart = $now - $this->windowMs;
        $key = "ratelimit:sliding:{$identifier}";

        $pipe = $this->redis->pipeline();
        $pipe->zremrangebyscore($key, '-inf', $windowStart);
        $pipe->zadd($key, [$now . '-' . uniqid() => $now]);
        $pipe->zcard($key);
        $pipe->pexpire($key, $this->windowMs);

        $results = $pipe->execute();
        $count = $results[2];

        return [
            'allowed'     => $count <= $this->limit,
            'current'     => $count,
            'limit'       => $this->limit,
            'retry_after' => $count > $this->limit ? ceil($this->windowMs / 1000) : 0,
        ];
    }
}

Python Sliding Window with redis-py

import time
import uuid
import redis

class SlidingWindowRateLimiter:
    def __init__(self, client: redis.Redis, limit: int, window_seconds: int):
        self.client = client
        self.limit = limit
        self.window_ms = window_seconds * 1000

    def is_allowed(self, identifier: str) -> dict:
        now_ms = int(time.time() * 1000)
        window_start_ms = now_ms - self.window_ms
        key = f"ratelimit:sliding:{identifier}"

        pipe = self.client.pipeline()
        pipe.zremrangebyscore(key, '-inf', window_start_ms)
        pipe.zadd(key, {f"{now_ms}-{uuid.uuid4().hex}": now_ms})
        pipe.zcard(key)
        pipe.pexpire(key, self.window_ms)
        results = pipe.execute()

        count = results[2]
        return {
            "allowed": count <= self.limit,
            "current": count,
            "limit": self.limit,
            "retry_after": max(0, (self.window_ms // 1000)) if count > self.limit else 0,
        }

Session Storage: DynamoDB vs ElastiCache Cost Analysis

The cost model is simple: DynamoDB charges per operation, ElastiCache charges per hour regardless of operations.

Cost Calculation at Scale

Assumptions for a SaaS application:

  • 50,000 daily active users (DAU)
  • Average 40 requests per session
  • Each request reads the session about twice on average (auth middleware + handler) and writes only on auth events: ~2 reads, 0.05 writes per request
  • Session TTL: 24 hours, JSON blob ~2 KB

DynamoDB On-Demand session costs:

Daily reads:  50,000 DAU × 40 requests × 2 reads = 4,000,000 reads/day
Daily writes: 50,000 DAU × 40 requests × 0.05 writes = 100,000 writes/day

DynamoDB cost:
  Reads:  4,000,000 / 1,000,000 × $0.25 = $1.00/day
  Writes: 100,000 / 1,000,000 × $1.25 = $0.125/day
  Storage: 50,000 sessions × 2 KB ≈ 0.1 GB × $0.25/GB-month ≈ $0.03/month
  Total: ~$33.75/month

ElastiCache t4g.small session costs:

t4g.small: 2 vCPU, 1.37 GB RAM, $0.016/hr
Monthly: $0.016 × 730 = $11.68/month

Capacity check: 50,000 sessions × 2KB = 100 MB — easily fits in 1.37 GB

Savings at 50,000 DAU: $33.75 - $11.68 = $22.07/month

At 500,000 DAU: DynamoDB ≈ $337/month vs ElastiCache t4g.medium ($0.032/hr = $23.36/month) = $314/month savings.
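The break-even math condensed into a helper; the per-unit prices are copied from the worked example above and should be treated as assumptions to re-check against current AWS pricing:

```javascript
// Monthly DynamoDB session cost: reads + writes + storage, using the example's
// assumptions (40 requests/DAU/day, 2 reads and 0.05 writes per request, 2 KB items).
function dynamoSessionCostPerMonth(dau) {
  const readsPerDay = dau * 40 * 2
  const writesPerDay = dau * 40 * 0.05
  const readCost = (readsPerDay / 1e6) * 0.25 * 30    // $0.25/million read units
  const writeCost = (writesPerDay / 1e6) * 1.25 * 30  // $1.25/million write units
  const storageCost = (dau * 2 / 1e6) * 0.25          // 2 KB/session at $0.25/GB-month
  return readCost + writeCost + storageCost
}

// Flat ElastiCache rate, 730 hours per month
function elastiCacheCostPerMonth(hourlyRate) {
  return hourlyRate * 730
}

const dynamo = dynamoSessionCostPerMonth(50_000)  // ≈ $33.78/month
const valkey = elastiCacheCostPerMonth(0.016)     // $11.68/month
console.log((dynamo - valkey).toFixed(0))         // monthly savings, ≈ "22"
```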

Session Implementation

A Node.js session with ElastiCache:

// express-session with ioredis store
const session = require('express-session')
const RedisStore = require('connect-redis').default
const { createClient } = require('redis')

const redisClient = createClient({
  socket: {
    host: process.env.ELASTICACHE_ENDPOINT,
    port: 6379,
    tls: true  // ElastiCache in-transit encryption; certificates chain to a
               // trusted Amazon CA, so leave verification enabled (no need
               // for rejectUnauthorized: false)
  }
})

await redisClient.connect()

app.use(session({
  store: new RedisStore({
    client: redisClient,
    prefix: 'session:',
    ttl: 86400  // 24 hours in seconds
  }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false,
  cookie: {
    secure: true,   // HTTPS only
    httpOnly: true, // No JS access
    maxAge: 86400 * 1000,  // 24 hours in ms
    sameSite: 'strict'
  }
}))

Redis as a Queue: When to Use vs SQS

Redis queues are appropriate for workloads requiring sub-millisecond enqueue/dequeue latency where SQS’s eventual consistency model and ~20ms minimum latency are too slow.

List-Based Simple Queue

// Producer: push to queue
await redis.lpush('jobs:email-send', JSON.stringify({
  to: 'user@example.com',
  template: 'welcome',
  userId: '12345',
  enqueuedAt: Date.now()
}))

// Consumer: blocking pop (waits up to 30s for a message)
async function processEmailQueue() {
  while (true) {
    const result = await redis.brpop('jobs:email-send', 30)  // 30s timeout
    if (result) {
      const [_queue, message] = result
      const job = JSON.parse(message)
      await sendEmail(job)
    }
  }
}

Limitation: BRPOP/LPUSH provides no message acknowledgment. If the consumer crashes after popping but before processing, the message is lost. For jobs where loss is unacceptable, use Redis Streams or SQS.
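Between the bare list and full Streams there is a middle ground: LMOVE (Redis 6.2+) atomically moves the message into a per-worker processing list, and the worker deletes it only after success. A sketch, assuming an ioredis client whose `lmove`/`lrem` map directly to the Redis commands:

```javascript
// At-least-once list queue: pop and stash atomically with LMOVE, then LREM
// the message only after the handler succeeds. A crash leaves the message in
// the processing list, where a reaper job can re-queue it.
async function processOneJob(redis, handler) {
  const message = await redis.lmove(
    'jobs:email-send', 'jobs:email-send:processing', 'RIGHT', 'LEFT'
  )
  if (message === null) return false  // queue empty

  try {
    await handler(JSON.parse(message))
    await redis.lrem('jobs:email-send:processing', 1, message)  // ack
  } catch (err) {
    // Leave the message in the processing list for recovery
    console.error('handler failed; message retained:', err)
  }
  return true
}
```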

Redis Streams for Durable Queuing

Redis Streams (XADD/XREADGROUP) provide consumer groups, message acknowledgment, and pending message tracking — much closer to SQS’s semantics at Redis speed.

// Producer: append to stream
await redis.xadd(
  'stream:orders',
  '*',  // Auto-generate message ID
  'order_id', '9876',
  'user_id', '12345',
  'total', '99.99',
  'status', 'pending'
)

// Consumer group setup (run once)
await redis.xgroup('CREATE', 'stream:orders', 'order-processors', '0', 'MKSTREAM')

// Consumer: read with acknowledgment
async function processOrderStream(consumerId) {
  while (true) {
    // Read up to 10 messages, block for 5 seconds if empty
    const messages = await redis.xreadgroup(
      'GROUP', 'order-processors', consumerId,
      'COUNT', 10,
      'BLOCK', 5000,
      'STREAMS', 'stream:orders', '>'  // '>' means undelivered messages only
    )

    if (!messages) {
      continue  // Timeout, loop again
    }

    for (const [_stream, entries] of messages) {
      for (const [messageId, fields] of entries) {
        const message = {}
        for (let i = 0; i < fields.length; i += 2) {
          message[fields[i]] = fields[i + 1]
        }

        try {
          await processOrder(message)
          // Acknowledge: removes from pending entries
          await redis.xack('stream:orders', 'order-processors', messageId)
        } catch (error) {
          // Message stays in the pending entries list; '>' will not redeliver
          // it, so reclaim stuck messages with XAUTOCLAIM or XCLAIM
          console.error(`Failed to process ${messageId}:`, error)
        }
      }
    }
  }
}

// Check pending messages (unacknowledged, possibly stuck)
const pending = await redis.xpending(
  'stream:orders',
  'order-processors',
  '-', '+',  // min/max message IDs
  10         // count
)

Redis Queue vs SQS: When Each Wins

Factor             | Redis Streams                 | SQS
-------------------|-------------------------------|----------------------------------
Latency            | <1ms                          | ~20ms minimum
Durability         | Memory + AOF/RDB persistence  | Multi-AZ, 4-day retention default
Cost (at scale)    | Flat ElastiCache rate         | $0.40/million messages
Visibility timeout | Manual (TTL on claim)         | Built-in, configurable
Dead letter queue  | Manual implementation         | Native DLQ support
FIFO ordering      | Yes (stream ID order)         | SQS FIFO (higher cost)
Ops burden         | Managed (ElastiCache)         | Fully managed (SQS)

Use Redis Streams when: Your application already has Redis, latency matters (real-time notifications, gaming, live chat), and message volume is moderate (<1 million/day per stream).

Use SQS when: You need guaranteed durability, long message retention (up to 14 days), native DLQ support, or you are processing asynchronous background jobs where 20ms latency is irrelevant.


Distributed Locks: Preventing Duplicate Processing

Distributed locks prevent multiple instances of a service from processing the same resource concurrently. This avoids duplicate charges, double-sends, and data inconsistency.

SETNX Lock (Simple, Single Instance)

// Simple lock with SETNX (SET if Not eXists)
async function acquireLock(resourceId, ttlMs = 5000) {
  const lockKey = `lock:${resourceId}`
  const lockToken = `${Date.now()}-${Math.random()}`  // Unique token to identify this lock

  const acquired = await redis.set(
    lockKey,
    lockToken,
    'PX', ttlMs,  // Expiry in milliseconds
    'NX'          // Only set if key does not exist
  )

  return acquired ? lockToken : null  // Return token if acquired, null if already locked
}

async function releaseLock(resourceId, lockToken) {
  // Lua script: only release if we own the lock
  // Prevents releasing a lock acquired by another process
  const releaseLockScript = `
    if redis.call('GET', KEYS[1]) == ARGV[1] then
      return redis.call('DEL', KEYS[1])
    else
      return 0
    end
  `
  const lockKey = `lock:${resourceId}`
  return redis.eval(releaseLockScript, 1, lockKey, lockToken)
}

// Usage: prevent duplicate invoice processing
async function processInvoice(invoiceId) {
  const lockToken = await acquireLock(`invoice:${invoiceId}`, 30000)

  if (!lockToken) {
    console.log(`Invoice ${invoiceId} is being processed by another instance`)
    return
  }

  try {
    await chargeInvoice(invoiceId)
  } finally {
    await releaseLock(`invoice:${invoiceId}`, lockToken)
  }
}

Redlock for Multi-Node Safety

For systems where a single Redis node failure must not cause two processes to hold the lock simultaneously, use Redlock — acquire lock on majority of N Redis nodes.

// Redlock with 3 ElastiCache nodes (separate primary nodes)
const Redlock = require('redlock')
const Redis = require('ioredis')

const nodes = [
  new Redis({ host: process.env.ELASTICACHE_NODE_1 }),
  new Redis({ host: process.env.ELASTICACHE_NODE_2 }),
  new Redis({ host: process.env.ELASTICACHE_NODE_3 })
]

const redlock = new Redlock(nodes, {
  driftFactor: 0.01,  // Assume 1% clock drift
  retryCount: 3,
  retryDelay: 200,    // ms between retries
  retryJitter: 100,   // Random jitter on retry delay
  automaticExtensionThreshold: 500  // Extend if lock held > (TTL - 500ms)
})

async function processWithRedlock(resourceId) {
  const lock = await redlock.acquire([`lock:${resourceId}`], 10000)  // 10s TTL

  try {
    await processResource(resourceId)
  } finally {
    await lock.release()  // redlock v5: release is a method on the lock object
  }
}

Cost note: Redlock requires N independent Redis nodes (not replicas of the same primary). This means N ElastiCache clusters. For most applications, the simple SETNX approach with a single multi-AZ ElastiCache cluster is sufficient. Use Redlock only when split-brain lock safety is a strict requirement.


Memory Optimization: Getting More from Each ElastiCache Dollar

ElastiCache is billed by instance size. The difference between a t4g.medium ($0.032/hr = $23/month) and a r7g.large ($0.166/hr = $121/month) is $98/month. Optimizing memory usage keeps you on smaller instances longer.

Eviction Policy Selection

The eviction policy determines what happens when Redis reaches maxmemory:

allkeys-lru     — Evict least recently used keys regardless of TTL
                  Best for: cache-only Redis where all data is expendable
                  Risk: important keys with far-future TTL can be evicted

volatile-lru    — Evict LRU keys among those with TTL set
                  Best for: mixed cache + persistent data (sessions with TTL,
                  config without TTL — config is never evicted)
                  Risk: if all keys have TTL, behaves like allkeys-lru

allkeys-lfu     — Evict least frequently used (Redis 4.0+)
                  Best for: workloads with irregular access patterns

noeviction      — Return error when memory full
                  Best for: queues/streams where data loss is unacceptable

For a mixed Redis deployment (cache + sessions + rate limit counters):

volatile-lru is usually the right choice:
- Sessions have TTL → can be evicted if memory pressure requires
- Rate limit counters have short TTL → can be evicted
- Any permanent configuration keys have no TTL → never evicted

Set the eviction policy in ElastiCache parameter group (see Terraform below).

HASH vs STRING for Object Storage

Storing an object as a Redis HASH rather than a JSON string can save 40–70% memory for small objects, because Redis uses a compact ziplist encoding for HASHes with fewer than 128 fields and values under 64 bytes.

// STRING: stores full JSON blob
await redis.set('user:123', JSON.stringify({
  id: 123,
  name: 'Alice',
  email: 'alice@example.com',
  plan: 'pro',
  created_at: '2026-01-01'
}))
// Memory: ~100 bytes (JSON overhead + Redis key overhead + string encoding)

// HASH: Redis uses compact ziplist encoding for small hashes
await redis.hset('user:123', {
  id: '123',
  name: 'Alice',
  email: 'alice@example.com',
  plan: 'pro',
  created_at: '2026-01-01'
})
// Memory: ~60 bytes (ziplist encoding, 40% savings)

// Read specific field without fetching full object
const plan = await redis.hget('user:123', 'plan')

// Read multiple fields
const [name, email] = await redis.hmget('user:123', 'name', 'email')

// Read all fields
const user = await redis.hgetall('user:123')

Check encoding to verify ziplist is being used:

# Redis CLI memory analysis
redis-cli -h $ELASTICACHE_ENDPOINT

# Check encoding of a specific key
OBJECT ENCODING user:123
# Should return: "ziplist" or "listpack" (Redis 7.0+) for small hashes
# Returns: "hashtable" if hash exceeds hash-max-listpack-entries (default 128)

# Detailed per-key debug info (self-managed Redis/Valkey only: DEBUG is a
# restricted command on ElastiCache, so use MEMORY USAGE below there)
DEBUG OBJECT user:123
# Returns: serializedlength, encoding, type

# Memory usage of a key (in bytes)
MEMORY USAGE user:123

# Overall memory statistics
INFO memory
# Key metrics:
# used_memory: total allocated memory
# used_memory_rss: RSS from OS perspective (includes fragmentation)
# mem_fragmentation_ratio: used_memory_rss / used_memory (should be 1.0-1.5)
# maxmemory: configured maximum
# maxmemory_human: human-readable maximum
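For dashboards or alerts, those INFO fields parse easily. A small helper, assuming the standard `field:value` line format INFO returns:

```javascript
// Parse INFO memory output ("field:value" per line, comments start with '#')
// and derive the fragmentation ratio; sustained values above ~1.5 mean RAM is
// being lost to fragmentation.
function parseMemoryInfo(infoText) {
  const fields = {}
  for (const line of infoText.split('\n')) {
    const sep = line.indexOf(':')
    if (sep > 0 && !line.startsWith('#')) {
      fields[line.slice(0, sep)] = line.slice(sep + 1).trim()
    }
  }
  return {
    usedMemory: Number(fields.used_memory),
    usedMemoryRss: Number(fields.used_memory_rss),
    fragmentationRatio: Number(fields.used_memory_rss) / Number(fields.used_memory),
  }
}

// Example with INFO-style text (values are made up for illustration)
const sample = '# Memory\nused_memory:104857600\nused_memory_rss:125829120\n'
console.log(parseMemoryInfo(sample).fragmentationRatio)  // 1.2
```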

Preventing Eviction Storms

When Redis hits maxmemory, it evicts keys according to the policy. If eviction is slow (many keys to scan), request latency spikes. To prevent eviction storms:

  1. Set maxmemory to 80% of instance RAM (leave 20% headroom for overhead and fragmentation).
  2. Monitor evicted_keys rate in CloudWatch — a sudden spike indicates memory pressure.
  3. Use MEMORY USAGE to identify oversized keys consuming disproportionate memory.
# Find the 10 largest keys (expensive on large datasets — run during maintenance)
redis-cli -h $ENDPOINT --bigkeys

# Better for production: scan with MEMORY USAGE sampling
redis-cli -h $ENDPOINT --scan --pattern '*' | head -1000 | while read key; do
  size=$(redis-cli -h $ENDPOINT MEMORY USAGE "$key" 2>/dev/null || echo 0)
  echo "$size $key"
done | sort -rn | head -20

Cache Stampede Prevention in Detail

Cache stampede is the most dangerous failure mode in a Redis-backed system. It can cascade: cache expires → 500 simultaneous DB queries → DB CPU spikes to 100% → query timeout → all 500 requests return error → retry storm → DB crash.

Mutex Lock Approach

// Mutex-based cache: only one request regenerates, others wait
const LOCK_TTL = 5000  // 5 seconds max for cache rebuild

async function getWithMutex(cacheKey, fetchFn, ttl) {
  // 1. Check cache
  const cached = await redis.get(cacheKey)
  if (cached !== null) {
    return JSON.parse(cached)
  }

  // 2. Cache miss: try to acquire rebuild lock
  const lockKey = `lock:rebuild:${cacheKey}`
  const lockToken = `${Date.now()}-${Math.random()}`
  const acquired = await redis.set(lockKey, lockToken, 'PX', LOCK_TTL, 'NX')

  if (acquired) {
    // 3. We hold the lock: rebuild cache
    try {
      const data = await fetchFn()
      await redis.set(cacheKey, JSON.stringify(data), 'EX', ttl)
      return data
    } finally {
      // Release lock
      const releaseScript = `
        if redis.call('GET', KEYS[1]) == ARGV[1] then
          return redis.call('DEL', KEYS[1])
        end
        return 0
      `
      await redis.eval(releaseScript, 1, lockKey, lockToken)
    }
  } else {
    // 4. Another process is rebuilding: wait briefly, then retry
    await new Promise(resolve => setTimeout(resolve, 100))
    const retried = await redis.get(cacheKey)
    if (retried !== null) {
      return JSON.parse(retried)
    }
    // If still missing after wait, fall through to DB
    return fetchFn()
  }
}

Probabilistic Early Recomputation (PER)

PER proactively refreshes a cache entry before it expires, with probability increasing as expiry approaches. No locking required.

// Probabilistic Early Recomputation
// beta controls how aggressively to early-refresh (higher = more eager, default 1.0)
async function getWithPER(cacheKey, fetchFn, ttl, beta = 1.0) {
  const cacheData = await redis.get(cacheKey)

  if (cacheData !== null) {
    const { value, delta, expiry } = JSON.parse(cacheData)
    const now = Date.now() / 1000

    // Probability formula: expire early if random < beta * delta * log(random)
    // delta = time to compute the value (estimate)
    const shouldRefresh = now - delta * beta * Math.log(Math.random()) >= expiry

    if (!shouldRefresh) {
      return value
    }
    // Fall through to refresh (early recomputation)
  }

  // Cache miss or PER triggered: recompute
  const startTime = Date.now()
  const value = await fetchFn()
  const delta = (Date.now() - startTime) / 1000  // Computation time in seconds
  const expiry = Date.now() / 1000 + ttl

  await redis.set(
    cacheKey,
    JSON.stringify({ value, delta, expiry }),
    'EX',
    ttl + Math.floor(delta * beta * 2)  // Extend TTL slightly for PER window
  )

  return value
}

ElastiCache Terraform Configuration

A production-grade ElastiCache Valkey replication group: Multi-AZ with automatic failover, one primary plus one replica (for horizontal sharding, switch to cluster mode with num_node_groups/replicas_per_node_group):

# elasticache.tf

resource "aws_elasticache_parameter_group" "valkey8_production" {
  family = "valkey8"
  name   = "valkey8-production"

  # Eviction policy: evict LRU keys with TTL set (volatile-lru)
  # Protects permanent keys (config, feature flags) from eviction
  parameter {
    name  = "maxmemory-policy"
    value = "volatile-lru"
  }

  # Lazy freeing: delete expired keys asynchronously (lower latency)
  parameter {
    name  = "lazyfree-lazy-expire"
    value = "yes"
  }

  parameter {
    name  = "lazyfree-lazy-eviction"
    value = "yes"
  }

  # Enable keyspace notifications for expiry events
  # Useful for TTL-based workflows (e.g., session expiry cleanup hooks)
  # K = keyspace events, E = keyevent events, x = expired events
  parameter {
    name  = "notify-keyspace-events"
    value = "Ex"
  }

  # Slowlog: log commands slower than 100ms
  parameter {
    name  = "slowlog-log-slower-than"
    value = "100000"  # microseconds
  }

  parameter {
    name  = "slowlog-max-len"
    value = "128"
  }

  # Hash optimization: use ziplist (listpack in Valkey 8) for small hashes
  parameter {
    name  = "hash-max-listpack-entries"
    value = "128"
  }

  parameter {
    name  = "hash-max-listpack-value"
    value = "64"
  }
}

resource "aws_elasticache_replication_group" "valkey_cache" {
  replication_group_id = "myapp-valkey-cache"
  description          = "Valkey cluster for caching, sessions, and rate limiting"

  engine         = "valkey"
  engine_version = "8.0"
  node_type      = "cache.t4g.medium"  # 2 vCPU, 3.09 GB RAM, $0.032/hr

  # Multi-AZ with automatic failover
  multi_az_enabled           = true
  automatic_failover_enabled = true

  # Total nodes in this cluster-mode-disabled group (primary + replicas)
  num_cache_clusters = 2  # 1 primary + 1 replica

  parameter_group_name = aws_elasticache_parameter_group.valkey8_production.name

  # Encryption
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  kms_key_id                 = aws_kms_key.elasticache.arn

  # Maintenance and backup
  maintenance_window       = "sun:03:00-sun:04:00"
  snapshot_window          = "02:00-03:00"
  snapshot_retention_limit = 7

  # Auth token (password) for access control
  auth_token = var.elasticache_auth_token

  subnet_group_name  = aws_elasticache_subnet_group.private.name
  security_group_ids = [aws_security_group.elasticache.id]

  # Defer changes to the next maintenance window in production;
  # set to true in non-production for faster iteration
  apply_immediately = false

  log_delivery_configuration {
    destination      = aws_cloudwatch_log_group.elasticache_slow_logs.name
    destination_type = "cloudwatch-logs"
    log_format       = "json"
    log_type         = "slow-log"
  }

  log_delivery_configuration {
    destination      = aws_cloudwatch_log_group.elasticache_engine_logs.name
    destination_type = "cloudwatch-logs"
    log_format       = "json"
    log_type         = "engine-log"
  }

  tags = {
    Environment = "production"
    Team        = "platform"
    CostCenter  = "infrastructure"
  }
}

resource "aws_elasticache_subnet_group" "private" {
  name       = "myapp-elasticache-private"
  subnet_ids = var.private_subnet_ids
}

resource "aws_security_group" "elasticache" {
  name_prefix = "elasticache-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]  # Only allow app tier
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "elasticache-sg"
  }
}

# CloudWatch alarms for cache health
resource "aws_cloudwatch_metric_alarm" "cache_evictions" {
  alarm_name          = "elasticache-high-evictions"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "5"
  metric_name         = "Evictions"
  namespace           = "AWS/ElastiCache"
  period              = "60"
  statistic           = "Sum"
  threshold           = "100"  # Alert if >100 evictions/minute sustained
  alarm_description   = "Cache is evicting keys — memory pressure or TTL storm"

  dimensions = {
    # AWS/ElastiCache metrics are published per node under CacheClusterId;
    # member clusters are named <replication_group_id>-00N
    CacheClusterId = "${aws_elasticache_replication_group.valkey_cache.id}-001"
  }

  alarm_actions = [var.sns_alert_topic_arn]
}

resource "aws_cloudwatch_metric_alarm" "cache_memory_high" {
  alarm_name          = "elasticache-memory-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "3"
  metric_name         = "DatabaseMemoryUsagePercentage"
  namespace           = "AWS/ElastiCache"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"  # Alert at 80% memory to allow time to resize
  alarm_description   = "ElastiCache memory usage above 80% — consider scaling up"

  dimensions = {
    # AWS/ElastiCache metrics are published per node under CacheClusterId;
    # member clusters are named <replication_group_id>-00N
    CacheClusterId = "${aws_elasticache_replication_group.valkey_cache.id}-001"
  }

  alarm_actions = [var.sns_alert_topic_arn]
}

output "cache_primary_endpoint" {
  value = aws_elasticache_replication_group.valkey_cache.primary_endpoint_address
}

output "cache_reader_endpoint" {
  value = aws_elasticache_replication_group.valkey_cache.reader_endpoint_address
}
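Because the replication group sets `transit_encryption_enabled = true` and an `auth_token`, clients must connect over TLS and authenticate with the token as the password. A sketch of assembling an ioredis-style connection config from the outputs above (the environment variable names `CACHE_PRIMARY_ENDPOINT` and `ELASTICACHE_AUTH_TOKEN` are illustrative; they are not produced by the Terraform):

```typescript
interface CacheConnectionConfig {
  host: string;
  port: number;
  password: string;
  tls: Record<string, never>;
}

// In practice, populate these env vars from the Terraform outputs
// (cache_primary_endpoint) and a secrets store holding the auth token.
function cacheConfigFromEnv(
  env: Record<string, string | undefined>
): CacheConnectionConfig {
  const host = env.CACHE_PRIMARY_ENDPOINT;
  const password = env.ELASTICACHE_AUTH_TOKEN;
  if (!host || !password) {
    throw new Error('cache endpoint or auth token not configured');
  }
  return {
    host,
    port: 6379,
    password,
    tls: {}, // required because transit_encryption_enabled = true
  };
}
```

A client created without the `tls` option will hang or fail the handshake against a transit-encrypted endpoint, which is a common first-deploy surprise with this configuration.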

Putting It Together: Total Cost Impact

For a SaaS application at 100,000 DAU with ElastiCache t4g.medium ($23/month):

| Use Case | Alternative | Alternative Cost | Redis Cost | Monthly Savings |
|---|---|---|---|---|
| Session storage | DynamoDB | ~$135/month | Shared ElastiCache | ~$135 |
| Rate limiting (10k users) | API Gateway usage plans | ~$900/month | Shared ElastiCache | ~$900 |
| Simple queue | SQS (1M msgs/day) | ~$12/month | Shared ElastiCache | ~$12 |
| Cache (60% DB read offload) | Additional RDS reads | ~$45/month | Shared ElastiCache | ~$45 |
| Distributed locks | DynamoDB conditional writes | ~$8/month | Shared ElastiCache | ~$8 |
| ElastiCache cost | | | $23/month | |
| Net savings | | | | ~$1,077/month |

A single ElastiCache t4g.medium serving all these workloads simultaneously delivers over $1,000/month in savings over managed-service alternatives at 100,000 DAU scale.
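The net figure is just the per-workload savings summed, minus the shared node cost:

```typescript
// Monthly savings per workload (from the table above), minus the
// shared cache.t4g.medium cost.
const monthlySavings = {
  sessions: 135,
  rateLimiting: 900,
  queue: 12,
  cacheOffload: 45,
  locks: 8,
};
const elasticacheCost = 23;
const net =
  Object.values(monthlySavings).reduce((sum, v) => sum + v, 0) -
  elasticacheCost;
// net === 1077
```

The rate-limiting line dominates: even if the other estimates are off by half, replacing API Gateway usage plans alone pays for the cluster many times over.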

For a deeper look at caching patterns specifically for production environments including TTL strategies and invalidation, see our ElastiCache Redis caching strategies guide. For workloads where SQS is the right choice over Redis Streams, our SQS reliable messaging patterns guide covers dead letter queues, visibility timeouts, and FIFO ordering in depth. The full cross-service cost optimization framework is in the AWS cost control architecture playbook.

