---
title: How to Use Redis and Valkey as a Cost-Saving Layer (Not Just Cache)
description: Redis and its fork Valkey reduce AWS costs beyond caching: rate limiting, session storage, and distributed coordination all have cheaper implementations via in-memory data structures than the AWS-managed alternatives. Here is how to use them.
url: https://www.factualminds.com/blog/redis-valkey-cost-saving-layer-aws/
datePublished: 2026-03-29T00:00:00.000Z
dateModified: 2026-04-16T00:00:00.000Z
author: palaniappan-p
category: Cloud Architecture
tags: how-to-guide, redis, valkey, elasticache, aws, caching, cost-optimization, rate-limiting, sessions, distributed-locks
---

# How to Use Redis and Valkey as a Cost-Saving Layer (Not Just Cache)

> Redis and its fork Valkey reduce AWS costs beyond caching: rate limiting, session storage, and distributed coordination all have cheaper implementations via in-memory data structures than the AWS-managed alternatives. Here is how to use them.

Most teams deploy Redis as a cache and nothing else. They add it to reduce database reads, see a performance improvement, and leave it there. What they miss is that Redis (or Valkey, its BSD-licensed fork) can stand in for several other AWS services at a fraction of the cost, often with better performance.

This guide covers the cost math for replacing DynamoDB sessions, SQS for simple queues, API Gateway throttling for rate limiting, and coordination via distributed locks. Then it covers the operational details that prevent those cost savings from disappearing in incidents.

---

## Redis vs Valkey in 2026: What Changed and What Matters

### The License Fork

In March 2024, Redis Ltd. relicensed Redis 7.4+ under two non-open-source licenses: RSALv2 (Redis Source Available License v2) and SSPLv1 (Server Side Public License v1). Neither license is approved by the Open Source Initiative. The practical implication: cloud providers cannot offer Redis 7.4+ as a managed service under a standard arrangement, and organizations with open source compliance requirements cannot use it.

The Linux Foundation and former Redis contributors immediately forked Redis 7.2 as Valkey. The first stable release, Valkey 7.2.5, was available within weeks. Valkey 8.0 followed in late 2024 with performance improvements and new data structure enhancements. Valkey is licensed under the BSD 3-Clause license, the same permissive license Redis used through 7.2.

AWS launched ElastiCache for Valkey in October 2024. AWS also maintains ElastiCache for Redis (capped at 7.1, which predates the license change) and Amazon MemoryDB for Redis (also on 7.x). All three are available today.

### Migration Path: Redis → Valkey

Valkey 8.0 is wire-protocol compatible with Redis 7.2. The RESP3 protocol works identically. All standard Redis commands (GET, SET, HSET, ZADD, LPUSH, XADD, etc.) work unchanged. Lua scripting, Redis modules (with LGPL compatibility), and pub/sub all work.

Client libraries do not require changes:

- Node.js: `ioredis` and `node-redis` work with Valkey without modification
- Go: `go-redis` (`github.com/redis/go-redis`) works with Valkey without modification
- Python: `redis-py` works with Valkey without modification
- PHP: `predis` and `phpredis` work with Valkey without modification

For ElastiCache migration in Terraform, change the `engine` parameter:

```hcl
# Before (Redis)
resource "aws_elasticache_replication_group" "cache" {
  engine         = "redis"
  engine_version = "7.1"
  # ...
}

# After (Valkey — drop-in replacement)
resource "aws_elasticache_replication_group" "cache" {
  engine         = "valkey"
  engine_version = "8.0"
  # ...
}
```

In-place upgrade from ElastiCache Redis to Valkey is available via the AWS console or CLI (`modify-replication-group --engine valkey`). The upgrade involves a rolling restart with no downtime on multi-AZ clusters.

---

## Cache Patterns: Understanding the Cost of Each

### Cache-Aside (Lazy Loading)

The most common pattern. Application checks cache first, fetches from database on miss, writes to cache.

```
Cache HIT:  1 Redis GET → return data (sub-millisecond)
Cache MISS: 1 Redis GET + 1 DB read + 1 Redis SET → return data (~5-50ms)
```

**Cost**: On a cache miss, you pay for the Redis GET, the DB read, and the Redis SET, versus a single DB read without a cache. For a cache hit rate of 90%, with a 0.5 ms Redis GET and a 20 ms DB read, average read latency is: `0.9 × 0.5ms + 0.1 × (0.5ms + 20ms) = 0.45 + 2.05 = 2.5ms` (ignoring the SET, which adds roughly another 0.5 ms on a miss). The cost-saving mechanism is that DB reads are more expensive than Redis reads: RDS read I/O and DynamoDB read units add up, while Redis reads are included in a flat monthly ElastiCache fee.
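The latency arithmetic above generalizes to any hit rate. The following is a minimal sketch, assuming a hypothetical helper named `expectedLatencyMs` and the same illustrative 0.5 ms Redis and 20 ms database figures; substitute your own measurements.

```javascript
// Expected read latency for cache-aside.
// hitRate: fraction of reads served from cache (0..1)
// cacheMs: latency of one Redis GET; dbMs: latency of one database read
function expectedLatencyMs(hitRate, cacheMs, dbMs) {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError('hitRate must be between 0 and 1');
  }
  // Every read pays the Redis GET; only misses also pay the database read.
  return cacheMs + (1 - hitRate) * dbMs;
}

// Compare a few hit rates with the illustrative latencies from the text.
for (const hitRate of [0.5, 0.7, 0.9, 0.99]) {
  const latency = expectedLatencyMs(hitRate, 0.5, 20);
  console.log(`${Math.round(hitRate * 100)}% hits: ${latency.toFixed(2)} ms`);
}
```

Running it for your own hit rate shows how quickly the benefit shrinks as the hit rate drops, which is why the ~70% threshold below is a reasonable rule of thumb.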

**When to use**: Read-heavy workloads where cache hit rate exceeds ~70%, and where serving slightly stale data is acceptable. Node.js implementation:

```javascript
// cache-aside.js
const Redis = require('ioredis');
const redis = new Redis({
  host: process.env.ELASTICACHE_ENDPOINT,
  port: 6379,
  retryStrategy: (times) => Math.min(times * 50, 2000),
  enableOfflineQueue: false, // Fail fast if Redis is down — don't queue requests
});

async function getUserProfile(userId) {
  const cacheKey = `user:profile:${userId}`;
  const ttlSeconds = 300 + Math.floor(Math.random() * 60); // 300-359s: TTL jitter spreads out expirations

  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached !== null) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: fetch from DB
  const user = await db.users.findById(userId);
  if (!user) {
    return null;
  }

  // 3. Write to cache with TTL
  await redis.set(cacheKey, JSON.stringify(user), 'EX', ttlSeconds);

  return user;
}
```

### Write-Through

Application writes to cache and database simultaneously. Cache is always consistent with the database.

**Cost**: Every write pays for both a DB write AND a Redis write. For write-heavy workloads, this doubles write I/O. Write-through is appropriate when reads are much more frequent than writes and cache consistency is critical.

**Write amplification cost example**: A user updates their profile (1 DB write). With write-through, you also write to Redis (1 Redis write). If DynamoDB charges $1.25/million writes and ElastiCache is flat-rate, this write amplification is nearly free for low-write workloads. But for write-heavy workloads (>1 million writes/day), the duplicate work adds CPU overhead on ElastiCache.

### Write-Behind (Write-Back)

Application writes to cache first, then asynchronously to the database. Lowest write latency, highest data loss risk.

Redis does not natively support write-behind — you implement it by writing to Redis, then using a background process to flush to the database. This pattern is only appropriate when:

- Acknowledgment latency matters (gaming leaderboards, real-time counters)
- Data loss of the last few seconds is acceptable
- You have a reliable background process with dead-letter handling for flush failures

For most AWS workloads, the DynamoDB write cost savings from write-behind do not justify the data loss risk. Stick with cache-aside or write-through.

---

## Rate Limiting: Three Implementations with Cost Comparison

Rate limiting with Redis is cheaper than API Gateway usage plans when you have more than ~100 unique rate-limit subjects (users, IPs, API keys) or when you need rate limiting outside the HTTP layer.

### Fixed Window Counter (Simplest, Lowest Latency)

One Redis command per request. Fast, but allows burst at window boundary (2x limit in 2 seconds spanning window boundary).

```javascript
// Node.js - fixed window rate limiter with ioredis
const Redis = require('ioredis');
const redis = new Redis({ host: process.env.ELASTICACHE_ENDPOINT });

async function fixedWindowRateLimit(identifier, limit, windowSeconds) {
  const key = `ratelimit:fixed:${identifier}:${Math.floor(Date.now() / (windowSeconds * 1000))}`;

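  // INCR and EXPIRE run in one MULTI/EXEC transaction, so the counter can
  // never be left without a TTL if the process dies between the two calls
  const [[incrErr, current]] = await redis.multi().incr(key).expire(key, windowSeconds).exec();
  if (incrErr) throw incrErr;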

  return {
    allowed: current <= limit,
    current,
    limit,
    resetAt: (Math.floor(Date.now() / (windowSeconds * 1000)) + 1) * windowSeconds * 1000,
  };
}

// Usage
app.use(async (req, res, next) => {
  const result = await fixedWindowRateLimit(`user:${req.user.id}`, 100, 60); // 100 req/minute
  res.set('X-RateLimit-Limit', result.limit);
  res.set('X-RateLimit-Remaining', Math.max(0, result.limit - result.current));
  res.set('X-RateLimit-Reset', result.resetAt);

  if (!result.allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
});
```
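The boundary burst mentioned above can be reproduced without a Redis server. This is a sketch under stated assumptions: `createFixedWindowLimiter` is a hypothetical in-memory stand-in that keys its counters the same way as the ioredis version, and timestamps are passed in explicitly so the run is deterministic.

```javascript
// In-memory stand-in for the Redis counter, keyed the same way as the
// ioredis version: one counter per (identifier, window index).
function createFixedWindowLimiter(limit, windowSeconds) {
  const counters = new Map();
  return function allow(identifier, nowMs) {
    const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
    const key = `${identifier}:${windowIndex}`;
    const current = (counters.get(key) || 0) + 1; // mirrors INCR
    counters.set(key, current);
    return current <= limit;
  };
}

// 100 requests at the very end of one window, then 100 more at the very
// start of the next: every request passes although the limit is 100/minute.
const allow = createFixedWindowLimiter(100, 60);
let accepted = 0;
for (let i = 0; i < 100; i++) {
  if (allow('user:42', 59000 + i)) accepted++; // 59.000s to 59.099s
}
for (let i = 0; i < 100; i++) {
  if (allow('user:42', 60000 + i)) accepted++; // 60.000s to 60.099s
}
console.log(`accepted ${accepted} requests in about 1.1 seconds`); // accepted 200
```

If a 2x burst at the boundary is unacceptable for your API, the sliding window or token bucket variants below trade a little more Redis work for tighter enforcement.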

### Sliding Window with Sorted Set (Most Accurate)

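Uses a sorted set where score = timestamp. Accurately counts requests in the last N seconds without boundary burst issues. Four Redis commands per request (ZREMRANGEBYSCORE, ZADD, ZCARD, and PEXPIRE), sent in one pipelined round trip. Note that rejected requests are still added to the set, so a client that keeps retrying while over the limit stays blocked.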

```javascript
// Node.js - sliding window rate limiter
async function slidingWindowRateLimit(identifier, limit, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const key = `ratelimit:sliding:${identifier}`;

  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, '-inf', windowStart); // Remove old entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`); // Add current request
  pipeline.zcard(key); // Count requests in window
  pipeline.pexpire(key, windowMs); // Reset TTL

  const results = await pipeline.exec();
  const count = results[2][1]; // Result of ZCARD

  return {
    allowed: count <= limit,
    current: count,
    limit,
    retryAfter: count > limit ? Math.ceil(windowMs / 1000) : 0,
  };
}
```

### Token Bucket with Lua Script (Atomic, Smoothest Rate Control)

Token bucket allows short bursts while enforcing average rate. Implemented as an atomic Lua script — no race conditions between check and update.

```javascript
// Node.js - token bucket rate limiter (Lua for atomicity)
const tokenBucketScript = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])   -- tokens per second
local now = tonumber(ARGV[3])           -- current timestamp in ms
local requested = tonumber(ARGV[4])    -- tokens requested (usually 1)

-- Get current state or initialize
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now

-- Refill tokens based on elapsed time
local elapsed = (now - last_refill) / 1000  -- convert to seconds
local new_tokens = math.min(capacity, tokens + (elapsed * refill_rate))

-- Check if request can be fulfilled
if new_tokens >= requested then
  new_tokens = new_tokens - requested
  redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
  redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate) * 1000)
  return {1, math.floor(new_tokens)}  -- allowed, remaining tokens
else
  redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
  redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate) * 1000)
  return {0, math.floor(new_tokens)}  -- denied, remaining tokens
end
`;

async function tokenBucketRateLimit(identifier, capacity, refillRate) {
  const key = `ratelimit:bucket:${identifier}`;
  const result = await redis.eval(
    tokenBucketScript,
    1,
    key, // 1 key, the key value
    capacity, // bucket capacity
    refillRate, // tokens per second
    Date.now(), // current timestamp
    1 // tokens requested
  );
  return {
    allowed: result[0] === 1,
    tokensRemaining: result[1],
    capacity,
  };
}
```
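The refill arithmetic inside the Lua script can be checked without a Redis server. The sketch below is a plain-JavaScript mirror of that logic, not the script itself; `takeToken` is a hypothetical helper, and the clock is passed in as an argument so the behavior is deterministic.

```javascript
// Pure-JavaScript mirror of the refill arithmetic in the Lua script.
// state: { tokens, lastRefill } or null for a bucket that does not exist yet.
function takeToken(state, capacity, refillRate, nowMs, requested = 1) {
  const tokens = state ? state.tokens : capacity; // new buckets start full
  const lastRefill = state ? state.lastRefill : nowMs;

  // Refill in proportion to elapsed time, capped at the bucket capacity.
  const elapsedSeconds = (nowMs - lastRefill) / 1000;
  const refilled = Math.min(capacity, tokens + elapsedSeconds * refillRate);

  const allowed = refilled >= requested;
  return {
    allowed,
    state: { tokens: allowed ? refilled - requested : refilled, lastRefill: nowMs },
  };
}

// Capacity 5, refill 1 token/second: a burst of 5 passes, the 6th is denied,
// and one more request is allowed once a second has elapsed.
let state = null;
const outcomes = [];
for (let i = 0; i < 6; i++) {
  const result = takeToken(state, 5, 1, 0);
  state = result.state;
  outcomes.push(result.allowed);
}
const later = takeToken(state, 5, 1, 1000);
console.log(outcomes, later.allowed);
```

Keeping this logic in Lua on the server is still the right choice in production, because the read, refill, and write must happen atomically per key.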

### Go Fixed Window Rate Limiter

```go
// Go - fixed window rate limiter with go-redis
package ratelimit

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

type FixedWindowLimiter struct {
    client        *redis.Client
    limit         int64
    windowSeconds int64
}

func NewFixedWindowLimiter(client *redis.Client, limit int64, window time.Duration) *FixedWindowLimiter {
    return &FixedWindowLimiter{
        client:        client,
        limit:         limit,
        windowSeconds: int64(window.Seconds()),
    }
}

type LimitResult struct {
    Allowed   bool
    Current   int64
    Limit     int64
    ResetAt   time.Time
}

func (l *FixedWindowLimiter) Allow(ctx context.Context, identifier string) (LimitResult, error) {
    windowID := time.Now().Unix() / l.windowSeconds
    key := fmt.Sprintf("ratelimit:fixed:%s:%d", identifier, windowID)

    pipe := l.client.Pipeline()
    incrCmd := pipe.Incr(ctx, key)
    pipe.Expire(ctx, key, time.Duration(l.windowSeconds)*time.Second)

    if _, err := pipe.Exec(ctx); err != nil {
        // Fail open: if Redis is unavailable, allow the request
        // (prevents Redis outage from taking down your API)
        return LimitResult{Allowed: true, Limit: l.limit}, nil
    }

    current := incrCmd.Val()
    resetAt := time.Unix((windowID+1)*l.windowSeconds, 0)

    return LimitResult{
        Allowed: current <= l.limit,
        Current: current,
        Limit:   l.limit,
        ResetAt: resetAt,
    }, nil
}
```

### PHP Sliding Window with Predis

```php
<?php

use Predis\Client;

class SlidingWindowRateLimiter
{
    public function __construct(
        private Client $redis,
        private int $limit,
        private int $windowMs
    ) {}

    public function isAllowed(string $identifier): array
    {
        $now = (int)(microtime(true) * 1000);
        $windowStart = $now - $this->windowMs;
        $key = "ratelimit:sliding:{$identifier}";

        $pipe = $this->redis->pipeline();
        $pipe->zremrangebyscore($key, '-inf', $windowStart);
        $pipe->zadd($key, [$now . '-' . uniqid() => $now]);
        $pipe->zcard($key);
        $pipe->pexpire($key, $this->windowMs);

        $results = $pipe->execute();
        $count = $results[2];

        return [
            'allowed'     => $count <= $this->limit,
            'current'     => $count,
            'limit'       => $this->limit,
            'retry_after' => $count > $this->limit ? ceil($this->windowMs / 1000) : 0,
        ];
    }
}
```

### Python Sliding Window with redis-py

```python
import time
import uuid
import redis

class SlidingWindowRateLimiter:
    def __init__(self, client: redis.Redis, limit: int, window_seconds: int):
        self.client = client
        self.limit = limit
        self.window_ms = window_seconds * 1000

    def is_allowed(self, identifier: str) -> dict:
        now_ms = int(time.time() * 1000)
        window_start_ms = now_ms - self.window_ms
        key = f"ratelimit:sliding:{identifier}"

        pipe = self.client.pipeline()
        pipe.zremrangebyscore(key, '-inf', window_start_ms)
        pipe.zadd(key, {f"{now_ms}-{uuid.uuid4().hex}": now_ms})
        pipe.zcard(key)
        pipe.pexpire(key, self.window_ms)
        results = pipe.execute()

        count = results[2]
        return {
            "allowed": count <= self.limit,
            "current": count,
            "limit": self.limit,
            "retry_after": max(0, (self.window_ms // 1000)) if count > self.limit else 0,
        }
```

---

## Session Storage: DynamoDB vs ElastiCache Cost Analysis

The cost model is simple: DynamoDB charges per operation, ElastiCache charges per hour regardless of operations.

### Cost Calculation at Scale

Assumptions for a SaaS application:

- 50,000 daily active users (DAU)
- Average 40 requests per session
- Each request performs ~2 session reads (GET) and ~0.05 session writes (SET, on auth events only)
- Session TTL: 24 hours, JSON blob ~2 KB

**DynamoDB On-Demand session costs:**

```
Daily reads:  50,000 DAU × 40 requests × 2 reads = 4,000,000 reads/day
Daily writes: 50,000 DAU × 40 requests × 0.05 writes = 100,000 writes/day

DynamoDB cost:
  Reads:   4,000,000 read units / 1,000,000 × $0.25 = $1.00/day   (2 KB item ≤ 4 KB → 1 read unit each)
  Writes:  200,000 write units / 1,000,000 × $1.25 = $0.25/day    (2 KB item → 2 write units each)
  Storage: 50,000 live sessions × 2 KB ≈ 0.1 GB × $0.25/GB ≈ $0.03/month (24-hour TTL, so sessions do not accumulate)
  Total:   ($1.00 + $0.25) × 30 days + $0.03 ≈ $37.53/month
```

**ElastiCache t4g.small session costs:**

```
t4g.small: 2 vCPU, 1.37 GB RAM, $0.016/hr
Monthly: $0.016 × 730 = $11.68/month

Capacity check: 50,000 sessions × 2KB = 100 MB, which easily fits in 1.37 GB
```

**Savings at 50,000 DAU**: $37.53 - $11.68 = **$25.85/month**

At 500,000 DAU: DynamoDB ≈ $375/month vs ElastiCache t4g.medium ($0.032/hr = $23.36/month) = **~$352/month savings**.
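To re-run this comparison with your own traffic and prices, a small calculator helps. This is a sketch, not an AWS pricing tool: the function names are hypothetical, the default prices are simply the figures used in this section, and read and write units per request are inputs you must derive from your item size (a write unit covers 1 KB, a strongly consistent read unit covers 4 KB).

```javascript
// Monthly on-demand DynamoDB cost for session traffic.
// Prices are parameters; the defaults mirror the figures used in this section.
function dynamoSessionCostPerMonth({
  dau,
  requestsPerSession,
  readUnitsPerRequest,
  writeUnitsPerRequest,
  readPricePerMillion = 0.25,
  writePricePerMillion = 1.25,
  daysPerMonth = 30,
}) {
  const requestsPerDay = dau * requestsPerSession;
  const readCost = ((requestsPerDay * readUnitsPerRequest) / 1e6) * readPricePerMillion;
  const writeCost = ((requestsPerDay * writeUnitsPerRequest) / 1e6) * writePricePerMillion;
  return (readCost + writeCost) * daysPerMonth;
}

// Flat node cost: hourly rate × 730 hours × number of nodes.
function elastiCacheCostPerMonth(hourlyRate, nodes = 1) {
  return hourlyRate * 730 * nodes;
}

// Daily active users at which both options cost the same per month.
function breakEvenDau(traffic, hourlyRate, nodes = 1) {
  const costPerUser = dynamoSessionCostPerMonth({ ...traffic, dau: 1 });
  return Math.ceil(elastiCacheCostPerMonth(hourlyRate, nodes) / costPerUser);
}

const traffic = {
  requestsPerSession: 40,
  readUnitsPerRequest: 2, // 2 reads of a 2 KB item, 1 read unit each
  writeUnitsPerRequest: 0.05 * 2, // 0.05 writes per request × 2 write units for a 2 KB item
};
console.log('break-even DAU, single node:', breakEvenDau(traffic, 0.016));
console.log('break-even DAU, primary + replica:', breakEvenDau(traffic, 0.016, 2));
```

The `nodes` parameter matters: a Multi-AZ deployment with one replica doubles the flat fee, which pushes the break-even point higher than a single-node comparison suggests.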

### Session Implementation

A Node.js session with ElastiCache:

```javascript
// express-session with a connect-redis store (node-redis client)
const session = require('express-session');
const RedisStore = require('connect-redis').default;
const { createClient } = require('redis');

const redisClient = createClient({
  socket: {
    host: process.env.ELASTICACHE_ENDPOINT,
    port: 6379,
    // ElastiCache encryption in transit. Its certificates chain to Amazon
    // Trust Services, so keep the default certificate verification enabled.
    tls: true,
  },
});

// CommonJS has no top-level await: start connecting and surface failures
redisClient.connect().catch(console.error);

app.use(
  session({
    store: new RedisStore({
      client: redisClient,
      prefix: 'session:',
      ttl: 86400, // 24 hours in seconds
    }),
    secret: process.env.SESSION_SECRET,
    resave: false,
    saveUninitialized: false,
    cookie: {
      secure: true, // HTTPS only
      httpOnly: true, // No JS access
      maxAge: 86400 * 1000, // 24 hours in ms
      sameSite: 'strict',
    },
  })
);
```

---

## Redis as a Queue: When to Use vs SQS

Redis queues are appropriate for workloads requiring sub-millisecond enqueue/dequeue latency, where the round trip of an SQS API call (roughly 20ms or more) is too slow.

### List-Based Simple Queue

```javascript
// Producer: push to queue
await redis.lpush(
  'jobs:email-send',
  JSON.stringify({
    to: 'user@example.com',
    template: 'welcome',
    userId: '12345',
    enqueuedAt: Date.now(),
  })
);

// Consumer: blocking pop (waits up to 30s for a message)
async function processEmailQueue() {
  while (true) {
    const result = await redis.brpop('jobs:email-send', 30); // 30s timeout
    if (result) {
      const [_queue, message] = result;
      const job = JSON.parse(message);
      await sendEmail(job);
    }
  }
}
```

**Limitation**: BRPOP/LPUSH provides no message acknowledgment. If the consumer crashes after popping but before processing, the message is lost. For jobs where loss is unacceptable, use Redis Streams or SQS.

### Redis Streams for Durable Queuing

Redis Streams (XADD/XREADGROUP) provide consumer groups, message acknowledgment, and pending message tracking — much closer to SQS's semantics at Redis speed.

```javascript
// Producer: append to stream
await redis.xadd(
  'stream:orders',
  '*', // Auto-generate message ID
  'order_id',
  '9876',
  'user_id',
  '12345',
  'total',
  '99.99',
  'status',
  'pending'
);

// Consumer group setup (run once)
await redis.xgroup('CREATE', 'stream:orders', 'order-processors', '0', 'MKSTREAM');

// Consumer: read with acknowledgment
async function processOrderStream(consumerId) {
  while (true) {
    // Read up to 10 messages, block for 5 seconds if empty
    const messages = await redis.xreadgroup(
      'GROUP',
      'order-processors',
      consumerId,
      'COUNT',
      10,
      'BLOCK',
      5000,
      'STREAMS',
      'stream:orders',
      '>' // '>' means undelivered messages only
    );

    if (!messages) {
      continue; // Timeout, loop again
    }

    for (const [_stream, entries] of messages) {
      for (const [messageId, fields] of entries) {
        const message = {};
        for (let i = 0; i < fields.length; i += 2) {
          message[fields[i]] = fields[i + 1];
        }

        try {
          await processOrder(message);
          // Acknowledge: removes from pending entries
          await redis.xack('stream:orders', 'order-processors', messageId);
        } catch (error) {
          // Message stays in the pending entries list (PEL). Reading with '>' never
          // redelivers it: recover it with XAUTOCLAIM/XCLAIM or by reading with ID '0'
          console.error(`Failed to process ${messageId}:`, error);
        }
      }
    }
  }
}

// Check pending messages (unacknowledged, possibly stuck)
const pending = await redis.xpending(
  'stream:orders',
  'order-processors',
  '-',
  '+', // min/max message IDs
  10 // count
);
```

### Redis Queue vs SQS: When Each Wins

| Factor             | Redis Streams                | SQS                               |
| ------------------ | ---------------------------- | --------------------------------- |
| Latency            | <1ms                         | ~20ms minimum                     |
| Durability         | Memory + AOF/RDB persistence | Multi-AZ, 4-day retention default |
| Cost (at scale)    | Flat ElastiCache rate        | $0.40/million messages            |
| Visibility timeout | Manual (idle time + XAUTOCLAIM) | Built-in, configurable            |
| Dead letter queue  | Manual implementation        | Native DLQ support                |
| FIFO ordering      | Yes (stream ID order)        | SQS FIFO (higher cost)            |
| Ops burden         | Managed (ElastiCache)        | Fully managed (SQS)               |

**Use Redis Streams when**: Your application already has Redis, latency matters (real-time notifications, gaming, live chat), and message volume is moderate (<1 million/day per stream).

**Use SQS when**: You need guaranteed durability, long message retention (up to 14 days), native DLQ support, or you are processing asynchronous background jobs where 20ms latency is irrelevant.

---

## Distributed Locks: Preventing Duplicate Processing

Distributed locks prevent multiple instances of a service from processing the same resource concurrently. This avoids duplicate charges, double-sends, and data inconsistency.

### SETNX Lock (Simple, Single Instance)

```javascript
// Simple lock with SET ... NX PX (atomic, unlike SETNX followed by EXPIRE)
async function acquireLock(resourceId, ttlMs = 5000) {
  const lockKey = `lock:${resourceId}`;
  const lockToken = require('crypto').randomUUID(); // Unique token to identify this lock holder

  const acquired = await redis.set(
    lockKey,
    lockToken,
    'PX',
    ttlMs, // Expiry in milliseconds
    'NX' // Only set if key does not exist
  );

  return acquired ? lockToken : null; // Return token if acquired, null if already locked
}

async function releaseLock(resourceId, lockToken) {
  // Lua script: only release if we own the lock
  // Prevents releasing a lock acquired by another process
  const releaseLockScript = `
    if redis.call('GET', KEYS[1]) == ARGV[1] then
      return redis.call('DEL', KEYS[1])
    else
      return 0
    end
  `;
  const lockKey = `lock:${resourceId}`;
  return redis.eval(releaseLockScript, 1, lockKey, lockToken);
}

// Usage: prevent duplicate invoice processing
async function processInvoice(invoiceId) {
  const lockToken = await acquireLock(`invoice:${invoiceId}`, 30000);

  if (!lockToken) {
    console.log(`Invoice ${invoiceId} is being processed by another instance`);
    return;
  }

  try {
    await chargeInvoice(invoiceId);
  } finally {
    await releaseLock(`invoice:${invoiceId}`, lockToken);
  }
}
```

### Redlock for Multi-Node Safety

For systems where a single Redis node failure must not cause two processes to hold the lock simultaneously, use Redlock, which acquires the lock on a majority of N independent Redis nodes.

```javascript
// Redlock with 3 ElastiCache nodes (separate primary nodes)
const Redlock = require('redlock');
const Redis = require('ioredis');

const nodes = [
  new Redis({ host: process.env.ELASTICACHE_NODE_1 }),
  new Redis({ host: process.env.ELASTICACHE_NODE_2 }),
  new Redis({ host: process.env.ELASTICACHE_NODE_3 }),
];

const redlock = new Redlock(nodes, {
  driftFactor: 0.01, // Assume 1% clock drift
  retryCount: 3,
  retryDelay: 200, // ms between retries
  retryJitter: 100, // Random jitter on retry delay
  automaticExtensionThreshold: 500, // Only used by redlock.using(): extend when under 500ms of TTL remains
});

async function processWithRedlock(resourceId) {
  const lock = await redlock.acquire([`lock:${resourceId}`], 10000); // 10s TTL

  try {
    await processResource(resourceId);
  } finally {
    await redlock.release(lock);
  }
}
```

**Cost note**: Redlock requires N independent Redis nodes (not replicas of the same primary). This means N ElastiCache clusters. For most applications, the simple SETNX approach with a single multi-AZ ElastiCache cluster is sufficient. Use Redlock only when split-brain lock safety is a strict requirement.

---

## Memory Optimization: Getting More from Each ElastiCache Dollar

ElastiCache is billed by instance size. The difference between a t4g.medium ($0.032/hr = $23/month) and an r7g.large ($0.166/hr = $121/month) is $98/month. Optimizing memory usage keeps you on smaller instances longer.

### Eviction Policy Selection

The eviction policy determines what happens when Redis reaches `maxmemory`:

```
allkeys-lru     — Evict least recently used keys regardless of TTL
                  Best for: cache-only Redis where all data is expendable
                  Risk: keys you treat as permanent (no TTL, or a far-future TTL) can be evicted

volatile-lru    — Evict LRU keys among those with TTL set
                  Best for: mixed cache + persistent data (sessions with TTL,
                  config without TTL — config is never evicted)
                  Risk: if all keys have TTL, behaves like allkeys-lru

allkeys-lfu     — Evict least frequently used (Redis 4.0+)
                  Best for: skewed access patterns where a stable set of hot keys should outlive one-off reads

noeviction      — Return error when memory full
                  Best for: queues/streams where data loss is unacceptable
```

For a mixed Redis deployment (cache + sessions + rate limit counters):

```
volatile-lru is usually the right choice:
- Sessions have TTL → can be evicted if memory pressure requires
- Rate limit counters have short TTL → can be evicted
- Any permanent configuration keys have no TTL → never evicted
```

Set the eviction policy in the ElastiCache parameter group (see Terraform below).

### HASH vs STRING for Object Storage

Storing an object as a Redis HASH rather than a JSON string can save 40–70% memory for small objects, because Redis uses a compact listpack encoding (called ziplist before Redis 7.0) for hashes with at most 128 fields whose values are each at most 64 bytes.

```javascript
// STRING: stores full JSON blob
await redis.set(
  'user:123',
  JSON.stringify({
    id: 123,
    name: 'Alice',
    email: 'alice@example.com',
    plan: 'pro',
    created_at: '2026-01-01',
  })
);
// Memory: ~100 bytes (JSON overhead + Redis key overhead + string encoding)

// HASH: Redis uses compact ziplist encoding for small hashes
await redis.hset('user:123', {
  id: '123',
  name: 'Alice',
  email: 'alice@example.com',
  plan: 'pro',
  created_at: '2026-01-01',
});
// Memory: ~60 bytes (ziplist encoding, 40% savings)

// Read specific field without fetching full object
const plan = await redis.hget('user:123', 'plan');

// Read multiple fields
const [name, email] = await redis.hmget('user:123', 'name', 'email');

// Read all fields
const user = await redis.hgetall('user:123');
```

Check encoding to verify ziplist is being used:

```bash
# Redis CLI memory analysis
redis-cli -h $ELASTICACHE_ENDPOINT

# Check encoding of a specific key
OBJECT ENCODING user:123
# Should return: "ziplist" or "listpack" (Redis 7.0+) for small hashes
# Returns: "hashtable" if hash exceeds hash-max-listpack-entries (default 128)

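# Detailed object internals (self-managed Redis/Valkey only:
# ElastiCache restricts the DEBUG command)
DEBUG OBJECT user:123
# Returns: encoding, serializedlength, and idle time among other fields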

# Memory usage of a key (in bytes)
MEMORY USAGE user:123

# Overall memory statistics
INFO memory
# Key metrics:
# used_memory: total allocated memory
# used_memory_rss: RSS from OS perspective (includes fragmentation)
# mem_fragmentation_ratio: used_memory_rss / used_memory (should be 1.0-1.5)
# maxmemory: configured maximum
# maxmemory_human: human-readable maximum
```
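The same `INFO memory` fields can be checked from application code, for example in a periodic health job. A minimal sketch, assuming hypothetical helpers `parseInfo` and `memoryHealth` that operate on the raw text an `INFO` call returns; the sample text below is made up for illustration.

```javascript
// Parse the text returned by INFO into a flat key/value object.
function parseInfo(infoText) {
  const result = {};
  for (const line of infoText.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue; // skip blanks and section headers
    const separator = line.indexOf(':');
    if (separator === -1) continue;
    result[line.slice(0, separator)] = line.slice(separator + 1);
  }
  return result;
}

// Derive the two ratios worth alerting on from INFO memory output.
function memoryHealth(infoText) {
  const info = parseInfo(infoText);
  const used = Number(info.used_memory);
  const rss = Number(info.used_memory_rss);
  const max = Number(info.maxmemory);
  return {
    fragmentationRatio: rss / used,
    // maxmemory of 0 means "no limit", so utilization is undefined in that case
    utilization: max > 0 ? used / max : null,
  };
}

// Illustrative input only; real output has many more fields.
const sampleInfo = [
  '# Memory',
  'used_memory:1000000',
  'used_memory_rss:1300000',
  'maxmemory:4000000',
  'maxmemory_human:3.81M',
].join('\r\n');
console.log(memoryHealth(sampleInfo));
```

With ioredis, the input would come from `await redis.info('memory')`. Alerting when utilization approaches 1.0 gives earlier warning than waiting for evictions to start.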

### Preventing Eviction Storms

When Redis hits `maxmemory`, it evicts keys according to the policy. If eviction is slow (many keys to scan), request latency spikes. To prevent eviction storms:

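1. Keep 20-25% of instance RAM free for overhead and fragmentation. On ElastiCache you do not set `maxmemory` directly; it is derived from the node type minus the `reserved-memory-percent` parameter (default 25).
2. Monitor the `Evictions` metric in CloudWatch (the `evicted_keys` counter in `INFO stats`). A sudden spike indicates memory pressure.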
3. Use `MEMORY USAGE` to identify oversized keys consuming disproportionate memory.

```bash
# Report the largest key of each type (walks the whole keyspace with SCAN; run off-peak on large datasets)
redis-cli -h $ENDPOINT --bigkeys

# Better for production: scan with MEMORY USAGE sampling
redis-cli -h $ENDPOINT --scan --pattern '*' | head -1000 | while read key; do
  size=$(redis-cli -h $ENDPOINT MEMORY USAGE "$key" 2>/dev/null || echo 0)
  echo "$size $key"
done | sort -rn | head -20
```

---

## Cache Stampede Prevention in Detail

Cache stampede is the most dangerous failure mode in a Redis-backed system. It can cascade: cache expires → 500 simultaneous DB queries → DB CPU spikes to 100% → query timeout → all 500 requests return error → retry storm → DB crash.

### Mutex Lock Approach

```javascript
// Mutex-based cache: only one request regenerates, others wait
const LOCK_TTL = 5000; // 5 seconds max for cache rebuild

async function getWithMutex(cacheKey, fetchFn, ttl) {
  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached !== null) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: try to acquire rebuild lock
  const lockKey = `lock:rebuild:${cacheKey}`;
  const lockToken = `${Date.now()}-${Math.random()}`;
  const acquired = await redis.set(lockKey, lockToken, 'PX', LOCK_TTL, 'NX');

  if (acquired) {
    // 3. We hold the lock: rebuild cache
    try {
      const data = await fetchFn();
      await redis.set(cacheKey, JSON.stringify(data), 'EX', ttl);
      return data;
    } finally {
      // Release lock
      const releaseScript = `
        if redis.call('GET', KEYS[1]) == ARGV[1] then
          return redis.call('DEL', KEYS[1])
        end
        return 0
      `;
      await redis.eval(releaseScript, 1, lockKey, lockToken);
    }
  } else {
    // 4. Another process is rebuilding: poll the cache until the lock TTL elapses
    for (let waited = 0; waited < LOCK_TTL; waited += 100) {
      await new Promise((resolve) => setTimeout(resolve, 100));
      const retried = await redis.get(cacheKey);
      if (retried !== null) {
        return JSON.parse(retried);
      }
    }
    // Rebuild never landed (holder crashed or timed out): fall back to the DB
    return fetchFn();
  }
}
```

### Probabilistic Early Recomputation (PER)

PER proactively refreshes a cache entry before it expires, with probability increasing as expiry approaches. No locking required.

```javascript
// Probabilistic Early Recomputation
// beta controls how aggressively to early-refresh (higher = more eager, default 1.0)
async function getWithPER(cacheKey, fetchFn, ttl, beta = 1.0) {
  const cacheData = await redis.get(cacheKey);

  if (cacheData !== null) {
    const { value, delta, expiry } = JSON.parse(cacheData);
    const now = Date.now() / 1000;

    // Refresh early when now - delta * beta * ln(rand) >= expiry.
    // ln(rand) is negative, so the term pushes "now" forward by a random amount
    // that scales with delta, the measured time to recompute the value
    const shouldRefresh = now - delta * beta * Math.log(Math.random()) >= expiry;

    if (!shouldRefresh) {
      return value;
    }
    // Fall through to refresh (early recomputation)
  }

  // Cache miss or PER triggered: recompute
  const startTime = Date.now();
  const value = await fetchFn();
  const delta = (Date.now() - startTime) / 1000; // Computation time in seconds
  const expiry = Date.now() / 1000 + ttl;

  await redis.set(
    cacheKey,
    JSON.stringify({ value, delta, expiry }),
    'EX',
    ttl + Math.floor(delta * beta * 2) // Extend TTL slightly for PER window
  );

  return value;
}
```
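To see how the refresh probability grows as expiry approaches, the decision rule can be isolated from Redis. This is a sketch: `shouldRefreshEarly` takes the random draw as an argument so it is deterministic, and `refreshProbability` is the closed form that follows from the same inequality when the draw is uniform on (0, 1). Both names are hypothetical.

```javascript
// The decision rule used in getWithPER, with the random draw injected.
function shouldRefreshEarly(nowSec, expirySec, deltaSec, beta, rand) {
  // -ln(rand) is a positive, exponentially distributed number
  return nowSec - deltaSec * beta * Math.log(rand) >= expirySec;
}

// Probability that a single request triggers a refresh when the entry
// still has secondsToExpiry left:
//   -delta * beta * ln(U) >= t   <=>   U <= exp(-t / (delta * beta))
function refreshProbability(secondsToExpiry, deltaSec, beta = 1.0) {
  return Math.exp(-secondsToExpiry / (deltaSec * beta));
}

// A value that takes 0.5s to recompute: the chance of an early refresh
// is negligible far from expiry and rises steeply in the last second.
for (const secondsLeft of [10, 5, 2, 1, 0.5, 0.1]) {
  const p = refreshProbability(secondsLeft, 0.5);
  console.log(`${secondsLeft}s before expiry: ${(p * 100).toFixed(3)}% per request`);
}
```

Because the probability scales with `delta`, slow-to-compute values start refreshing earlier than cheap ones, which is the behavior you want under load. Raising `beta` above 1.0 widens the early-refresh window further.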

---

## ElastiCache Terraform Configuration

A production-grade ElastiCache Valkey replication group: one shard (cluster mode disabled) with a primary and a replica spread across Availability Zones:

```hcl
# elasticache.tf

resource "aws_elasticache_parameter_group" "valkey8_production" {
  family = "valkey8"
  name   = "valkey8-production"

  # Eviction policy: evict LRU keys with TTL set (volatile-lru)
  # Protects permanent keys (config, feature flags) from eviction
  parameter {
    name  = "maxmemory-policy"
    value = "volatile-lru"
  }

  # Lazy freeing: delete expired keys asynchronously (lower latency)
  parameter {
    name  = "lazyfree-lazy-expire"
    value = "yes"
  }

  parameter {
    name  = "lazyfree-lazy-eviction"
    value = "yes"
  }

  # Enable keyspace notifications for expiry events
  # Useful for TTL-based workflows (e.g., session expiry cleanup hooks)
  # K = keyspace events, E = keyevent events, x = expired events
  parameter {
    name  = "notify-keyspace-events"
    value = "Ex"
  }

  # Slowlog: log commands slower than 100ms
  parameter {
    name  = "slowlog-log-slower-than"
    value = "100000"  # microseconds
  }

  parameter {
    name  = "slowlog-max-len"
    value = "128"
  }

  # Hash optimization: keep small hashes in the compact listpack encoding
  # (listpack is the successor to ziplist)
  parameter {
    name  = "hash-max-listpack-entries"
    value = "128"
  }

  parameter {
    name  = "hash-max-listpack-value"
    value = "64"
  }
}

resource "aws_elasticache_replication_group" "valkey_cache" {
  replication_group_id = "myapp-valkey-cache"
  description          = "Valkey cluster for caching, sessions, and rate limiting"

  engine         = "valkey"
  engine_version = "8.0"
  node_type      = "cache.t4g.medium"  # 2 vCPU, 3.09 GB RAM, $0.032/hr

  # Multi-AZ with automatic failover
  multi_az_enabled           = true
  automatic_failover_enabled = true

  # Single shard (cluster mode disabled): num_cache_clusters is the total
  # number of nodes in the replication group, primary included
  num_cache_clusters = 2  # 1 primary + 1 replica

  parameter_group_name = aws_elasticache_parameter_group.valkey8_production.name

  # Encryption
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  kms_key_id                 = aws_kms_key.elasticache.arn

  # Maintenance and backup
  maintenance_window       = "sun:03:00-sun:04:00"
  snapshot_window          = "02:00-03:00"
  snapshot_retention_limit = 7

  # Auth token (password) for access control
  auth_token = var.elasticache_auth_token

  subnet_group_name  = aws_elasticache_subnet_group.private.name
  security_group_ids = [aws_security_group.elasticache.id]

  # Apply changes immediately in non-production; use false in production
  apply_immediately = false

  log_delivery_configuration {
    destination      = aws_cloudwatch_log_group.elasticache_slow_logs.name
    destination_type = "cloudwatch-logs"
    log_format       = "json"
    log_type         = "slow-log"
  }

  log_delivery_configuration {
    destination      = aws_cloudwatch_log_group.elasticache_engine_logs.name
    destination_type = "cloudwatch-logs"
    log_format       = "json"
    log_type         = "engine-log"
  }

  tags = {
    Environment = "production"
    Team        = "platform"
    CostCenter  = "infrastructure"
  }
}

resource "aws_elasticache_subnet_group" "private" {
  name       = "myapp-elasticache-private"
  subnet_ids = var.private_subnet_ids
}

resource "aws_security_group" "elasticache" {
  name_prefix = "elasticache-"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]  # Only allow app tier
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "elasticache-sg"
  }
}

# CloudWatch alarms for cache health
resource "aws_cloudwatch_metric_alarm" "cache_evictions" {
  alarm_name          = "elasticache-high-evictions"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "5"
  metric_name         = "Evictions"
  namespace           = "AWS/ElastiCache"
  period              = "60"
  statistic           = "Sum"
  threshold           = "100"  # Alert if >100 evictions/minute sustained
  alarm_description   = "Cache is evicting keys — memory pressure or TTL storm"

  dimensions = {
    ReplicationGroupId = aws_elasticache_replication_group.valkey_cache.id
  }

  alarm_actions = [var.sns_alert_topic_arn]
}

resource "aws_cloudwatch_metric_alarm" "cache_memory_high" {
  alarm_name          = "elasticache-memory-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "3"
  metric_name         = "DatabaseMemoryUsagePercentage"
  namespace           = "AWS/ElastiCache"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"  # Alert at 80% memory to allow time to resize
  alarm_description   = "ElastiCache memory usage above 80% — consider scaling up"

  dimensions = {
    ReplicationGroupId = aws_elasticache_replication_group.valkey_cache.id
  }

  alarm_actions = [var.sns_alert_topic_arn]
}

output "cache_primary_endpoint" {
  value = aws_elasticache_replication_group.valkey_cache.primary_endpoint_address
}

output "cache_reader_endpoint" {
  value = aws_elasticache_replication_group.valkey_cache.reader_endpoint_address
}
```
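
On the application side, the settings above have two consequences: `transit_encryption_enabled = true` means clients connect over TLS, and `auth_token` becomes the client password. The sketch below builds connection options accordingly. It assumes the `redis` client in the earlier snippets is ioredis (the `set(key, value, 'EX', ttl)` call style suggests it), and the environment variable names are hypothetical.

```javascript
// Sketch: build client options for the TLS + auth-token replication group.
// The environment variable names below are hypothetical, not from the article.
function buildCacheOptions(env = process.env) {
  const host = env.CACHE_PRIMARY_ENDPOINT; // e.g. the cache_primary_endpoint output
  if (!host) {
    throw new Error('CACHE_PRIMARY_ENDPOINT is not set');
  }
  return {
    host,
    port: Number(env.CACHE_PORT || 6379), // matches the security group ingress port
    password: env.CACHE_AUTH_TOKEN,       // the value passed as auth_token
    tls: {},                              // in-transit encryption is on, so connect over TLS
  };
}

// Assumed usage with ioredis:
//   const Redis = require('ioredis');
//   const redis = new Redis(buildCacheOptions());
```

Writes go to the primary endpoint. Read-heavy paths that tolerate replication lag can point a second client at the reader endpoint.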

---

## Putting It Together: Total Cost Impact

For a SaaS application at 100,000 DAU with ElastiCache t4g.medium ($23/month):

| Use Case                   | Alternative                 | Alternative Cost | Redis Cost         | Monthly Savings   |
| -------------------------- | --------------------------- | ---------------- | ------------------ | ----------------- |
| Session storage            | DynamoDB                    | ~$135/month      | Shared ElastiCache | ~$135             |
| Rate limiting (10k users)  | API Gateway usage plans     | ~$900/month      | Shared ElastiCache | ~$900             |
| Simple queue               | SQS (1M msgs/day)           | ~$12/month       | Shared ElastiCache | ~$12              |
| Cache (DB read offset 60%) | Additional RDS reads        | ~$45/month       | Shared ElastiCache | ~$45              |
| Distributed locks          | DynamoDB conditional writes | ~$8/month        | Shared ElastiCache | ~$8               |
| **ElastiCache cost**       |                             |                  | **$23/month**      |                   |
| **Net savings**            |                             |                  |                    | **~$1,077/month** |

A single ElastiCache t4g.medium serving all these workloads simultaneously delivers over $1,000/month in savings over managed-service alternatives at 100,000 DAU scale.
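
The net figure is the sum of the avoided costs minus the one shared ElastiCache node. The snippet below is a small sketch of that arithmetic using the table's own estimates; the object keys and function name are illustrative, and you can substitute your own numbers.

```javascript
// Monthly cost of each alternative, taken from the table above (USD, estimates)
const monthlyAlternativeCosts = {
  sessions: 135,     // DynamoDB session storage
  rateLimiting: 900, // API Gateway usage plans
  queue: 12,         // SQS at 1M msgs/day
  dbReads: 45,       // additional RDS reads
  locks: 8,          // DynamoDB conditional writes
};
const elastiCacheMonthly = 23; // one shared t4g.medium, per the table

// Gross savings = sum of avoided costs; net = gross minus the cache node
function netMonthlySavings(alternatives, cacheCost) {
  const gross = Object.values(alternatives).reduce((sum, cost) => sum + cost, 0);
  return { gross, net: gross - cacheCost };
}

const { gross, net } = netMonthlySavings(monthlyAlternativeCosts, elastiCacheMonthly);
console.log(`Gross: $${gross}/month, net: $${net}/month`);
```

Because every workload shares one node, the ElastiCache cost is subtracted once, not per use case. The rate limiting row dominates the total, so the result is most sensitive to that estimate.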

For a deeper look at caching patterns specifically for production environments including TTL strategies and invalidation, see our [ElastiCache Redis caching strategies guide](/blog/aws-elasticache-redis-caching-strategies-for-production/). For workloads where SQS is the right choice over Redis Streams, our [SQS reliable messaging patterns guide](/blog/aws-sqs-reliable-messaging-patterns-for-production/) covers dead letter queues, visibility timeouts, and FIFO ordering in depth. The full cross-service cost optimization framework is in the [AWS cost control architecture playbook](/blog/aws-cost-control-architecture-optimization-playbook/).

## FAQ

### What is Valkey and should you migrate from Redis on AWS ElastiCache?
Valkey is a Linux Foundation fork of Redis created in March 2024 after Redis Ltd. moved Redis 7.4+ to the RSALv2 (Redis Source Available License v2) and SSPLv1 licenses, neither of which is OSI-approved. Valkey 8.0 is released under the permissive BSD 3-Clause license and is drop-in compatible with Redis 7.2: same RESP3 protocol, same data structures, same commands. AWS launched ElastiCache for Valkey in late 2024, priced below the equivalent Redis OSS tiers (20% lower for node-based clusters and 33% lower for Serverless at launch; check current pricing for your region). For existing ElastiCache Redis deployments, migration is straightforward: in your Terraform resource, change `engine` from "redis" to "valkey", set a Valkey `engine_version`, point `parameter_group_name` at a Valkey-family parameter group, and apply. Client libraries work without changes. Whether to migrate depends on licensing requirements, vendor preference, and the lower Valkey price; there is no performance gap at Valkey 8.0, and AWS supports both engines with equivalent SLAs.

### How does Redis session storage reduce AWS costs at scale?
Redis session storage reduces costs by replacing per-request DynamoDB read/write charges with a flat monthly compute cost. DynamoDB charges $1.25 per million write request units and $0.25 per million read request units. A web application with 100,000 daily active users, each making 50 requests per session with 2 session reads per request, generates 10 million session reads and 200,000 session writes per day. That is $2.50 for reads plus $0.25 for writes, or roughly $2.75/day and $82.50/month in DynamoDB costs for sessions alone. An ElastiCache t4g.small ($0.016/hr = $11.52/month) handles this volume with latency under 1ms. At 500,000 DAU, DynamoDB session costs reach ~$412/month vs ElastiCache t4g.medium ($0.032/hr = $23.04/month). Under these assumptions the break-even is around 14,000 DAU against a t4g.small and around 28,000 DAU against a t4g.medium; above that, ElastiCache sessions are cheaper.

### What is cache stampede and how do you prevent it in production?
Cache stampede (also called thundering herd) occurs when a cached value expires and multiple concurrent requests all find a cache miss simultaneously. Each request independently queries the underlying database or service to recompute the value, creating a sudden spike in database load — potentially 100x normal read traffic if the cached value was popular. On AWS, this can trigger RDS CPU alarms, Aurora reader overload, or DynamoDB throttling. Prevention strategies: (1) Probabilistic early recomputation — before a cache entry expires, compute a probability that increases as expiry approaches and proactively refresh; (2) Mutex lock — the first request to detect a miss acquires a distributed lock and refreshes the cache, while other requests wait or serve the stale value; (3) Staggered TTLs — add random jitter (e.g., base TTL + random 0-300 seconds) to prevent synchronized expiry of related keys across multiple instances.
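
Strategy (3) is the simplest of the three to adopt. A minimal sketch, assuming integer TTLs in seconds: `jitteredTtl` is a hypothetical helper name, and the `random` parameter exists only so the function can be tested deterministically.

```javascript
// Staggered TTL: base TTL plus a random offset in [0, maxJitterSeconds]
function jitteredTtl(baseTtlSeconds, maxJitterSeconds = 300, random = Math.random) {
  // random() is in [0, 1), so the floor lands on 0..maxJitterSeconds inclusive
  return baseTtlSeconds + Math.floor(random() * (maxJitterSeconds + 1));
}

// Ten keys written at the same moment now expire at different times
const ttls = Array.from({ length: 10 }, () => jitteredTtl(3600));
console.log(ttls);

// Assumed usage with the client from the earlier examples:
//   await redis.set(cacheKey, JSON.stringify(value), 'EX', jitteredTtl(3600));
```

Jitter spreads out expiry across many related keys, but it does not protect a single hot key. For that case, combine it with the mutex lock or probabilistic early recomputation.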

### When does Redis rate limiting save more money than API Gateway throttling?
Redis rate limiting saves money over API Gateway throttling when you need fine-grained per-user, per-resource, or per-IP limits that would require API Gateway usage plans. API Gateway usage plans charge $3.00 per million API calls plus $0.09/month per API key in usage plans. For a SaaS application with 10,000 customers each on different rate limit tiers, managing 10,000 API keys in API Gateway costs $900/month in API key fees alone before request charges. A Redis-based rate limiter on ElastiCache t4g.medium ($23/month) can enforce per-user sliding window limits across all endpoints with sub-millisecond overhead per request check. Redis rate limiting also works for internal service-to-service rate limiting where API Gateway is not in the request path — a use case API Gateway cannot address at all.

---

*Source: https://www.factualminds.com/blog/redis-valkey-cost-saving-layer-aws/*
