How to Prevent Queue-Based Cost Explosions on AWS
Quick summary: SQS charges per API request. Retry storms, misconfigured visibility timeouts, and unlimited worker concurrency turn queue costs from predictable to catastrophic. Here is how to prevent it.
Key Takeaways
- SQS charges per API request, not per message — empty polls cost the same as full ones
- Long polling plus batching can cut ReceiveMessage costs by 95%+ at no implementation cost
- Full-jitter backoff, tuned visibility timeouts, and concurrency caps stop retry storms
- Alert on leading indicators like the in-flight/visible ratio; Cost Explorer lags your bill by 24+ hours

SQS pricing looks deceptively simple: $0.40 per million requests for standard queues. At that rate, even a moderately busy queue should cost almost nothing. And it does — right up until something breaks, your retry logic kicks in, and a single incident turns a $2/day queue into a $300/day billing event.
The failure modes that cause queue cost explosions are specific and preventable. This guide covers each one with the math, the code, and the CloudWatch alarms that catch them before your finance team sends you a Slack message.
Understanding the SQS Cost Model
Before you can prevent cost explosions, you need to understand exactly what SQS charges for.
Every API call is a request. SQS charges per API request, not per message. One ReceiveMessage call that returns 10 messages is one request. One ReceiveMessage call that returns 0 messages (empty poll) is also one request at the same price.
The calls that generate costs:
| API Call | When It Happens | Cost Factor |
|---|---|---|
| SendMessage | Producer enqueues a message | 1× per message sent |
| ReceiveMessage | Worker polls for messages | 1× per poll, regardless of messages returned |
| DeleteMessage | Worker confirms processing | 1× per successfully processed message |
| ChangeMessageVisibility | Worker extends timeout | 1× per call |
| SendMessageBatch | Batch enqueue (up to 10) | 1× per batch call |
| ReceiveMessage (empty) | Poll returns nothing | 1× — charged the same as a full poll |
The math that matters:
A single worker polling every 1 second with short polling:
- 60 ReceiveMessage calls/minute
- 1 DeleteMessage per processed message
- For 100 messages/minute: 60 polls + 100 deletes = 160 requests/minute = 230,400 requests/day
- Cost: $0.09/day — trivially cheap
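That arithmetic is easy to sanity-check in code. A quick sketch (assuming the $0.40-per-million standard queue price; regional pricing and the free tier may differ):

```python
def daily_sqs_cost(requests_per_minute: int, price_per_million: float = 0.40) -> float:
    """Daily SQS API cost for a steady request rate at $0.40 per million requests."""
    requests_per_day = requests_per_minute * 60 * 24
    return requests_per_day * price_per_million / 1_000_000

# The healthy worker above: 60 polls + 100 deletes = 160 requests/minute
print(round(daily_sqs_cost(160), 2))     # 0.09

# 1,000 workers short-polling every second: 60,000 requests/minute
print(round(daily_sqs_cost(60_000), 2))  # 34.56
```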
The same worker, with a downstream dependency failing:
- Every message fails, visibility timeout expires, message re-enters queue
- Worker processes same message 5× (maxReceiveCount=5) before DLQ
- Plus all the empty polls while waiting for visibility timeout
- 1,000 such workers: 1,000 × 60 polls/minute = 60,000 requests/minute = 86,400,000 requests/day
- Cost: $34.56/day just for polling — plus all the failed process attempts
Scale to a production incident with 5,000 workers in a retry storm and you are looking at $170/day in SQS API costs for a queue doing zero useful work.
Long Polling: The Free Win
Short polling (the default) queries SQS message servers and returns immediately, even if no messages are available. Long polling holds the connection open for up to 20 seconds, waiting for messages to arrive.
Switch every queue to long polling. This is free in implementation cost and reduces empty polls by 90%+ for queues with moderate activity.
Configure at the queue level:
ReceiveMessageWaitTimeSeconds = 20

Or per ReceiveMessage call:

WaitTimeSeconds = 20

With 20-second long polling, a worker polling an empty queue makes 3 ReceiveMessage calls per minute instead of 60. For 1,000 idle workers:
- Short polling: 60,000 requests/minute
- Long polling: 3,000 requests/minute
That is a 20× reduction in ReceiveMessage API costs before you change anything else.
Combine with batching: MaxNumberOfMessages = 10 retrieves up to 10 messages per poll. One API call processes 10 messages instead of one. Net effect: long polling + batching can reduce API request costs by 95%+ compared to naive short-polling with individual message retrieval.
Exponential Backoff with Full Jitter
Retry logic is necessary. Synchronized, fixed-schedule retry logic is what causes thundering-herd problems and retry storms.
Pure constant-interval retry: every failing consumer retries every 10 seconds. With 1,000 consumers all hitting the same broken dependency, you get 1,000 concurrent requests every 10 seconds — a sustained attack on an already-struggling system.
Pure exponential backoff: delays double with each retry (1s, 2s, 4s, 8s…). Reduces average retry rate but all 1,000 consumers still retry at the same times if they started failing simultaneously. This creates synchronized bursts rather than a constant stream, but each burst is still 1,000 concurrent requests.
Full jitter: randomize the delay between 0 and the exponential ceiling. Consumers are desynchronized — retries are spread across the entire backoff window rather than synchronized at the window boundaries.
The math: with full jitter, the expected wait time is half the exponential ceiling (since you pick uniformly between 0 and ceiling). The variance ensures consumers spread out across the window.
Go: Exponential Backoff with Full Jitter
package backoff

import (
	"errors"
	"math"
	"math/rand"
	"time"
)

const (
	baseDelay  = 100 * time.Millisecond
	maxDelay   = 30 * time.Second
	maxRetries = 10
)

// FullJitter returns a random duration between 0 and the exponential ceiling.
// This desynchronizes retries across concurrent callers.
func FullJitter(attempt int) time.Duration {
	if attempt <= 0 {
		return 0
	}
	// Exponential ceiling: base * 2^attempt, capped at maxDelay
	ceiling := float64(baseDelay) * math.Pow(2, float64(attempt))
	if ceiling > float64(maxDelay) {
		ceiling = float64(maxDelay)
	}
	// Uniform random between 0 and ceiling
	return time.Duration(rand.Float64() * ceiling)
}

// RetryWithBackoff executes fn, retrying on error with full jitter backoff.
// Returns the last error after maxRetries attempts.
func RetryWithBackoff(fn func() error) error {
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		lastErr = fn()
		if lastErr == nil {
			return nil
		}
		if attempt == maxRetries-1 {
			break
		}
		time.Sleep(FullJitter(attempt + 1))
	}
	return lastErr
}

// RetryableError wraps an error to indicate it is safe to retry.
type RetryableError struct {
	Cause error
}

func (e *RetryableError) Error() string { return e.Cause.Error() }
func (e *RetryableError) Unwrap() error { return e.Cause }

// RetryWithBackoffSelective retries only RetryableError types.
// Non-retryable errors return immediately.
func RetryWithBackoffSelective(fn func() error) error {
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		lastErr = fn()
		if lastErr == nil {
			return nil
		}
		var retryable *RetryableError
		if !errors.As(lastErr, &retryable) {
			// Non-retryable error — fail immediately, no delay
			return lastErr
		}
		if attempt == maxRetries-1 {
			break
		}
		time.Sleep(FullJitter(attempt + 1))
	}
	return lastErr
}

The key difference: FullJitter(attempt) returns a value between 0 and baseDelay * 2^attempt, not exactly baseDelay * 2^attempt. At attempt 5 with a 100ms base delay, the ceiling is 3.2 seconds, and the actual wait is somewhere between 0 and 3.2 seconds — uniformly distributed.
At 1,000 consumers all hitting retry attempt 5 simultaneously, they spread their retries across a 3.2-second window instead of all retrying at the exact 3.2-second mark. The downstream system sees at most ~312 requests/second instead of 1,000 requests in a burst.
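A small simulation (illustrative only, seeded for reproducibility) makes the spreading effect concrete:

```python
import random

def full_jitter_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Uniform delay in [0, min(base * 2^attempt, cap)] seconds."""
    ceiling = min(base * (2 ** attempt), cap)
    return random.uniform(0, ceiling)

random.seed(42)
# 1,000 consumers all reaching retry attempt 5 (ceiling = 3.2s) at once
delays = [full_jitter_delay(5) for _ in range(1000)]

# Bucket the retries into 100ms slots across the 3.2s window
buckets = [0] * 32
for d in delays:
    buckets[min(int(d / 0.1), 31)] += 1
print(max(buckets))  # worst 100ms slot sees roughly 1000/32 ≈ 31 retries, not 1,000
```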
Visibility Timeout Tuning
The visibility timeout is how long SQS hides a message from other consumers after it has been received. If your worker does not delete the message before the timeout expires, SQS assumes the worker failed and makes the message available again.
Too short: the message re-enters the queue before processing completes, and another worker picks it up. The original worker finishes anyway and calls DeleteMessage with a now-stale receipt handle, which may fail or silently do nothing. You now have duplicate processing and wasted API calls.
Too long: a worker crashes mid-processing. The message is invisible for the full timeout before it is retried. A 12-hour visibility timeout with a crashing consumer means 12-hour delayed retries.
Formula: visibility_timeout = max_expected_processing_time × 1.5
For typical web API calls or database operations (processing in under 30 seconds):
- Set visibility timeout to 45–60 seconds.
For jobs that call external APIs with long timeouts (60 second external call):
- Set visibility timeout to 90–120 seconds.
For long-running jobs (report generation, video processing):
- Set a moderate initial visibility timeout (5 minutes).
- Extend it mid-processing using ChangeMessageVisibility before it expires.
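The 1.5× formula as a trivial helper (the function name is hypothetical):

```python
import math

def visibility_timeout_seconds(max_processing_seconds: float) -> int:
    """visibility_timeout = max_expected_processing_time * 1.5, rounded up."""
    return math.ceil(max_processing_seconds * 1.5)

print(visibility_timeout_seconds(30))  # 45 — typical API/database work
print(visibility_timeout_seconds(60))  # 90 — long external calls
```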
Extending Visibility Mid-Processing
import boto3
import threading

sqs = boto3.client('sqs', region_name='us-east-1')

def process_with_visibility_extension(queue_url: str, message: dict, processor_fn) -> None:
    receipt_handle = message['ReceiptHandle']
    extend_interval = 30   # Extend every 30 seconds
    extension_amount = 60  # Extend by 60 seconds each time
    stop_event = threading.Event()

    def extend_visibility() -> None:
        while not stop_event.wait(extend_interval):
            try:
                sqs.change_message_visibility(
                    QueueUrl=queue_url,
                    ReceiptHandle=receipt_handle,
                    VisibilityTimeout=extension_amount,
                )
            except Exception:
                # Message may have already been deleted; ignore
                pass

    extender = threading.Thread(target=extend_visibility, daemon=True)
    extender.start()
    try:
        processor_fn(message)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    finally:
        stop_event.set()

This pattern is particularly important for jobs where processing time varies widely — report generation that takes 10 seconds on a small dataset and 90 seconds on a large one.
Worker Concurrency Caps
Uncapped worker concurrency is the most common cause of SQS cost explosions during incidents. When a queue has a large backlog, autoscaling launches many workers. Each worker polls independently. Without a cap, you have no upper bound on API call volume.
Node.js (BullMQ) with Concurrency Cap and Backoff
BullMQ is a Redis-backed queue library for Node.js with excellent concurrency controls. When used alongside SQS via a bridge pattern, its concurrency settings prevent runaway worker scaling:
import { Worker, Job } from 'bullmq';
import IORedis from 'ioredis';

const connection = new IORedis({
  host: process.env.REDIS_HOST,
  port: 6379,
  maxRetriesPerRequest: null,
});

const worker = new Worker(
  'order-processing',
  async (job: Job) => {
    const { orderId, customerId } = job.data;
    // Process the order
    await processOrder(orderId, customerId);
  },
  {
    connection,
    concurrency: 10, // Maximum 10 concurrent jobs per worker process
    limiter: {
      max: 50,        // Maximum 50 jobs processed per duration window
      duration: 1000, // Duration in milliseconds (1 second)
    },
  },
);

// Graceful shutdown on SIGTERM
const shutdown = async () => {
  await worker.close();
  connection.disconnect();
  process.exit(0);
};
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
});
worker.on('error', (err) => {
  console.error('Worker error:', err);
});

The concurrency: 10 limit means this worker process handles at most 10 jobs simultaneously regardless of how many jobs are queued. The limiter adds a rate limit on top: at most 50 jobs per second across all workers sharing this Redis queue.
If you deploy 5 worker containers, you have 50 total concurrent jobs, and the shared limiter still caps throughput at 50 jobs/second across the fleet — predictable, budgetable, and controllable.
Laravel Queue Worker with Concurrency Controls
Laravel’s built-in queue:work command has several flags that directly control cost:
# Controlled worker command — use this in production
# --sleep=3        sleep 3s when queue is empty (reduces empty polls)
# --tries=5        max attempts before DLQ
# --max-time=3600  restart worker after 1 hour (memory leak prevention)
# --max-jobs=500   process 500 jobs then restart (memory hygiene)
# --timeout=60     kill job after 60s (must fit inside visibility timeout)
# --backoff        retry delays: 30s, 60s, 120s (exponential steps)
php artisan queue:work sqs \
  --queue=orders,notifications \
  --sleep=3 \
  --tries=5 \
  --max-time=3600 \
  --max-jobs=500 \
  --timeout=60 \
  --backoff=30,60,120

The critical flags for cost control:
- --sleep=3: When the queue is empty, wait 3 seconds before polling again instead of re-polling immediately. 50 workers at ~20 polls/minute (sleep=3) instead of 60 polls/minute: 2,000 fewer API calls/minute.
- --max-jobs=500: Restart after 500 jobs to prevent memory leaks from accumulating across long-running workers. Combine with a process manager (Supervisor, ECS) that restarts workers automatically.
- --timeout=60: This must be less than the SQS visibility timeout. If the job runs longer than --timeout, Laravel kills it, the visibility timeout expires, and SQS redelivers the message. Set --timeout to visibility_timeout - 10 seconds as a safety margin.
For per-connection queue configuration in config/queue.php:
'sqs' => [
    'driver' => 'sqs',
    'key' => env('AWS_ACCESS_KEY_ID'),
    'secret' => env('AWS_SECRET_ACCESS_KEY'),
    'prefix' => env('SQS_PREFIX', 'https://sqs.us-east-1.amazonaws.com/your-account-id'),
    'queue' => env('SQS_QUEUE', 'default'),
    'suffix' => env('SQS_SUFFIX'),
    'region' => env('AWS_DEFAULT_REGION', 'us-east-1'),
    // Dispatch jobs immediately rather than waiting for DB transactions to commit
    'after_commit' => false,
],

For the SQS queue itself, set ReceiveMessageWaitTimeSeconds = 20 to enable long polling at the queue level. This applies to all consumers regardless of SDK configuration.
Celery Worker Concurrency
# Controlled Celery worker
# --concurrency=8            8 worker processes (set to CPU count for CPU-bound)
# --max-tasks-per-child=100  restart child after 100 tasks (memory hygiene)
# --prefetch-multiplier=1    fetch 1 task per worker (prevents hoarding)
celery -A myapp worker \
  --concurrency=8 \
  --max-tasks-per-child=100 \
  --prefetch-multiplier=1 \
  -Q orders,notifications

--prefetch-multiplier=1 is critical. The default is 4: each worker prefetches 4 tasks from the queue. With 20 workers and prefetch=4, you have 80 messages hidden from other consumers (NotVisible) while only 20 are being actively processed. During a backlog this looks like consumption but is actually just hiding messages. Set prefetch to 1 for fair distribution and accurate queue depth metrics.
Observability: Detecting Cost Spikes Before They Appear on Your Bill
The AWS Cost Explorer shows costs with 24+ hour delay. By the time a retry storm appears in your bill, you have paid for it. Build alerting on leading indicators.
The Retry Storm Signature
Monitor ApproximateNumberOfMessagesNotVisible (in-flight) vs ApproximateNumberOfMessagesVisible (waiting). Publish this ratio as a custom CloudWatch metric:
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
sqs = boto3.client('sqs', region_name='us-east-1')

def publish_queue_health_metrics(queue_url: str, queue_name: str) -> None:
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            'ApproximateNumberOfMessages',
            'ApproximateNumberOfMessagesNotVisible',
        ],
    )['Attributes']
    visible = int(attrs.get('ApproximateNumberOfMessages', 0))
    in_flight = int(attrs.get('ApproximateNumberOfMessagesNotVisible', 0))

    # Ratio > 2 suggests workers are consuming but not completing
    ratio = in_flight / max(visible, 1)

    cloudwatch.put_metric_data(
        Namespace='CustomQueues',
        MetricData=[
            {
                'MetricName': 'InFlightToVisibleRatio',
                'Dimensions': [{'Name': 'QueueName', 'Value': queue_name}],
                'Value': ratio,
                'Unit': 'None',
            },
            {
                'MetricName': 'MessagesInFlight',
                'Dimensions': [{'Name': 'QueueName', 'Value': queue_name}],
                'Value': in_flight,
                'Unit': 'Count',
            },
        ],
    )

Run this on a 1-minute schedule via Lambda or a sidecar container. Create a CloudWatch alarm when InFlightToVisibleRatio exceeds 2.0 for 3 consecutive minutes — this is the retry storm signature.
Terraform: CloudWatch Alarm for DLQ Depth
variable "sns_alert_topic_arn" {
  description = "SNS topic ARN for alerting"
  type        = string
}

resource "aws_sqs_queue" "orders_dlq" {
  name                      = "orders-dlq"
  message_retention_seconds = 1209600
}

resource "aws_sqs_queue" "orders" {
  name                       = "orders"
  visibility_timeout_seconds = 60
  receive_wait_time_seconds  = 20
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 5
  })
}

# Alert when ANY message lands in DLQ
resource "aws_cloudwatch_metric_alarm" "orders_dlq_not_empty" {
  alarm_name          = "orders-dlq-not-empty"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 300
  statistic           = "Sum"
  threshold           = 0
  alarm_description   = "Messages in orders DLQ — processing failures require investigation"
  alarm_actions       = [var.sns_alert_topic_arn]
  ok_actions          = [var.sns_alert_topic_arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    QueueName = aws_sqs_queue.orders_dlq.name
  }
}

# Alert on high queue depth (consumer lag indicator)
resource "aws_cloudwatch_metric_alarm" "orders_consumer_lag" {
  alarm_name          = "orders-consumer-lag-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 5
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 60
  statistic           = "Average"
  threshold           = 500
  alarm_description   = "Order queue backlog above 500 — check consumer health or scale workers"
  alarm_actions       = [var.sns_alert_topic_arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    QueueName = aws_sqs_queue.orders.name
  }
}

# Alert on high in-flight count relative to visible
# (requires custom metric published by monitoring Lambda)
resource "aws_cloudwatch_metric_alarm" "orders_retry_storm_indicator" {
  alarm_name          = "orders-retry-storm-indicator"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "InFlightToVisibleRatio"
  namespace           = "CustomQueues"
  period              = 60
  statistic           = "Average"
  threshold           = 2
  alarm_description   = "Orders queue in-flight ratio > 2: possible retry storm"
  alarm_actions       = [var.sns_alert_topic_arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    QueueName = "orders"
  }
}

Circuit Breakers for Queue Workers
A circuit breaker prevents workers from hammering a broken dependency. Without one, a database outage causes every worker to attempt the database call, fail, release the message back to the queue, and immediately try again — burning SQS API calls while the database is trying to recover.
The circuit breaker pattern:
- Closed state: normal operation. Workers process messages as usual.
- Open state: too many failures detected. Workers skip processing immediately and release messages without attempting the failing operation. The circuit stays open for a cooldown period.
- Half-open state: after cooldown, allow a few test requests. If they succeed, close the circuit. If they fail, reopen.
For queue workers, the circuit breaker should release messages back to the queue (rather than deleting them) when the circuit is open, so they can be processed after recovery.
import time
from dataclasses import dataclass, field
from threading import Lock
from enum import Enum, auto

class CircuitState(Enum):
    CLOSED = auto()
    OPEN = auto()
    HALF_OPEN = auto()

class CircuitOpenError(Exception):
    pass

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 30.0
    test_request_limit: int = 1
    _state: CircuitState = field(default=CircuitState.CLOSED, init=False)
    _failure_count: int = field(default=0, init=False)
    _last_failure_time: float = field(default=0.0, init=False)
    _test_requests: int = field(default=0, init=False)
    _lock: Lock = field(default_factory=Lock, init=False)

    def call(self, fn):
        # Check and transition state under the lock, but run fn outside it so
        # _on_success/_on_failure can re-acquire the (non-reentrant) lock.
        with self._lock:
            if self._state == CircuitState.OPEN:
                if time.monotonic() - self._last_failure_time > self.recovery_timeout:
                    self._state = CircuitState.HALF_OPEN
                    self._test_requests = 0
                else:
                    raise CircuitOpenError("Circuit is OPEN — skipping call")
            if self._state == CircuitState.HALF_OPEN:
                if self._test_requests >= self.test_request_limit:
                    raise CircuitOpenError("Circuit is HALF_OPEN — test slot taken")
                self._test_requests += 1
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        with self._lock:
            self._failure_count = 0
            self._state = CircuitState.CLOSED

    def _on_failure(self) -> None:
        with self._lock:
            self._failure_count += 1
            self._last_failure_time = time.monotonic()
            if self._failure_count >= self.failure_threshold:
                self._state = CircuitState.OPEN

In your queue worker, catch CircuitOpenError and release the message back to the queue by not deleting it. This stops the retry storm at the application level without relying solely on SQS visibility timeouts.
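A sketch of that worker-side handling (the `handle_message` and `process` names are illustrative; `CircuitOpenError` here stands in for the breaker's exception above):

```python
class CircuitOpenError(Exception):
    """Mirrors the circuit breaker's exception."""

def handle_message(sqs, queue_url: str, message: dict, breaker, process) -> bool:
    """Delete only on success. On an open circuit (or any failure), skip the
    delete so SQS redelivers the message after the visibility timeout."""
    try:
        breaker.call(lambda: process(message))
    except CircuitOpenError:
        return False  # circuit open: released without touching the dependency
    except Exception:
        return False  # processing failed: retried (or DLQ'd) by SQS
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```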
The Backlog Explosion Scenario
You deploy a new version of your order processor with a bug. 100,000 messages back up in the queue over 4 hours before you roll back. The rollback fixes the bug. Now you have 100,000 messages and a fresh fleet of workers eager to process them.
Without rate limiting, all 100,000 messages are processed simultaneously. Your database gets hit with 10,000 concurrent queries. The database slows down. Workers time out. Messages return to the queue. Retry storm begins on top of the backlog.
Prevention:

- Rate-limit processing during recovery. Use a Redis-backed rate limiter shared across all workers: max 500 orders/second regardless of worker count.
- Ramp up workers gradually rather than launching the full fleet immediately. Use step scaling with a 5-minute cooldown between scale steps.
- Use DelaySeconds on new messages during recovery to spread load across time. This only affects new messages, not the existing backlog — but it prevents live traffic from piling onto the recovery effort.
- Monitor your downstream dependency (database, external API) during backlog replay. Have runbooks for reducing worker count if downstream shows stress.
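The shared rate limiter in the first bullet can be sketched as a fixed-window counter. Here the increment backend is injected (names are illustrative): in production, back it with Redis `INCR` + `EXPIRE` so all workers draw from one budget.

```python
import time
from typing import Callable, Optional

class SharedRateLimiter:
    """Fixed-window limiter. `incr(key, ttl=...)` must atomically increment
    and return a counter that expires after `ttl` seconds — in production,
    a Redis INCR + EXPIRE pipeline shared by every worker."""

    def __init__(self, incr: Callable[..., int], limit_per_second: int):
        self.incr = incr
        self.limit = limit_per_second

    def try_acquire(self, now: Optional[float] = None) -> bool:
        window = int(now if now is not None else time.time())  # one counter per second
        return self.incr(f"rate:{window}", ttl=2) <= self.limit

# In the worker loop: if not limiter.try_acquire(), release the message
# (skip the delete) and let the visibility timeout reschedule it.
```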
Edge Cases That Will Catch You
The visibility timeout race condition: Your worker extends the visibility timeout, processes the message, and calls DeleteMessage. But if the timeout expired before the extension landed, SQS delivered the message to a different worker in the meantime (this is "at-least-once" delivery). Now two workers think they own the same message, and the one holding the stale receipt handle will find its DeleteMessage fails or silently does nothing. Solution: implement idempotency (see the companion post on AWS SQS Reliable Messaging Patterns for Production).
The Lambda cold start avalanche: A large SQS backlog with Lambda consumers scales Lambda concurrency rapidly. If your Lambda function takes 2 seconds to cold-start and you have 500 concurrent invocations, those 500 cold starts hit your downstream dependency simultaneously. Use provisioned concurrency for time-sensitive queues, or set a lower reserved concurrency to cap the scale velocity.
The FIFO throughput cap: SQS FIFO queues in standard mode cap at 300 transactions/sec per API action (3,000 messages/sec with batching). At the limit, sends are throttled and the backlog grows. High throughput mode raises the ceiling substantially but requires explicit opt-in and scopes deduplication and ordering to the message group. Monitor NumberOfMessagesSent for FIFO queues — if it plateaus well below your expected production rate, you may be hitting the throughput cap.
Related reading: AWS SQS Reliable Messaging Patterns for Production and AWS Cost Control Architecture Optimization Playbook.