How to Prevent Queue-Based Cost Explosions on AWS

Quick summary: SQS charges per API request. Retry storms, misconfigured visibility timeouts, and unlimited worker concurrency turn queue costs from predictable to catastrophic. Here is how to prevent it.

Key Takeaways

  • SQS charges per API request, not per message — so empty polls, retries, and redeliveries all cost real money.

SQS pricing looks deceptively simple: $0.40 per million requests for standard queues. At that rate, even a moderately busy queue should cost almost nothing. And it does — right up until something breaks, your retry logic kicks in, and a single incident turns a $2/day queue into a $300/day billing event.

The failure modes that cause queue cost explosions are specific and preventable. This guide covers each one with the math, the code, and the CloudWatch alarms that catch them before your finance team sends you a Slack message.

Understanding the SQS Cost Model

Before you can prevent cost explosions, you need to understand exactly what SQS charges for.

Every API call is a request. SQS charges per API request, not per message. One ReceiveMessage call that returns 10 messages is one request. One ReceiveMessage call that returns 0 messages (empty poll) is also one request at the same price.

The calls that generate costs:

| API Call | When It Happens | Cost Factor |
|---|---|---|
| SendMessage | Producer enqueues a message | 1× per message sent |
| ReceiveMessage | Worker polls for messages | 1× per poll, regardless of messages returned |
| DeleteMessage | Worker confirms processing | 1× per successfully processed message |
| ChangeMessageVisibility | Worker extends timeout | 1× per call |
| SendMessageBatch | Batch enqueue (up to 10) | 1× per batch call |
| ReceiveMessage (empty) | Poll returns nothing | 1× charged the same as a full poll |

The math that matters:

A single worker polling every 1 second with short polling:

  • 60 ReceiveMessage calls/minute
  • 1 DeleteMessage per processed message
  • For 100 messages/minute: 60 polls + 100 deletes = 160 requests/minute = 230,400 requests/day
  • Cost: $0.09/day — trivially cheap
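The arithmetic above can be double-checked with a few lines (a sketch; the $0.40-per-million rate is the standard-queue price quoted earlier):

```python
PRICE_PER_MILLION_REQUESTS = 0.40  # USD, standard queues

def daily_cost_usd(requests_per_minute: float) -> float:
    """Convert a per-minute API request rate into a daily SQS bill."""
    requests_per_day = requests_per_minute * 60 * 24
    return requests_per_day / 1_000_000 * PRICE_PER_MILLION_REQUESTS

# One worker, short polling: 60 polls + 100 deletes per minute
print(f"{daily_cost_usd(160):.2f}")      # 0.09
# 1,000 workers polling every second during an outage
print(f"{daily_cost_usd(60_000):.2f}")   # 34.56
```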

The same worker, with a downstream dependency failing:

  • Every message fails, visibility timeout expires, message re-enters queue
  • Worker processes same message 5× (maxReceiveCount=5) before DLQ
  • Plus all the empty polls while waiting for visibility timeout
  • 1,000 such workers: 1,000 × 60 polls/minute = 60,000 requests/minute = 86,400,000 requests/day
  • Cost: $34.56/day just for polling — plus all the failed process attempts

Scale to a production incident with 5,000 workers in a retry storm and you are looking at $170/day in SQS API costs for a queue doing zero useful work.

Long Polling: The Free Win

Short polling (the default) queries SQS message servers and returns immediately, even if no messages are available. Long polling holds the connection open for up to 20 seconds, waiting for messages to arrive.

Switch every queue to long polling. This is free in implementation cost and reduces empty polls by 90%+ for queues with moderate activity.

Configure at the queue level:

ReceiveMessageWaitTimeSeconds = 20

Or per ReceiveMessage call:

WaitTimeSeconds = 20

At 20-second long polling, a worker makes at most 3 ReceiveMessage calls per minute instead of 60. For 1,000 workers:

  • Short polling: 60,000 requests/minute
  • Long polling: 3,000 requests/minute

That is a 20× reduction in ReceiveMessage API costs before you change anything else.

Combine with batching: MaxNumberOfMessages = 10 retrieves up to 10 messages per poll. One API call processes 10 messages instead of one. Net effect: long polling + batching can reduce API request costs by 95%+ compared to naive short-polling with individual message retrieval.
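A minimal sketch of a single receive call with both settings applied — `sqs_client` is assumed to be a boto3 SQS client (e.g. `boto3.client('sqs', region_name='us-east-1')`) and `queue_url` your queue's URL:

```python
def receive_batch(sqs_client, queue_url: str) -> list:
    """One long-polled, batched receive: up to 10 messages per API request."""
    resp = sqs_client.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=20,       # long polling: hold the connection up to 20s
        MaxNumberOfMessages=10,   # batching: up to 10 messages per poll
    )
    # SQS omits the 'Messages' key entirely on an empty poll
    return resp.get('Messages', [])
```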

Exponential Backoff with Full Jitter

Retry logic is necessary. Synchronized, fixed-interval retry logic is what causes thundering herd problems and retry storms.

Pure constant-interval retry: every failing consumer retries every 10 seconds. With 1,000 consumers all hitting the same broken dependency, you get 1,000 concurrent requests every 10 seconds — a sustained attack on an already-struggling system.

Pure exponential backoff: delays double with each retry (1s, 2s, 4s, 8s…). Reduces average retry rate but all 1,000 consumers still retry at the same times if they started failing simultaneously. This creates synchronized bursts rather than a constant stream, but each burst is still 1,000 concurrent requests.

Full jitter: randomize the delay between 0 and the exponential ceiling. Consumers are desynchronized — retries are spread across the entire backoff window rather than synchronized at the window boundaries.

The math: with full jitter, the expected wait time is half the exponential ceiling (since you pick uniformly between 0 and ceiling). The variance ensures consumers spread out across the window.

Go: Exponential Backoff with Full Jitter

package backoff

import (
	"errors"
	"math"
	"math/rand"
	"time"
)

const (
	baseDelay  = 100 * time.Millisecond
	maxDelay   = 30 * time.Second
	maxRetries = 10
)

// FullJitter returns a random duration between 0 and the exponential ceiling.
// This desynchronizes retries across concurrent callers.
func FullJitter(attempt int) time.Duration {
	if attempt <= 0 {
		return 0
	}

	// Exponential ceiling: base * 2^attempt, capped at maxDelay
	ceiling := float64(baseDelay) * math.Pow(2, float64(attempt))
	if ceiling > float64(maxDelay) {
		ceiling = float64(maxDelay)
	}

	// Uniform random between 0 and ceiling
	jittered := time.Duration(rand.Float64() * ceiling)
	return jittered
}

// RetryWithBackoff executes fn, retrying on error with full jitter backoff.
// Returns the last error after maxRetries attempts.
func RetryWithBackoff(fn func() error) error {
	var lastErr error

	for attempt := 0; attempt < maxRetries; attempt++ {
		lastErr = fn()
		if lastErr == nil {
			return nil
		}

		if attempt == maxRetries-1 {
			break
		}

		delay := FullJitter(attempt + 1)
		time.Sleep(delay)
	}

	return lastErr
}

// RetryableError wraps an error to indicate it is safe to retry.
type RetryableError struct {
	Cause error
}

func (e *RetryableError) Error() string { return e.Cause.Error() }
func (e *RetryableError) Unwrap() error { return e.Cause }

// RetryWithBackoffSelective retries only RetryableError types.
// Non-retryable errors return immediately.
func RetryWithBackoffSelective(fn func() error) error {
	var lastErr error

	for attempt := 0; attempt < maxRetries; attempt++ {
		lastErr = fn()
		if lastErr == nil {
			return nil
		}

		var retryable *RetryableError
		if !errors.As(lastErr, &retryable) {
			// Non-retryable error — fail immediately, no delay
			return lastErr
		}

		if attempt == maxRetries-1 {
			break
		}

		delay := FullJitter(attempt + 1)
		time.Sleep(delay)
	}

	return lastErr
}

The key difference: FullJitter(attempt) returns a value between 0 and baseDelay * 2^attempt, not exactly baseDelay * 2^attempt. At attempt 5 with a 100ms base delay, the ceiling is 3.2 seconds, and the actual wait is somewhere between 0 and 3.2 seconds — uniformly distributed.

With 1,000 consumers all reaching retry attempt 5 simultaneously, full jitter spreads their retries across a 3.2-second window instead of firing them all at the 3.2-second mark. The downstream system sees roughly 312 requests/second on average instead of a single 1,000-request burst.
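A quick simulation (illustrative only; exact bucket counts vary with the seed) shows the spreading effect:

```python
import random

random.seed(7)  # fixed seed so the example is reproducible
CONSUMERS, CEILING_S = 1_000, 3.2  # retry attempt 5 with a 100ms base delay

delays = [random.uniform(0, CEILING_S) for _ in range(CONSUMERS)]

# Count retries landing in each 1-second bucket of the backoff window
buckets = [0] * 4
for d in delays:
    buckets[int(d)] += 1

print(buckets)  # roughly 312 retries per full 1s bucket, ~62 in the final 0.2s
```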

Visibility Timeout Tuning

The visibility timeout is how long SQS hides a message from other consumers after it has been received. If your worker does not delete the message before the timeout expires, SQS assumes the worker failed and makes the message available again.

Too short: the message re-enters the queue before processing completes and may be picked up by another worker. Your original worker finishes, tries to delete it, and the now-stale receipt handle makes the delete fail (ReceiptHandleIsInvalid) or go through after the duplicate has already started. Either way you get duplicate processing and extra delete API calls.

Too long: a worker crashes mid-processing. The message is invisible for the full timeout before it is retried. A 12-hour visibility timeout with a crashing consumer means 12-hour delayed retries.

Formula: visibility_timeout = max_expected_processing_time × 1.5

For typical web API calls or database operations (processing in under 30 seconds):

  • Set visibility timeout to 45–60 seconds.

For jobs that call external APIs with long timeouts (60 second external call):

  • Set visibility timeout to 90–120 seconds.

For long-running jobs (report generation, video processing):

  • Set a moderate initial visibility timeout (5 minutes).
  • Extend it mid-processing using ChangeMessageVisibility before it expires.

Extending Visibility Mid-Processing

import boto3
import threading
import time

sqs = boto3.client('sqs', region_name='us-east-1')

def process_with_visibility_extension(queue_url: str, message: dict, processor_fn) -> None:
    receipt_handle = message['ReceiptHandle']
    extend_interval = 30  # Extend every 30 seconds
    extension_amount = 60  # Extend by 60 seconds each time

    stop_event = threading.Event()

    def extend_visibility() -> None:
        while not stop_event.wait(extend_interval):
            try:
                sqs.change_message_visibility(
                    QueueUrl=queue_url,
                    ReceiptHandle=receipt_handle,
                    VisibilityTimeout=extension_amount,
                )
            except Exception:
                # Message may have already been deleted; ignore
                pass

    extender = threading.Thread(target=extend_visibility, daemon=True)
    extender.start()

    try:
        processor_fn(message)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    finally:
        stop_event.set()

This pattern is particularly important for jobs where processing time varies widely — report generation that takes 10 seconds on a small dataset and 90 seconds on a large one.

Worker Concurrency Caps

Uncapped worker concurrency is the most common cause of SQS cost explosions during incidents. When a queue has a large backlog, autoscaling launches many workers. Each worker polls independently. Without a cap, you have no upper bound on API call volume.

Node.js (BullMQ) with Concurrency Cap and Backoff

BullMQ is a Redis-backed queue library for Node.js with excellent concurrency controls. It is not an SQS client, but when SQS messages are bridged into BullMQ jobs (or the queue lives natively in Redis), its concurrency settings prevent runaway worker scaling:

import { Worker, Job } from 'bullmq';
import IORedis from 'ioredis';

const connection = new IORedis({
  host: process.env.REDIS_HOST,
  port: 6379,
  maxRetriesPerRequest: null,
});

const worker = new Worker(
  'order-processing',
  async (job: Job) => {
    const { orderId, customerId } = job.data;

    // Process the order
    await processOrder(orderId, customerId);
  },
  {
    connection,
    concurrency: 10, // Maximum 10 concurrent jobs per worker process
    limiter: {
      max: 50,        // Maximum 50 jobs processed per duration window
      duration: 1000, // Duration in milliseconds (1 second)
    },
  },
);

// Graceful shutdown on SIGTERM
const shutdown = async () => {
  await worker.close();
  connection.disconnect();
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
});

worker.on('error', (err) => {
  console.error('Worker error:', err);
});

The concurrency: 10 limit means this worker process handles at most 10 jobs simultaneously regardless of how many jobs are queued. The limiter adds a rate limit on top: at most 50 jobs per second across all workers sharing this Redis queue.

If you deploy 5 worker containers, you have up to 50 concurrent jobs in flight, but the shared limiter still caps total throughput at 50 jobs/second across the fleet — predictable, budgetable, and controllable.

Laravel Queue Worker with Concurrency Controls

Laravel’s built-in queue:work command has several flags that directly control cost:

# Controlled worker command — use this in production
php artisan queue:work sqs \
  --queue=orders,notifications \
  --sleep=3 \
  --tries=5 \
  --max-time=3600 \
  --max-jobs=500 \
  --timeout=60 \
  --backoff=30,60,120
# --sleep=3: sleep 3s when the queue is empty (reduces empty polls)
# --tries=5: max attempts before the job is marked failed
# --max-time=3600: restart worker after 1 hour (memory leak prevention)
# --max-jobs=500: process 500 jobs then restart (memory hygiene)
# --timeout=60: kill job after 60s (keep below the visibility timeout)
# --backoff=30,60,120: retry delays of 30s, 60s, 120s (exponential steps)

The critical flags for cost control:

  • --sleep=3: When the queue is empty, wait 3 seconds before polling again instead of re-polling immediately. With 50 workers at 20 polls/minute each (sleep=3) vs 60 polls/minute (polling once per second): 2,000 fewer API calls/minute.
  • --max-jobs=500: Restart after 500 jobs to prevent memory leaks from accumulating across long-running workers. Combine with a process manager (Supervisor, ECS) that restarts workers automatically.
  • --timeout=60: This must be less than the SQS visibility timeout. If the job runs for longer than --timeout, Laravel kills it, the visibility timeout expires, and SQS redelivers the message. Set --timeout to visibility_timeout - 10 seconds as a safety margin.

For per-connection queue configuration in config/queue.php:

'sqs' => [
    'driver' => 'sqs',
    'key' => env('AWS_ACCESS_KEY_ID'),
    'secret' => env('AWS_SECRET_ACCESS_KEY'),
    'prefix' => env('SQS_PREFIX', 'https://sqs.us-east-1.amazonaws.com/your-account-id'),
    'queue' => env('SQS_QUEUE', 'default'),
    'suffix' => env('SQS_SUFFIX'),
    'region' => env('AWS_DEFAULT_REGION', 'us-east-1'),
    // Long polling is set on the queue itself (ReceiveMessageWaitTimeSeconds),
    // not in this config — see the note below.
    'after_commit' => false,
],

For the SQS queue itself, set ReceiveMessageWaitTimeSeconds = 20 to enable long polling at the queue level. This applies to all consumers regardless of SDK configuration.

Celery Worker Concurrency

# Controlled Celery worker
celery -A myapp worker \
  --concurrency=8 \
  --max-tasks-per-child=100 \
  --prefetch-multiplier=1 \
  -Q orders,notifications
# --concurrency=8: 8 worker processes (set to CPU count for CPU-bound work)
# --max-tasks-per-child=100: restart each child after 100 tasks (memory hygiene)
# --prefetch-multiplier=1: fetch 1 task per worker (prevents hoarding)

--prefetch-multiplier=1 is critical. Default is 4: each worker prefetches 4 tasks from the queue. With 20 workers and prefetch=4, you have 80 messages hidden from other consumers (NotVisible) while only 20 are being actively processed. During a backlog, this looks like consumption but is actually just hiding messages. Set prefetch to 1 for fair distribution and accurate queue depth metrics.
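The same caps can be pinned in a Celery configuration module instead of the command line, so they cannot be forgotten on an ad-hoc invocation — a sketch using Celery's standard setting names (the module path is illustrative):

```python
# Assumed to live in e.g. myapp/celeryconfig.py and be loaded with
# app.config_from_object('myapp.celeryconfig')
worker_concurrency = 8            # same cap as --concurrency=8
worker_max_tasks_per_child = 100  # same as --max-tasks-per-child=100
worker_prefetch_multiplier = 1    # same as --prefetch-multiplier=1
```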

Observability: Detecting Cost Spikes Before They Appear on Your Bill

The AWS Cost Explorer shows costs with 24+ hour delay. By the time a retry storm appears in your bill, you have paid for it. Build alerting on leading indicators.

The Retry Storm Signature

Monitor ApproximateNumberOfMessagesNotVisible (in-flight) vs ApproximateNumberOfMessagesVisible (waiting). Publish this ratio as a custom CloudWatch metric:

import boto3
import time

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
sqs = boto3.client('sqs', region_name='us-east-1')

def publish_queue_health_metrics(queue_url: str, queue_name: str) -> None:
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            'ApproximateNumberOfMessages',
            'ApproximateNumberOfMessagesNotVisible',
        ],
    )['Attributes']

    visible = int(attrs.get('ApproximateNumberOfMessages', 0))
    in_flight = int(attrs.get('ApproximateNumberOfMessagesNotVisible', 0))

    # Ratio > 2 suggests workers are consuming but not completing
    ratio = in_flight / max(visible, 1)

    cloudwatch.put_metric_data(
        Namespace='CustomQueues',
        MetricData=[
            {
                'MetricName': 'InFlightToVisibleRatio',
                'Dimensions': [{'Name': 'QueueName', 'Value': queue_name}],
                'Value': ratio,
                'Unit': 'None',
            },
            {
                'MetricName': 'MessagesInFlight',
                'Dimensions': [{'Name': 'QueueName', 'Value': queue_name}],
                'Value': in_flight,
                'Unit': 'Count',
            },
        ],
    )

Run this on a 1-minute schedule via Lambda or a sidecar container. Create a CloudWatch alarm when InFlightToVisibleRatio exceeds 2.0 for 3 consecutive minutes — this is the retry storm signature.

Terraform: CloudWatch Alarm for DLQ Depth

variable "sns_alert_topic_arn" {
  description = "SNS topic ARN for alerting"
  type        = string
}

resource "aws_sqs_queue" "orders_dlq" {
  name                      = "orders-dlq"
  message_retention_seconds = 1209600
}

resource "aws_sqs_queue" "orders" {
  name                       = "orders"
  visibility_timeout_seconds = 60
  receive_wait_time_seconds  = 20

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 5
  })
}

# Alert when ANY message lands in DLQ
resource "aws_cloudwatch_metric_alarm" "orders_dlq_not_empty" {
  alarm_name          = "orders-dlq-not-empty"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 300
  statistic           = "Sum"
  threshold           = 0
  alarm_description   = "Messages in orders DLQ — processing failures require investigation"
  alarm_actions       = [var.sns_alert_topic_arn]
  ok_actions          = [var.sns_alert_topic_arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    QueueName = aws_sqs_queue.orders_dlq.name
  }
}

# Alert on high queue depth (consumer lag indicator)
resource "aws_cloudwatch_metric_alarm" "orders_consumer_lag" {
  alarm_name          = "orders-consumer-lag-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 5
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 60
  statistic           = "Average"
  threshold           = 500
  alarm_description   = "Order queue backlog above 500 — check consumer health or scale workers"
  alarm_actions       = [var.sns_alert_topic_arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    QueueName = aws_sqs_queue.orders.name
  }
}

# Alert on high in-flight count relative to visible
# (requires custom metric published by monitoring Lambda)
resource "aws_cloudwatch_metric_alarm" "orders_retry_storm_indicator" {
  alarm_name          = "orders-retry-storm-indicator"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "InFlightToVisibleRatio"
  namespace           = "CustomQueues"
  period              = 60
  statistic           = "Average"
  threshold           = 2
  alarm_description   = "Orders queue in-flight ratio > 2: possible retry storm"
  alarm_actions       = [var.sns_alert_topic_arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    QueueName = "orders"
  }
}

Circuit Breakers for Queue Workers

A circuit breaker prevents workers from hammering a broken dependency. Without one, a database outage causes every worker to attempt the database call, fail, release the message back to the queue, and immediately try again — burning SQS API calls while the database is trying to recover.

The circuit breaker pattern:

  1. Closed state: normal operation. Workers process messages as usual.
  2. Open state: too many failures detected. Workers skip processing immediately and release messages without attempting the failing operation. The circuit stays open for a cooldown period.
  3. Half-open state: after cooldown, allow a few test requests. If they succeed, close the circuit. If they fail, reopen.

For queue workers, the circuit breaker should release messages back to the queue (rather than deleting them) when the circuit is open, so they can be processed after recovery.

import time
from dataclasses import dataclass, field
from threading import Lock
from enum import Enum, auto


class CircuitState(Enum):
    CLOSED = auto()
    OPEN = auto()
    HALF_OPEN = auto()


@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 30.0
    test_request_limit: int = 1

    _state: CircuitState = field(default=CircuitState.CLOSED, init=False)
    _failure_count: int = field(default=0, init=False)
    _last_failure_time: float = field(default=0.0, init=False)
    _test_requests: int = field(default=0, init=False)
    _lock: Lock = field(default_factory=Lock, init=False)

    def call(self, fn):
        with self._lock:
            if self._state == CircuitState.OPEN:
                if time.monotonic() - self._last_failure_time > self.recovery_timeout:
                    self._state = CircuitState.HALF_OPEN
                    self._test_requests = 0
                else:
                    raise CircuitOpenError("Circuit is OPEN — skipping call")
            if self._state == CircuitState.HALF_OPEN:
                if self._test_requests >= self.test_request_limit:
                    raise CircuitOpenError("Circuit is HALF_OPEN — test quota used")
                self._test_requests += 1

        try:
            result = fn()
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self) -> None:
        with self._lock:
            self._failure_count = 0
            self._state = CircuitState.CLOSED

    def _on_failure(self) -> None:
        with self._lock:
            self._failure_count += 1
            self._last_failure_time = time.monotonic()
            if self._failure_count >= self.failure_threshold:
                self._state = CircuitState.OPEN


class CircuitOpenError(Exception):
    pass

In your queue worker, catch CircuitOpenError and release the message back to the queue by not deleting it. This stops the retry storm at the application level without relying solely on SQS visibility timeouts.

The Backlog Explosion Scenario

You deploy a new version of your order processor with a bug. 100,000 messages back up in the queue over 4 hours before you roll back. The rollback fixes the bug. Now you have 100,000 messages and a fresh fleet of workers eager to process them.

Without rate limiting, all 100,000 messages are processed simultaneously. Your database gets hit with 10,000 concurrent queries. The database slows down. Workers time out. Messages return to the queue. Retry storm begins on top of the backlog.

Prevention:

  1. Rate-limit processing during recovery. A Redis-backed rate limiter shared across all workers: max 500 orders/second regardless of worker count.
  2. Ramp up workers gradually rather than launching the full fleet immediately. Use step scaling with a 5-minute cooldown between scale steps.
  3. Use DelaySeconds on new messages during recovery to spread load across time. This only affects new messages, not existing backlog — but it prevents the live traffic from piling onto the recovery effort.
  4. Monitor your downstream dependency (database, external API) during backlog replay. Have runbooks for reducing worker count if downstream shows stress.
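Step 1 can be sketched as a fixed-window counter. This in-process version illustrates the logic; a production version would keep the counter in Redis (e.g. INCR plus a one-second EXPIRE) so all workers share one budget — that Redis wiring is assumed, not shown:

```python
import time

class FixedWindowLimiter:
    """Cap processing at max_per_second. In-process sketch only: a shared
    deployment would back the counter with Redis INCR + EXPIRE so every
    worker draws from the same budget."""

    def __init__(self, max_per_second: int, clock=time.time):
        self.max_per_second = max_per_second
        self.clock = clock          # injectable for testing
        self.window = -1
        self.count = 0

    def allow(self) -> bool:
        now = int(self.clock())
        if now != self.window:      # new 1-second window: reset the budget
            self.window, self.count = now, 0
        if self.count < self.max_per_second:
            self.count += 1
            return True
        return False                # over budget: leave the message in the queue
```

A worker calls `limiter.allow()` before processing each message; on `False` it skips the message (leaving it in the queue), so the backlog drains at the configured rate regardless of worker count.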

Edge Cases That Will Catch You

The visibility timeout race condition: Your worker extends the visibility timeout successfully, processes the message, and calls DeleteMessage. But between the visibility extension and DeleteMessage, a different worker received the message (SQS delivered it twice — this is “at-least-once”). Now you have two workers that think they own the same message, and one of them will get a ReceiptHandleIsInvalid error when it tries to delete. Solution: implement idempotency (see the companion post on AWS SQS Reliable Messaging Patterns for Production).

The Lambda cold start avalanche: A large SQS backlog with Lambda consumers scales Lambda concurrency rapidly. If your Lambda function takes 2 seconds to cold-start and you have 500 concurrent invocations, those 500 cold starts hit your downstream dependency simultaneously. Use provisioned concurrency for time-sensitive queues, or set a lower reserved concurrency to cap the scale velocity.

The FIFO throughput cap: SQS FIFO queues cap at 300 transactions/sec per API action (up to 3,000 messages/sec with 10-message batching). Hit the limit and throughput plateaus while backlogs grow, even though consumers look healthy. High-throughput FIFO mode raises the ceiling substantially but requires explicit opt-in and changes the pricing characteristics. Monitor NumberOfMessagesSent for FIFO queues — if it plateaus well below your expected production rate, you may be hitting the throughput cap.

Related reading: AWS SQS Reliable Messaging Patterns for Production and AWS Cost Control Architecture Optimization Playbook.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.
