When is synchronous HTTP the right default instead of SQS?

When the caller needs an immediate semantic response (success/failure, generated id, validation errors) and you can keep the operation inside your latency SLO without fragile downstream chains. User-facing mutations that must confirm before navigation should stay request/response unless you invest in idempotent compensating flows plus excellent UX for async completion.

When should we not choose Amazon MSK even if we already know Kafka?

When you do not need Kafka consumer groups, compacted topics, or Kafka Streams, and a large share of internal job queues need none of them. Paying broker hours for “YAML nostalgia” burns budget. Start from SQS + EventBridge, write the one-page decision record, and add Kinesis or MSK Serverless only if that record shows you need them.

What breaks with wrong SQS visibility timeout?

When a handler runs longer than the visibility timeout, the message becomes visible again, another worker picks it up, and duplicate processing amplifies unless handlers are idempotent. Set the visibility timeout to the p99 handler duration plus padding; for long jobs, extend visibility with a heartbeat (ChangeMessageVisibility) or move the work into Step Functions instead of one giant single-message task.
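A minimal Python sketch of both knobs, assuming a boto3-style SQS client (anything exposing `change_message_visibility` works, which keeps it testable offline). The 50% padding ratio, the 60-second extension, and the 20-second heartbeat interval are illustrative values, not AWS guidance; the 43,200-second clamp is the SQS maximum visibility timeout.

```python
import math
import threading
from typing import Any, Callable

SQS_MAX_VISIBILITY_SECONDS = 43_200  # 12 hours, the SQS upper bound


def visibility_timeout_seconds(p99_handler_seconds: float, padding_ratio: float = 0.5) -> int:
    """p99 handler duration plus proportional padding, rounded up and clamped to SQS limits."""
    padded = math.ceil(p99_handler_seconds * (1 + padding_ratio))
    return max(0, min(SQS_MAX_VISIBILITY_SECONDS, padded))


def run_with_heartbeat(
    sqs_client: Any,
    queue_url: str,
    receipt_handle: str,
    work: Callable[[], Any],
    extend_to_seconds: int = 60,
    interval_seconds: float = 20.0,
) -> Any:
    """Run `work` while a background thread keeps pushing the message's visibility out."""
    stop = threading.Event()

    def heartbeat() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            # Sets the visibility timeout to extend_to_seconds counted from this call.
            sqs_client.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=extend_to_seconds,
            )

    beater = threading.Thread(target=heartbeat, daemon=True)
    beater.start()
    try:
        return work()
    finally:
        # Stop extending as soon as the work finishes or raises.
        stop.set()
        beater.join()
```

Keep the heartbeat interval comfortably shorter than the extension so a slow tick cannot let the message reappear. SQS still caps total visibility at 12 hours from the original receive, which is one more reason to hand genuinely long jobs to Step Functions.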

Does RabbitMQ on Amazon MQ replace SQS?

Sometimes, for AMQP patterns (priority queues, selective consumer features). It does not magically remove operational thinking: broker maintenance windows, durable queue configuration, and networking still matter. If you do not need AMQP protocol compatibility, SQS usually wins by taking the undifferentiated heavy lifting of running a broker off your plate.

What is the classic EventBridge pitfall?

Unbounded rule fan-out: one bus event matches rules that invoke thirty Lambda functions with no concurrency caps, blasting through downstream SaaS rate limits. Use event pattern filtering, per-target dead-letter queues, and a reserved concurrency budget on each target Lambda function.
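A sketch of the first two controls as the payloads EventBridge's PutRule and PutTargets calls accept, built in Python so they can be reviewed and unit-tested before deployment. The source, detail-type, `amount` field, threshold, and retry numbers are hypothetical; the boto3 call is isolated in `apply_rule` so the builders run without credentials.

```python
import json
from typing import Any, Dict, List


def narrow_order_pattern(min_amount: float) -> Dict[str, Any]:
    """Match only the events this consumer needs instead of a wildcard on the bus.

    Source, detail-type, and the `amount` field are hypothetical placeholders.
    """
    return {
        "source": ["com.example.orders"],
        "detail-type": ["OrderPlaced"],
        "detail": {"amount": [{"numeric": [">=", min_amount]}]},
    }


def lambda_target(target_id: str, function_arn: str, dlq_arn: str,
                  max_retries: int = 2, max_event_age_seconds: int = 3600) -> Dict[str, Any]:
    """One PutTargets entry with its own SQS dead-letter queue and bounded retries."""
    return {
        "Id": target_id,
        "Arn": function_arn,
        "DeadLetterConfig": {"Arn": dlq_arn},
        "RetryPolicy": {
            "MaximumRetryAttempts": max_retries,
            "MaximumEventAgeInSeconds": max_event_age_seconds,
        },
    }


def apply_rule(rule_name: str, bus_name: str, pattern: Dict[str, Any],
               targets: List[Dict[str, Any]]) -> None:
    import boto3  # imported lazily so the builders above work without AWS credentials

    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventBusName=bus_name,
                    EventPattern=json.dumps(pattern), State="ENABLED")
    events.put_targets(Rule=rule_name, EventBusName=bus_name, Targets=targets)
```

The DLQ's queue policy must allow EventBridge to send messages to it, and the function needs a resource-based permission for EventBridge to invoke it; neither is shown here.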

Where does Step Functions fit relative to SQS?

Orchestration with explicit state, timeouts, and compensations belongs in Step Functions. Competing consumers and worker pools belong in SQS. Composing both is normal; see our Step Functions patterns post for choreography vs orchestration boundaries.

Event-Driven Boundaries on AWS: Async vs Sync, Amazon MSK vs Amazon MQ (RabbitMQ), and When SQS Wins

Quick summary: AWS documents that standard SQS queues support a nearly unlimited number of API calls per second, while FIFO queues without high-throughput mode support 300 API calls per second per action, or up to 3,000 messages per second with full batches of 10. Your May 2026 architecture review should start from those numbers, not from Kafka slogans.

On May 8, 2026, the cheapest mental model for AWS buyers is still: sync paths spend latency budget for certainty; async paths spend operational complexity for resilience. AWS documents that standard SQS queues support a nearly unlimited number of API calls per second (the service scales horizontally behind the queue), while FIFO queues without high-throughput mode support 300 API calls per second per action (send, receive, delete), or up to 3,000 messages per second when each call carries a full batch of 10. If your backlog math ignores that ceiling, capacity reviews fail in the boring way.
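The backlog math fits in a few lines. This sketch assumes FIFO without high-throughput mode, treats 300 calls per second per action as the ceiling on the consume side, and ignores message-group parallelism, which can be the tighter limit when only a few groups are active.

```python
import math

FIFO_CALLS_PER_SECOND = 300  # per API action, FIFO without high-throughput mode
SQS_MAX_BATCH = 10           # max messages per send/receive/delete batch


def fifo_ceiling_messages_per_second(batch_size: int = SQS_MAX_BATCH) -> int:
    """Messages per second one API action can move at a given batch size."""
    if not 1 <= batch_size <= SQS_MAX_BATCH:
        raise ValueError("SQS batches carry between 1 and 10 messages")
    return FIFO_CALLS_PER_SECOND * batch_size


def drain_seconds(backlog: int, consumer_rate: float, producer_rate: float = 0.0,
                  batch_size: int = SQS_MAX_BATCH) -> float:
    """Seconds to clear `backlog` while producers keep adding `producer_rate` msgs/s.

    Consumers never beat the FIFO ceiling, whatever their own capacity.
    """
    effective = min(consumer_rate, fifo_ceiling_messages_per_second(batch_size)) - producer_rate
    # A non-positive net rate means the backlog never drains.
    return math.inf if effective <= 0 else backlog / effective
```

For example, `drain_seconds(1_800_000, consumer_rate=5000)` returns 600.0: consumers that could handle 5,000 messages per second are held to 3,000, so a 1.8M backlog takes ten minutes, and longer if producers keep writing.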

This post maps “Kafka vs RabbitMQ” debates to Amazon MSK and Amazon MQ (RabbitMQ engine)—and shows where neither belongs because SQS or EventBridge already matches the coupling you need.

Reproduce this — Inspect redrive policies and visibility timeouts with examples/architecture-blog-2026/event-driven/inspect-sqs-redrive.sh (AWS CLI v2.25+, jq installed).

Sync vs async: decision sharp edges

Stay synchronous when:

The caller must branch UX on the outcome now (form validation, or payment authorization that must complete synchronously under PCI constraints).
Total fan-out fits your API Gateway / ALB timeout envelope with margin.

Go asynchronous when:

Downstream variance is high (ML inference, partner batch APIs).
You need absorption (spiky producers, throttled consumers).
You already proved idempotency—duplicate delivery is a when, not an if.
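One way to check the “fits the timeout envelope” bullet is to model the request as stages: calls within a stage run in parallel and cost their slowest member, while stages run in sequence and add up. This is a minimal sketch; the 30% margin is a hypothetical default, the timeout is whatever your API Gateway or ALB configuration enforces, and adding per-call p99s is a rough heuristic rather than a true end-to-end p99.

```python
from typing import Sequence


def estimated_p99_ms(stages: Sequence[Sequence[float]]) -> float:
    """Rough end-to-end latency: parallel calls in a stage cost their slowest member,
    sequential stages add up. Summing per-call p99s is a heuristic, not a true p99."""
    return sum(max(stage) for stage in stages if stage)


def fits_sync_envelope(stages: Sequence[Sequence[float]], timeout_ms: float,
                       margin: float = 0.3) -> bool:
    """True when the estimate leaves `margin` (a fraction of the timeout) unused."""
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    return estimated_p99_ms(stages) <= timeout_ms * (1.0 - margin)
```

With stages `[[120], [300, 450], [80]]` the estimate is 650 ms: that fits a 1,000 ms timeout with 30% margin (700 ms budget) but not an 800 ms one (560 ms budget).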

Opinionated take — If your “event-driven architecture” is still “Lambda calling Lambda synchronously through SDK invocations,” you built a distributed monolith with extra latency. Promote boundaries to queues or buses; keep the synchronous graph shallow.

Kafka (MSK) vs RabbitMQ (Amazon MQ) vs SQS

We already compared Kinesis Data Streams vs MSK in depth (streaming platform guide). Use that when bytes-per-second and stream retention dominate.

Choose Amazon MSK when:

You need Kafka protocol compatibility (existing clients, Kafka Connect, transactions on brokers where supported).
Stream log retention and consumer group mechanics are first-class requirements.

Choose Amazon MQ for RabbitMQ when:

Teams standardize on AMQP, priority queues, or routing semantics that map naturally to Rabbit.
You are migrating an on-prem broker with minimal client rewrites.

Choose SQS (+ SNS/EventBridge) when:

You need managed fan-out without caring about broker protocol details.
Per-message DLQ + redrive solves most failure handling (SQS production patterns).

What broke: A team ported a Kafka-shaped “commands” topic onto SQS standard queues, which make no ordering guarantees, assuming partition keys magically existed. Side effects executed out of order and financial adjustments diverged. Fix: move to a FIFO queue with a message group ID per ordering key (plus deduplication IDs for retries), or stay on MSK, where ordering within a partition is broker-enforced.
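The fix can be sketched as the arguments a producer would pass to boto3's `send_message` on a FIFO queue. The command fields, the per-account ordering key, and `command_id` are hypothetical; hashing the command ID keeps the deduplication ID within SQS's 128-character limit and allowed character set.

```python
import hashlib
import json
from typing import Any, Dict


def fifo_command_kwargs(queue_url: str, command: Dict[str, Any],
                        ordering_key: str, command_id: str) -> Dict[str, Any]:
    """Build SendMessage arguments for a FIFO queue.

    MessageGroupId gives per-key ordering (e.g. one group per account);
    MessageDeduplicationId lets SQS drop retries of the same command
    inside its 5-minute deduplication window.
    """
    # Canonical JSON so logically equal commands produce identical bodies.
    body = json.dumps(command, sort_keys=True, separators=(",", ":"))
    # SHA-256 hex is 64 characters, within the 128-character limit.
    dedup_id = hashlib.sha256(command_id.encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": ordering_key,
        "MessageDeduplicationId": dedup_id,
    }
```

Usage would be `sqs.send_message(**fifo_command_kwargs(...))`. One group per account keeps each account's adjustments in order while different accounts still proceed in parallel; a single global group would serialize everything. Deduplication only covers the five-minute window, so handlers still need their own idempotency keys.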

EventBridge and “domain buses”

EventBridge excels at discovery-friendly routing (rules, archival, replay with appropriate controls). Pair it with our EventBridge architecture patterns before inventing a bespoke bus topology.

Failure mode: wildcard rules that turn a bus into an accidental fan-out grenade—cap concurrency at targets and measure blast radius with cloud cost alarms.
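To turn “cap concurrency at targets” into numbers, Little's law (concurrency = arrival rate × duration) converts a downstream rate limit into a Lambda reserved-concurrency ceiling. A sketch, assuming every target behind the rule calls the same rate-limited SaaS API; the limit, calls per invocation, duration, weights, and function names are placeholders.

```python
import math
from typing import Dict


def reserved_concurrency_for(downstream_limit_rps: float, calls_per_invocation: float,
                             fastest_invocation_seconds: float) -> int:
    """Little's law (concurrency = rate x duration) turned into a Lambda cap.

    Use a fast duration: invocations quicker than assumed push more
    requests per second through the same concurrency.
    """
    invocations_per_second = downstream_limit_rps / calls_per_invocation
    return math.floor(invocations_per_second * fastest_invocation_seconds)


def split_budget(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Share one downstream budget across a rule's targets; flooring keeps the sum <= total."""
    weight_sum = sum(weights.values())
    return {name: total * w // weight_sum for name, w in weights.items()}


def apply_caps(caps: Dict[str, int]) -> None:
    import boto3  # lazy import; keys of `caps` are Lambda function names

    lambda_client = boto3.client("lambda")
    for function_name, cap in caps.items():
        lambda_client.put_function_concurrency(
            FunctionName=function_name, ReservedConcurrentExecutions=cap
        )
```

Feed it a fast (low-percentile) duration, for the reason in the docstring. A computed cap of 0 means the budget cannot support even one concurrent invocation, and applying it would throttle the function entirely.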

Real-time pipelines: Kinesis + Lambda mental hook

When you need stream processing with Lambda consumers, our Kinesis + Lambda + DynamoDB pipeline illustrates throughput thinking end-to-end; use it as the bridge between “messaging” and “analytics ingestion.”

Orchestration: Step Functions as the adult in the room

Long-running sagas with explicit compensation should not live in “while loops around SQS.” Read Step Functions workflow patterns and model human approvals or external waits as first-class states.
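As a sketch of human approvals as first-class states, here is an Amazon States Language definition built as a Python dict: a task token parks the execution on an SQS message until an approver responds, `TimeoutSeconds` bounds the wait, and any failure routes to a compensating `Release` step. The state names, ARNs, and 24-hour timeout are hypothetical; `dangling_transitions` is a small local check, not an AWS API.

```python
import json
from typing import Any, Dict, Set


def approval_saga(queue_url: str, reserve_arn: str, charge_arn: str,
                  release_arn: str) -> Dict[str, Any]:
    # Any error (including States.Timeout) keeps the input, adds $.error, and compensates.
    compensate = [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "Release"}]
    return {
        "Comment": "Reserve, wait for human approval, charge; release the reservation on failure",
        "StartAt": "Reserve",
        "States": {
            "Reserve": {"Type": "Task", "Resource": reserve_arn, "Next": "AwaitApproval"},
            "AwaitApproval": {
                "Type": "Task",
                # Pauses until someone calls SendTaskSuccess/SendTaskFailure with the token.
                "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
                "Parameters": {
                    "QueueUrl": queue_url,
                    "MessageBody": {"TaskToken.$": "$$.Task.Token", "Order.$": "$"},
                },
                "TimeoutSeconds": 86400,  # the external wait is explicit and bounded
                "Catch": compensate,
                "Next": "Charge",
            },
            "Charge": {"Type": "Task", "Resource": charge_arn, "Catch": compensate, "End": True},
            "Release": {"Type": "Task", "Resource": release_arn, "Next": "Compensated"},
            "Compensated": {"Type": "Fail", "Error": "SagaCompensated",
                            "Cause": "Approval rejected, timed out, or charge failed"},
        },
    }


def dangling_transitions(definition: Dict[str, Any]) -> Set[str]:
    """State names referenced by StartAt, Next, or Catch that are not defined."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            referenced.add(state["Next"])
        for catcher in state.get("Catch", []):
            referenced.add(catcher["Next"])
    return referenced - set(states)
```

The approver's code resumes the execution by calling `SendTaskSuccess` or `SendTaskFailure` with the token from the message. Deployment would pass `json.dumps(definition)` to `CreateStateMachine` along with an execution role.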

What This Post Doesn’t Cover

MQTT workloads → IoT Core diverges from MSK/MQ mental models; our IoT MQTT scaling patterns are the better rabbit hole.
Exactly-once semantics fantasies—AWS services expose at-least-once defaults; design idempotency keys everywhere.
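A minimal sketch of an idempotency key check, assuming the key comes from the message (for example a producer-assigned command ID). The in-memory store stands in for a durable table with a conditional write; claiming the key before running the side effect, rather than after, is what stops two concurrent redeliveries from both executing it.

```python
import threading
from typing import Any, Callable, Dict, Optional, Tuple

_IN_PROGRESS = object()  # marker for a claimed key whose side effect has not finished


class InMemoryIdempotencyStore:
    """Stand-in for a durable table with a conditional put (for example a
    DynamoDB PutItem guarded by attribute_not_exists on the key)."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def claim(self, key: str) -> bool:
        with self._lock:
            if key in self._items:
                return False
            self._items[key] = _IN_PROGRESS
            return True

    def complete(self, key: str, result: Any) -> None:
        with self._lock:
            self._items[key] = result

    def release(self, key: str) -> None:
        with self._lock:
            self._items.pop(key, None)

    def result(self, key: str) -> Optional[Any]:
        with self._lock:
            value = self._items.get(key)
        return None if value is _IN_PROGRESS else value


def process_once(store: InMemoryIdempotencyStore, key: str,
                 side_effect: Callable[[], Any]) -> Tuple[bool, Optional[Any]]:
    """Complete side_effect at most once per key; returns (ran_now, result)."""
    if not store.claim(key):
        # Duplicate delivery: skip the side effect (result is None while still in progress).
        return False, store.result(key)
    try:
        result = side_effect()
    except Exception:
        store.release(key)  # free the key so a redelivery can retry
        raise
    store.complete(key, result)
    return True, result
```

A production version also needs a TTL on the in-progress marker: a worker that crashes between claim and completion would otherwise block that key forever.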

If You Only Do One Thing

Write the failure-mode paragraph on every async handoff: duplicate delivery, poison messages, and visibility timeout mishaps—before you paste architecture diagrams into slide decks.

What to Do This Week

Export SQS attributes (visibility timeout, DLQ, retention) for your top 10 queues; fix any missing redrive policies on production queues that process money.
Re-read the MSK vs Kinesis decision guide with your actual RPS + retention + consumer list—not the vendor you prefer.
Add EventBridge rule cardinality review to your change template (producer team + consumer service + max Lambda concurrency).
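For the first checklist item, a Python sketch that audits the attribute map returned by `GetQueueAttributes`. It is not the `inspect-sqs-redrive.sh` script mentioned earlier, just an offline-testable version of two checks; the p99 handler time per queue is an input you supply, and the finding messages are illustrative.

```python
import json
from typing import Dict, List, Optional


def audit_queue(attributes: Dict[str, str],
                p99_handler_seconds: Optional[float] = None) -> List[str]:
    """Flag a missing DLQ and a visibility timeout shorter than the handler.

    `attributes` is the Attributes map from GetQueueAttributes (values are strings).
    """
    findings: List[str] = []
    redrive = attributes.get("RedrivePolicy")
    if not redrive:
        findings.append("no RedrivePolicy: poison messages never reach a DLQ")
    elif not json.loads(redrive).get("deadLetterTargetArn"):
        findings.append("RedrivePolicy has no deadLetterTargetArn")
    # 30 seconds is the SQS default when the attribute is absent.
    visibility = int(attributes.get("VisibilityTimeout", "30"))
    if p99_handler_seconds is not None and visibility < p99_handler_seconds:
        findings.append(
            f"VisibilityTimeout {visibility}s is below p99 handler time {p99_handler_seconds}s"
        )
    return findings


def fetch_attributes(queue_url: str) -> Dict[str, str]:
    import boto3  # lazy import so audit_queue is testable offline

    sqs = boto3.client("sqs")
    return sqs.get_queue_attributes(QueueUrl=queue_url, AttributeNames=["All"])["Attributes"]
```

Loop `fetch_attributes` over your top queue URLs and fail the change review on any non-empty findings list.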

For ingress scale interactions (ALB + Lambda), tie back to ingress, load balancing, and elastic scale.


Recommended Reading

How to Build Hybrid Compute (EC2 + Serverless) for Cost Efficiency

Ingress, Load Balancing, and Elastic Scale on AWS: L4 vs L7, Horizontal vs Vertical, and the Cold-Start Bill

Production Resilience on AWS: Timeouts, Retries With Jitter, Circuit Limits, and Graceful Shutdown

Microservices Design Patterns on AWS: 10 Patterns That Actually Matter in 2026
