---
title: Event-Driven Boundaries on AWS: Async vs Sync, Amazon MSK vs Amazon MQ (RabbitMQ), and When SQS Wins
description: Standard SQS queues support a nearly unlimited number of API calls per second, while FIFO queues without high-throughput mode are limited to 300 API calls per second per API action (up to 3,000 messages per second with batches of 10). Your May 2026 architecture review should start from those numbers, not from Kafka slogans.
url: https://www.factualminds.com/blog/aws-event-driven-async-messaging-boundaries/
datePublished: 2026-05-08T00:00:00.000Z
dateModified: 2026-06-10T00:00:00.000Z
author: palaniappan-p
category: Cloud Architecture
tags: aws-sqs, amazon-sns, aws-eventbridge, amazon-msk, rabbitmq, amazon-mq, event-driven, serverless
---

# Event-Driven Boundaries on AWS: Async vs Sync, Amazon MSK vs Amazon MQ (RabbitMQ), and When SQS Wins

> Standard SQS queues support a nearly unlimited number of API calls per second, while FIFO queues without high-throughput mode are limited to 300 API calls per second per API action (up to 3,000 messages per second with batches of 10). Your May 2026 architecture review should start from those numbers, not from Kafka slogans.

On **May 8, 2026**, the cheapest mental model for AWS buyers is still: **sync paths spend latency budget for certainty; async paths spend operational complexity for resilience.** AWS documents that standard **SQS** queues target **nearly unlimited throughput** (horizontal scaling behind the service), while **FIFO queues without high-throughput mode** are limited to **300 API calls per second** per API action (send, receive, or delete), which works out to at most 3,000 messages per second when every call carries a full batch of 10. If your backlog math ignores that, capacity reviews fail in the boring way.
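
The backlog math is short enough to write down. The sketch below assumes the documented FIFO quota of 300 API calls per second per API action and the 10-message batch ceiling; the function names and the backlog figures in the usage note are hypothetical.

```python
# Planning figures for FIFO queues without high-throughput mode:
# 300 API calls per second per API action, at most 10 messages per batch call.
FIFO_API_CALLS_PER_SECOND = 300
MAX_BATCH_SIZE = 10


def fifo_messages_per_second(batch_size: int) -> int:
    """Upper bound on messages/second for one API action at a given batch size."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("SQS batch size must be between 1 and 10")
    return FIFO_API_CALLS_PER_SECOND * batch_size


def drain_seconds(backlog: int, arrival_rate: float, batch_size: int) -> float:
    """Seconds to drain a backlog while new messages keep arriving.

    Returns infinity when arrivals meet or exceed the consume ceiling.
    """
    headroom = fifo_messages_per_second(batch_size) - arrival_rate
    if headroom <= 0:
        return float("inf")
    return backlog / headroom
```

With a hypothetical backlog of 1.2 million messages and 200 messages per second still arriving, `drain_seconds(1_200_000, 200, 1)` is 12,000 seconds (over three hours) for unbatched consumers, while a full batch of 10 brings it under eight minutes. That gap is the capacity review.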

This post maps “**Kafka vs RabbitMQ**” debates to **Amazon MSK** and **Amazon MQ (RabbitMQ engine)**—and shows where **neither** belongs because **SQS** or **EventBridge** already matches the coupling you need.

> **Reproduce this** — Inspect redrive policies and visibility timeouts with [`examples/architecture-blog-2026/event-driven/inspect-sqs-redrive.sh`](https://www.factualminds.com/examples/architecture-blog-2026/event-driven/inspect-sqs-redrive.sh) (AWS CLI v2.25+, `jq` installed).

## Sync vs async: decision sharp edges

**Stay synchronous** when:

- The caller must branch UX on the outcome **now** (form validation, or payment authorization that must complete in-request under PCI constraints).
- Total fan-out fits your **API Gateway / ALB timeout** envelope with margin.
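
A rough way to test the second bullet: add up the p99 latency of each sequential hop and compare it with the timeout minus a safety margin. This is a minimal sketch; the function name, the 30% default margin, and the sample latencies are assumptions, and summing p99 values is deliberately pessimistic.

```python
def fits_sync_budget(hop_p99_ms, timeout_ms, margin=0.3):
    """Return True when sequential hop latencies fit the timeout with headroom.

    hop_p99_ms: p99 latency in milliseconds for each hop called in sequence.
    margin: fraction of the timeout held back as a safety buffer.
    """
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    usable_ms = timeout_ms * (1 - margin)
    return sum(hop_p99_ms) <= usable_ms
```

For example, `fits_sync_budget([120, 450, 900], 29_000)` passes comfortably, while `fits_sync_budget([9_000, 12_000, 4_000], 29_000)` does not. The 29,000 ms figure mirrors API Gateway's default integration timeout; substitute the envelope your ingress actually enforces.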

**Go asynchronous** when:

- Downstream variance is high (ML inference, partner batch APIs).
- You need **absorption** (spiky producers, throttled consumers).
- You already proved idempotency—duplicate delivery is a _when_, not an _if_.
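
"Proved idempotency" can be as small as the sketch below: derive a key per message and refuse to run the side effect twice. The class name, the `idempotency_key` field, and the in-memory set are all assumptions for illustration; a real handler would keep keys in a durable store, for example behind a conditional write.

```python
import hashlib
import json


class IdempotentHandler:
    """Runs a side effect at most once per idempotency key.

    The in-memory set stands in for a durable store (assumption).
    """

    def __init__(self, side_effect):
        self._side_effect = side_effect
        self._seen = set()

    @staticmethod
    def key_for(message: dict) -> str:
        # Prefer a producer-supplied key; fall back to a hash of the body.
        if "idempotency_key" in message:
            return str(message["idempotency_key"])
        canonical = json.dumps(message, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def handle(self, message: dict) -> bool:
        """Return True if the side effect ran, False for a duplicate."""
        key = self.key_for(message)
        if key in self._seen:
            return False
        self._side_effect(message)
        self._seen.add(key)  # record only after the side effect succeeds
        return True
```

Recording the key after the side effect means a crash between the two steps can still replay the message, so the side effect itself should tolerate a retry. Recording it before trades that risk for possible message loss; pick one deliberately.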

> **Opinionated take** — If your “event-driven architecture” is still “Lambda calling Lambda synchronously through SDK invocations,” you built a distributed monolith with extra latency. Promote boundaries to **queues or buses**; keep the synchronous graph shallow.

## Kafka (MSK) vs RabbitMQ (Amazon MQ) vs SQS

We already compared **Kinesis Data Streams vs MSK** in depth ([streaming platform guide](/blog/amazon-kinesis-data-streams-vs-msk-which-streaming-platform/)). Use that when bytes-per-second and stream retention dominate.

**Choose Amazon MSK** when:

- You need **Kafka protocol** compatibility (existing clients, Kafka Connect, transactions on brokers where supported).
- Stream log retention and **consumer group** mechanics are first-class requirements.

**Choose Amazon MQ for RabbitMQ** when:

- Teams standardize on **AMQP**, priority queues, or routing semantics that map naturally to Rabbit.
- You are migrating an on-prem broker with minimal client rewrites.

**Choose SQS (+ SNS/EventBridge)** when:

- You need **managed fan-out** without caring about broker protocol details.
- Per-message **DLQ + redrive** solves most failure handling ([SQS production patterns](/blog/aws-sqs-reliable-messaging-patterns-for-production/)).

> **What broke**: A team imported a **Kafka**-shaped “commands” topic into **SQS standard** queues, which offer only best-effort ordering, assuming partition keys magically existed. Side effects executed out of order; financial adjustments diverged. Fix: **FIFO** queues with a message group ID per entity (the closest analogue to a partition key) plus deduplication IDs, or stay on MSK where per-partition ordering is broker-enforced.
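
On a FIFO queue, ordering is scoped by message group, so the partition-key role falls to `MessageGroupId`, while `MessageDeduplicationId` suppresses duplicate sends. The sketch below builds the arguments for a FIFO `SendMessage` call; the helper name and the choice of account ID as the group are assumptions.

```python
import hashlib


def fifo_send_params(queue_url: str, account_id: str, body: str) -> dict:
    """Build keyword arguments for an SQS FIFO SendMessage call.

    MessageGroupId: messages sharing a group are delivered in order.
    MessageDeduplicationId: repeats within the deduplication interval are dropped.
    """
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names must end in .fifo")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": account_id,  # assumed ordering scope: one account
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
```

The resulting dict can be passed to an SDK call such as boto3's `send_message(**params)`. Because the deduplication ID here is a hash of the body, two legitimately identical commands would collapse into one, so the body should carry a unique command ID.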

## EventBridge and “domain buses”

EventBridge excels at **discovery-friendly** routing (rules, archival, replay with appropriate controls). Pair it with our [EventBridge architecture patterns](/blog/aws-eventbridge-event-driven-architecture-patterns/) before inventing a bespoke bus topology.

Failure mode: **wildcard rules** that turn a bus into an accidental fan-out grenade—cap concurrency at targets and measure blast radius with [cloud cost alarms](/blog/aws-finops-gap-engineering-cost-ownership/).
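
One cheap guard is a lint that flags catch-all event patterns before they ship. The heuristic below is a house rule of our own devising, not an EventBridge feature: it assumes every rule should pin both `source` and `detail-type`, and it treats an empty `prefix` matcher as a wildcard because it matches any string value.

```python
def broad_pattern_reasons(pattern: dict) -> list:
    """Return reasons an EventBridge event pattern looks overly broad.

    An empty list means the pattern passed this (hypothetical) lint.
    """
    reasons = []
    if "source" not in pattern:
        reasons.append("no source filter")
    if "detail-type" not in pattern:
        reasons.append("no detail-type filter")
    for field, matchers in pattern.items():
        if not isinstance(matchers, list):
            continue  # nested objects such as "detail" are not inspected here
        for matcher in matchers:
            if isinstance(matcher, dict) and matcher.get("prefix") == "":
                reasons.append(f"empty prefix on {field} matches every value")
    return reasons
```

A pattern such as `{"source": ["com.example.orders"], "detail-type": ["OrderPlaced"]}` passes, while `{"source": [{"prefix": ""}]}` is flagged twice. The event names are placeholders.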

## Real-time pipelines: Kinesis + Lambda mental hook

When you need stream processing with Lambda consumers, our [Kinesis + Lambda + DynamoDB pipeline](/blog/real-time-data-pipeline-kinesis-lambda-dynamodb/) illustrates throughput thinking end-to-end. Use it as the bridge between “messaging” and “analytics.”

## Orchestration: Step Functions as the adult in the room

Long-running sagas with explicit compensation should not live in “while loops around SQS.” Read [Step Functions workflow patterns](/blog/aws-step-functions-workflow-orchestration-patterns/) and model human approvals or external waits as first-class states.
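
To make "explicit compensation" concrete, here is a minimal Amazon States Language definition built as a Python dict: a charge step, a shipping step, and a `Catch` that routes shipping failures to a refund. The state names and the ARNs you pass in are hypothetical; retries, timeouts, and approval states are omitted for brevity.

```python
import json


def saga_definition(charge_arn: str, ship_arn: str, refund_arn: str) -> dict:
    """Two-step saga: if shipping fails, compensate by refunding the charge."""
    return {
        "Comment": "Illustrative saga with one compensation path",
        "StartAt": "ChargeCard",
        "States": {
            "ChargeCard": {"Type": "Task", "Resource": charge_arn, "Next": "ShipOrder"},
            "ShipOrder": {
                "Type": "Task",
                "Resource": ship_arn,
                # Any error in shipping triggers the compensating action.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "RefundCard"}],
                "End": True,
            },
            "RefundCard": {"Type": "Task", "Resource": refund_arn, "Next": "OrderFailed"},
            "OrderFailed": {
                "Type": "Fail",
                "Error": "OrderFailed",
                "Cause": "Shipment failed; charge refunded",
            },
        },
    }


def to_json(definition: dict) -> str:
    """Serialize the definition for use as a state machine definition string."""
    return json.dumps(definition, indent=2)
```

The point of the shape is that the compensation is a named state with its own history entry, not a branch buried in a worker loop.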

## What This Post Doesn’t Cover

- **MQTT** workloads → IoT Core diverges from MSK/MQ mental models; our IoT MQTT [scaling patterns](/blog/aws-iot-core-mqtt-industrial-workloads/) are the better rabbit hole.
- **Exactly-once** semantics fantasies: most AWS messaging paths default to _at-least-once_ delivery, so design idempotency keys everywhere.

## If You Only Do One Thing

Write the **failure-mode paragraph** on every async handoff: duplicate delivery, poison messages, and visibility timeout mishaps—before you paste architecture diagrams into slide decks.

## What to Do This Week

1. Export SQS attributes (visibility, DLQ, retention) for top 10 queues; fix any missing redrive policies on production workloads processing money.
2. Re-read the MSK vs Kinesis decision guide with your actual **RPS + retention + consumer** list—not the vendor you prefer.
3. Add EventBridge rule cardinality review to your change template (producer team + consumer service + max Lambda concurrency).
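
For step 1, once the attributes are exported, the check itself is a few lines. The sketch below takes a map in the shape SQS returns from `GetQueueAttributes` (string values, with `RedrivePolicy` as a JSON string) and reports findings; the function name, the queue names, and the rule that a 30-second visibility timeout deserves review are assumptions.

```python
import json

DEFAULT_VISIBILITY_SECONDS = 30  # SQS default when no value was ever set


def audit_queue(name: str, attributes: dict) -> list:
    """Return human-readable findings for one queue's exported attributes."""
    findings = []
    raw_policy = attributes.get("RedrivePolicy")
    if not raw_policy:
        findings.append(f"{name}: no redrive policy, so poison messages never reach a DLQ")
    else:
        policy = json.loads(raw_policy)
        if not policy.get("deadLetterTargetArn"):
            findings.append(f"{name}: redrive policy has no deadLetterTargetArn")
    visibility = int(attributes.get("VisibilityTimeout", DEFAULT_VISIBILITY_SECONDS))
    if visibility == DEFAULT_VISIBILITY_SECONDS:
        findings.append(f"{name}: visibility timeout is the 30 s default; confirm it was chosen")
    return findings
```

Run it over the top 10 queues and treat any non-empty result on a money-handling queue as a ticket, not a note.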

For ingress scale interactions (ALB + Lambda), tie back to [ingress, load balancing, and elastic scale](/blog/aws-ingress-scale-and-cold-start/).

## Related reading

- [AWS Cloud Center of Excellence (CCoE): Operating Model, RFCs, and How WAR + FinOps Connect](/blog/aws-cloud-center-of-excellence-operating-model-2026/)
- [AWS for Retail: The Complete Guide for eCommerce Teams](/blog/aws-for-retail-complete-guide/)
- [AWS Global Accelerator vs CloudFront & Route 53 (2026)](/blog/aws-global-accelerator-when-to-use-multiregion/)
- [AWS IoT Greengrass v2: Edge Computing for Factory Floors](/blog/aws-iot-greengrass-v2-edge-computing-factory-floor/)
- [AWS IoT SiteWise Native Anomaly Detection for Predictive Maintenance](/blog/aws-iot-sitewise-native-anomaly-detection-predictive-maintenance/)
- [AWS IoT Solutions: Architecture Patterns for Connected Devices](/blog/aws-iot-solutions-architecture-guide/)
- [AWS IoT TwinMaker: Digital Twin Architecture for Manufacturing](/blog/aws-iot-twinmaker-digital-twin-manufacturing/)
- [AWS Managed Services vs AWS Support Plans: What](/blog/aws-managed-services-vs-aws-support-plans-difference/)
- [AWS Architecture for Black Friday: How Retail Teams Prepare for Peak Traffic](/blog/aws-retail-architecture-black-friday-peak-traffic/)
- [12 Benefits of Hiring a Certified AWS Consultant — With Real ROI](/blog/benefits-of-hiring-certified-aws-consultant/)
- [Custom AWS Development for Retail: When Off-the-Shelf Is Not Enough](/blog/custom-aws-development-retail-ecommerce/)
- [How to Optimize EC2 for High-Performance APIs](/blog/ec2-high-performance-api-optimization/)
- [Microservices Design Patterns on AWS: 10 Patterns That Actually Matter in 2026](/blog/microservices-design-patterns-aws-production-guide-2026/)
- [How to Choose Between Nginx, FrankenPHP, and Modern Web Runtimes (2026)](/blog/nginx-frankenphp-modern-runtimes-comparison/)
- [OPC-UA on AWS: SiteWise Edge Gateway Setup and Best Practices](/blog/opc-ua-aws-iot-sitewise-edge-gateway-setup/)
- [OT/IT Convergence on AWS: Architecture Patterns for Smart Manufacturing](/blog/ot-it-convergence-aws-architecture-patterns/)
- [How to Build Reliable Queue Systems on AWS (SQS, Kafka, Redis)](/blog/reliable-queue-systems-aws-sqs-kafka-redis/)
- [How to Tune PHP, Node.js, Python, and Go for High Concurrency on AWS](/blog/tune-php-node-python-go-high-concurrency/)

## FAQ

### When is synchronous HTTP the right default instead of SQS?
When the caller needs an immediate semantic response (success/failure, generated id, validation errors) and you can keep the operation inside your latency SLO without fragile downstream chains. User-facing mutations that must confirm before navigation should stay request/response unless you invest in idempotent compensating flows plus excellent UX for async completion.

### When should we not choose Amazon MSK even if we already know Kafka?
When you do not need Kafka consumer groups, compacted topics, or Kafka Streams—which is a large share of internal job queues. Paying broker hours for “YAML nostalgia” burns budget. Start from SQS + EventBridge + (if needed) Kinesis or MSK Serverless after writing the one-page decision record.

### What breaks with wrong SQS visibility timeout?
Workers exceed processing time, messages become visible again, and duplicate processing amplifies unless handlers are idempotent. Tune visibility to the p99 handler duration plus padding; use heartbeat patterns for long jobs or Step Functions instead of giant single-message work.
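
"p99 plus padding" can be computed directly from observed handler durations. This is a minimal sketch: the 1.5x padding factor and the function name are assumptions, and the result is clamped to the 12-hour maximum SQS allows for a visibility timeout.

```python
import math
import statistics

SQS_MAX_VISIBILITY_SECONDS = 43_200  # 12 hours, the service maximum


def suggest_visibility_timeout(durations_s, padding: float = 1.5) -> int:
    """Suggest a visibility timeout in whole seconds from handler durations.

    durations_s: observed processing times in seconds (at least two samples).
    padding: multiplier applied on top of the p99 (assumed default 1.5).
    """
    if len(durations_s) < 2:
        raise ValueError("need at least two duration samples")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(durations_s, n=100)[98]
    return min(SQS_MAX_VISIBILITY_SECONDS, math.ceil(p99 * padding))
```

If the suggestion lands near the clamp, that is the signal to split the work, extend visibility with a heartbeat while the job runs, or move the job into Step Functions.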

### Does RabbitMQ on Amazon MQ replace SQS?
Sometimes—for AMQP patterns (priority queues, selective consumer features). It does not magically remove operational thinking: broker maintenance windows, durable queue configuration, and networking still matter. If you do not need AMQP protocol compatibility, SQS often wins on undifferentiated heavy lifting.

### What is the classic EventBridge pitfall?
Unbounded rule fan-out: one bus event triggers thirty Lambda functions with no concurrency caps, blasting downstream SaaS rate limits. Use event filtering, dead-letter queues on rule targets, and per-target Lambda reserved concurrency budgets.
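
A back-of-the-envelope blast radius helps when reviewing a rule. The sketch below estimates worst-case concurrent Lambda executions behind one bus event. It is a deliberately rough model, assuming that functions without reserved concurrency can together consume whatever remains of the account pool; the default of 1,000 mirrors the usual per-Region starting quota, and the function names are hypothetical.

```python
from typing import Dict, Optional


def worst_case_concurrency(
    reserved: Dict[str, Optional[int]], account_limit: int = 1000
) -> int:
    """Estimate peak concurrent executions across a rule's Lambda targets.

    reserved: function name -> reserved concurrency, or None when uncapped.
    """
    capped_total = sum(v for v in reserved.values() if v is not None)
    has_uncapped = any(v is None for v in reserved.values())
    if has_uncapped:
        # Uncapped targets can grow into the rest of the account pool.
        return max(account_limit, capped_total)
    return capped_total
```

With every target capped, `worst_case_concurrency({"notify": 5, "sync-crm": 10})` is 15. Leave one uncapped and the answer jumps to the account limit, which is the number to compare against a partner's rate limit.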

### Where does Step Functions fit relative to SQS?
Orchestration with explicit state, timeouts, and compensations belongs in Step Functions. Queue competition and worker pools belong in SQS. Composing both is normal—see our Step Functions patterns post for choreography vs orchestration boundaries.

---

*Source: https://www.factualminds.com/blog/aws-event-driven-async-messaging-boundaries/*
