Workflow Architecture Comparison
AWS Step Functions vs EventBridge: Orchestration vs Choreography
Step Functions orchestrates workflows with full state ownership and guaranteed execution order. EventBridge choreographs loosely coupled services through events. Understanding the difference prevents architectural regret at scale.
Step Functions and EventBridge are frequently mentioned together in AWS architecture discussions — and just as frequently confused. They solve different problems. Using EventBridge where you need Step Functions leads to fragmented error handling and invisible workflow state. Using Step Functions where EventBridge suffices leads to unnecessary coupling and higher costs.
This comparison clarifies the distinction with concrete patterns, cost data, and a framework for deciding when to use each.
The Core Architectural Distinction
Orchestration (Step Functions): A central coordinator knows the entire workflow state and directs each service to perform its step. If a step fails, the coordinator decides whether to retry, compensate, or abort. The coordinator is the single source of truth for the workflow’s current state.
Choreography (EventBridge): Each service listens for events it cares about and reacts independently. No single service knows the overall workflow state. If a downstream service fails, it is responsible for its own retry — there is no central point that knows the order-fulfillment process is stuck.
Neither approach is universally better. The right choice depends on whether coordination guarantees or loose coupling is more important for your specific workflow.
Service Overview
| | AWS Step Functions | AWS EventBridge |
|---|---|---|
| Pattern | Orchestration | Choreography / event routing |
| State ownership | Central (Step Functions owns state) | Distributed (no central state) |
| Execution history | Full step-by-step history retained | Event delivery logs in CloudWatch |
| Retry logic | Built-in per-state retry with backoff | Target-level delivery retry (by default up to 185 attempts over 24 hours, configurable per target) |
| Error compensation | Catch-based compensating branches (saga pattern) | Not built-in |
| Execution order | Guaranteed sequential or parallel | Best-effort, eventual consistency |
| Maximum duration | 1 year (Standard), 5 minutes (Express) | Event delivery (sub-second to minutes) |
| Pricing | $0.025/1,000 state transitions (Standard) | $1.00/million custom events published |
| Visibility | Real-time execution graph in console | Event delivery metrics in CloudWatch |
When Step Functions Is the Right Tool
Step Functions is purpose-built for workflows where you need to know the state of a multi-step process at any point in time.
Order fulfillment with compensation: An e-commerce order workflow might involve: charge payment card → reserve inventory → send fulfillment request → send confirmation email. If the fulfillment request fails, the workflow needs to release the inventory reservation and refund the payment charge — a classic saga pattern. Step Functions handles this with Catch states and compensating branches. With EventBridge, you would need to build this compensation logic into each individual service, with no central visibility into which compensating actions have completed.
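As a rough illustration of the saga described above, the sketch below builds an Amazon States Language definition in which each step's Catch routes to the compensating steps for whatever has already succeeded. The Lambda function names, the account ID in the ARN prefix, and the state names are hypothetical placeholders, not part of any real deployment.

```python
import json

# Hypothetical ARN prefix; replace with real Lambda function ARNs.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def task(function_name, next_state=None, catch_to=None):
    """Build a Task state that invokes a Lambda function."""
    state = {"Type": "Task", "Resource": LAMBDA_ARN_PREFIX + function_name}
    if catch_to:
        # Any error in this step jumps to the first compensating state.
        state["Catch"] = [
            {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": catch_to}
        ]
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


order_saga = {
    "StartAt": "ChargeCard",
    "States": {
        # Forward path
        "ChargeCard": task("charge-card", "ReserveInventory", catch_to="OrderFailed"),
        "ReserveInventory": task("reserve-inventory", "RequestFulfillment", catch_to="RefundPayment"),
        "RequestFulfillment": task("request-fulfillment", "SendConfirmation", catch_to="ReleaseInventory"),
        "SendConfirmation": task("send-confirmation"),
        # Compensating path, run in reverse order of the steps that succeeded
        "ReleaseInventory": task("release-inventory", "RefundPayment"),
        "RefundPayment": task("refund-payment", "OrderFailed"),
        "OrderFailed": {"Type": "Fail", "Error": "OrderFailed", "Cause": "Order was compensated"},
    },
}

print(json.dumps(order_saga, indent=2))
```

Because the compensating states live in the same definition, the execution history shows which of them ran, which is the central visibility that a choreographed version lacks.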
ETL pipeline with validation gates: A data ingestion pipeline that validates schema → transforms data → loads to data warehouse benefits from Step Functions’ Map state (parallel processing over a list), Wait state (polling for async operations), and a complete execution history showing exactly which records failed validation and why.
Long-running approval workflows: Step Functions’ .waitForTaskToken pattern pauses a workflow until a callback token is returned, which suits human-in-the-loop approval steps that may take hours or days. The pause is bounded by the state’s timeout, if one is set, and by the 1-year maximum duration of a Standard Workflow.
Compliance-sensitive processes: Industries subject to audit requirements (healthcare, finance) benefit from Step Functions’ execution history, which records every state transition with timestamps. Demonstrating that a specific process ran in the correct sequence on a specific date is straightforward because the history cannot be edited and can be queried through the console or API. Standard Workflow history is retained for 90 days after an execution ends, so longer audit windows require exporting it, for example to CloudWatch Logs.
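A minimal sketch of the callback pattern, assuming the approval request is delivered through an SQS queue: the Task state hands the token to the approver, and a small helper returns it once a decision is made. The queue URL, the `ApplyDecision` state name, the one-week timeout, and the `approve` helper are illustrative assumptions; `sfn_client` is expected to be a boto3 Step Functions client, whose `send_task_success` call takes `taskToken` and `output`.

```python
import json

# Hypothetical queue that the approver's tooling reads from.
APPROVAL_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/approvals"

wait_for_approval = {
    "Type": "Task",
    # The .waitForTaskToken suffix pauses the execution until a callback arrives.
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": APPROVAL_QUEUE_URL,
        "MessageBody": {
            "requestId.$": "$.requestId",
            # $$.Task.Token reads the callback token from the context object.
            "taskToken.$": "$$.Task.Token",
        },
    },
    "TimeoutSeconds": 7 * 24 * 3600,  # give up after one week (assumed policy)
    "Next": "ApplyDecision",
}


def approve(sfn_client, task_token, approver):
    """Resume the paused execution with an approval result."""
    return sfn_client.send_task_success(
        taskToken=task_token,
        output=json.dumps({"approved": True, "approver": approver}),
    )
```

In practice `approve` would be called from whatever handles the human decision, for example `approve(boto3.client("stepfunctions"), token, "alice")`.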
When EventBridge Is the Right Tool
EventBridge shines when services should react independently to things that happened, without needing a coordinator.
Fan-out notifications: When an order is placed, you might want to: send a confirmation email, update the CRM, trigger an analytics event, and notify the warehouse system. These are independent reactions to the same event — none depends on the others, and failure in one should not block the others. EventBridge’s multiple target support makes this a single event rule rather than a sequential workflow.
Domain event broadcasting: Microservices publishing domain events (user.registered, payment.processed, subscription.renewed) to an EventBridge event bus allow downstream services to subscribe without the producer knowing who is consuming. Adding a new consumer requires zero changes to the producer — just a new EventBridge rule.
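To make the decoupling concrete, here is a sketch of a producer's event entry (in the shape the PutEvents API accepts) next to the rule pattern a new consumer might register. The source name, bus name, and detail fields are made up for illustration, and `matches` is a deliberately simplified stand-in for EventBridge's matching that only handles exact values on top-level fields.

```python
import json

# What the producer publishes. It contains no reference to any consumer.
event_entry = {
    "Source": "com.example.users",    # hypothetical source name
    "DetailType": "user.registered",
    "Detail": json.dumps({"userId": "u-123", "plan": "pro"}),
    "EventBusName": "app-events",     # hypothetical bus name
}

# What a new consumer adds: a rule pattern. The producer is not modified.
welcome_email_pattern = {
    "source": ["com.example.users"],
    "detail-type": ["user.registered"],
}


def matches(pattern, event):
    """Simplified matcher: a field matches if its value is in the allowed list."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())


# The event roughly as a rule would see it after publication.
delivered = {
    "source": event_entry["Source"],
    "detail-type": event_entry["DetailType"],
    "detail": json.loads(event_entry["Detail"]),
}

print(matches(welcome_email_pattern, delivered))
```

Real event patterns also support nested `detail` fields, prefixes, and numeric ranges, which this sketch does not attempt to reproduce.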
Scheduled automation: EventBridge Scheduler is the right service for cron-like scheduled triggers (nightly database cleanup, daily report generation, hourly health checks) — it is simpler and cheaper than a Step Functions scheduled execution for single-Lambda invocations.
Cross-service integration: EventBridge’s native integration with 200+ AWS services as event sources means you can react to S3 uploads, RDS database changes, CloudTrail API calls, and third-party SaaS events (Salesforce, Zendesk, GitHub) without writing polling code.
Cost Comparison at Scale
| Scenario | Step Functions (Standard) | EventBridge |
|---|---|---|
| 100K workflow executions, 10 steps each | $25/month | N/A (not applicable) |
| 1M events/month (simple routing) | Overkill: at least $25/month at one transition per event | $1.00/month |
| 10M events/month | At least $250/month at one transition per event | $10.00/month |
| 1M executions, 5-step workflow/month | $125/month | N/A |
| 1M high-volume short workflows (Express) | ~$1/million requests + duration | N/A |
Step Functions Express Workflows are cost-competitive with EventBridge for high-volume, short-duration orchestration. The trade-off is weaker guarantees: asynchronous Express Workflows run with at-least-once semantics (synchronous ones are at-most-once), and Step Functions does not store their execution history. To inspect an execution you must enable logging to CloudWatch Logs and, if needed, export the results to S3 yourself.
EventBridge is dramatically cheaper for pure event routing. If your use case is “fire an event and fan out to multiple targets,” EventBridge at $1/million events is the right tool. Using Standard Workflows for the same pattern would cost at least 25x more ($25 vs $1 per million, assuming a single state transition per event) and add unnecessary coordination overhead.
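The figures in the table can be reproduced with a few lines of arithmetic. This sketch uses the list prices quoted in this article, treats each workflow step as one state transition (as the table does), and ignores any free tier; the function names are arbitrary.

```python
STANDARD_USD_PER_1K_TRANSITIONS = 0.025
EVENTBRIDGE_USD_PER_MILLION_EVENTS = 1.00


def standard_workflow_cost(executions, transitions_per_execution):
    """Monthly Standard Workflow cost in USD for the given volume."""
    transitions = executions * transitions_per_execution
    return round(transitions / 1_000 * STANDARD_USD_PER_1K_TRANSITIONS, 2)


def eventbridge_cost(events):
    """Monthly EventBridge cost in USD for the given number of events."""
    return round(events / 1_000_000 * EVENTBRIDGE_USD_PER_MILLION_EVENTS, 2)


print(standard_workflow_cost(100_000, 10))   # 100K executions, 10 steps each
print(standard_workflow_cost(1_000_000, 5))  # 1M executions, 5 steps each
print(eventbridge_cost(10_000_000))          # 10M routed events
print(standard_workflow_cost(1_000_000, 1) / eventbridge_cost(1_000_000))  # cost ratio
```

Real workflows usually record more transitions than they have business steps, so the Step Functions numbers are better read as a lower bound.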
Error Handling: A Critical Difference
Step Functions’ error handling model is its most underappreciated advantage.
Each state in a Step Functions workflow can define:
- Retry configuration: max attempts, backoff rate, jitter, specific error codes to retry
- Catch configuration: route to a different state branch on specific errors
- Compensation: cleanup states, reached through Catch, that undo earlier steps when a later step fails
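The options above map onto the Retry and Catch fields of a state. The sketch below shows one possible configuration and computes the un-jittered delay before each retry. The Lambda ARN, the custom error name `PaymentGatewayUnavailable`, and the numeric values are assumptions chosen for illustration.

```python
charge_card_state = {
    "Type": "Task",
    # Hypothetical Lambda function ARN.
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
    "Retry": [
        {
            "ErrorEquals": ["States.Timeout", "PaymentGatewayUnavailable"],
            "IntervalSeconds": 2,    # delay before the first retry
            "BackoffRate": 2.0,      # multiplier applied to each later delay
            "MaxAttempts": 4,        # number of retries, not counting the first try
            "MaxDelaySeconds": 10,   # cap on any single delay
            "JitterStrategy": "FULL",
        }
    ],
    # Once retries are exhausted, or on any other error, run the cleanup branch.
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "RefundPayment"}],
    "Next": "ReserveInventory",
}


def retry_delays(interval_seconds, backoff_rate, max_attempts, max_delay_seconds=None):
    """Delay before each retry, ignoring jitter."""
    delays = []
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays


retrier = charge_card_state["Retry"][0]
print(retry_delays(retrier["IntervalSeconds"], retrier["BackoffRate"],
                   retrier["MaxAttempts"], retrier["MaxDelaySeconds"]))
```

With full jitter enabled, the actual waits are randomized, so the computed schedule is best treated as an upper bound on each delay.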
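EventBridge’s error handling is at the target level only. It retries delivery to a target according to that target’s retry policy (by default for up to 24 hours and 185 attempts), and for an asynchronously invoked Lambda target, Lambda itself retries a failed function twice by default. Once retries are exhausted, the event goes to a dead-letter queue if one is configured; otherwise it is dropped. There is no concept of compensating a prior step: the producer has already published the event and has no knowledge of the downstream failure.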
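For comparison, target-level handling in EventBridge is configured on the rule target. The sketch below assembles the arguments for a PutTargets call with an explicit retry policy and dead-letter queue. The rule name, bus name, ARNs, and chosen limits are hypothetical, and `put_targets_request` is only a helper for building the request.

```python
fulfillment_target = {
    "Id": "fulfillment-handler",
    "Arn": "arn:aws:lambda:us-east-1:123456789012:function:fulfillment-handler",
    # Delivery retry settings for this target only.
    "RetryPolicy": {
        "MaximumRetryAttempts": 2,
        "MaximumEventAgeInSeconds": 3600,
    },
    # Events that still cannot be delivered are sent to this SQS queue.
    "DeadLetterConfig": {
        "Arn": "arn:aws:sqs:us-east-1:123456789012:fulfillment-dlq",
    },
}


def put_targets_request(rule_name, bus_name, targets):
    """Build keyword arguments for an EventBridge PutTargets call."""
    return {"Rule": rule_name, "EventBusName": bus_name, "Targets": list(targets)}


request = put_targets_request("order-placed-to-fulfillment", "app-events", [fulfillment_target])
print(request)
```

With boto3 this would be passed as `boto3.client("events").put_targets(**request)`. Nothing in the configuration can express "undo the payment", which is the gap the paragraph above describes.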
For workflows where partial completion is unacceptable — financial transactions, order processing, data consistency operations — Step Functions’ error model is a hard requirement.
Hybrid Architecture: Using Both Together
The most sophisticated AWS architectures use Step Functions and EventBridge in complementary roles.
Pattern 1: EventBridge triggers Step Functions
An EventBridge rule listens for order.placed events and starts a Step Functions execution for each order. The workflow orchestrates the multi-step fulfillment logic with full state visibility and retry capabilities, while EventBridge provides the decoupled trigger mechanism.
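One way this wiring might look, expressed as the request bodies for PutRule and PutTargets: the rule matches order.placed events and the target is the state machine itself. The bus name, source name, state machine ARN, and IAM role ARN are placeholders; the role is assumed to allow EventBridge to start executions.

```python
import json

# Arguments for PutRule. EventPattern is passed as a JSON string.
order_placed_rule = {
    "Name": "start-fulfillment-on-order-placed",
    "EventBusName": "app-events",  # hypothetical bus name
    "EventPattern": json.dumps({
        "source": ["com.example.orders"],
        "detail-type": ["order.placed"],
    }),
    "State": "ENABLED",
}

# Arguments for PutTargets. The state machine is the rule's only target.
start_workflow_target = {
    "Rule": order_placed_rule["Name"],
    "EventBusName": order_placed_rule["EventBusName"],
    "Targets": [{
        "Id": "order-fulfillment-workflow",
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:order-fulfillment",
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-start-execution",
        # Pass only the event's detail object as the execution input.
        "InputPath": "$.detail",
    }],
}

print(json.dumps(start_workflow_target, indent=2))
```

Each matching event then starts one execution, so the producer of order.placed never needs to know that a workflow exists.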
Pattern 2: Step Functions emits EventBridge events
Within a Step Functions workflow, individual states can publish EventBridge events to notify other services of progress — “order.fulfillment.started,” “order.shipped” — without requiring those services to poll Step Functions or be coupled to the workflow’s structure. The core workflow remains coordinated by Step Functions; the notifications are choreographed by EventBridge.
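A state that publishes such a progress event could look roughly like the following, using the Step Functions service integration for EventBridge PutEvents. The source, bus name, detail fields, and the `SendConfirmation` next state are illustrative assumptions.

```python
import json

notify_order_shipped = {
    "Type": "Task",
    # Service integration that calls EventBridge PutEvents directly.
    "Resource": "arn:aws:states:::events:putEvents",
    "Parameters": {
        "Entries": [{
            "Source": "com.example.fulfillment",  # hypothetical source name
            "DetailType": "order.shipped",
            "EventBusName": "app-events",         # hypothetical bus name
            "Detail": {
                # Values are taken from the workflow's current input.
                "orderId.$": "$.orderId",
                "carrier.$": "$.shipment.carrier",
            },
        }]
    },
    # None serializes to null: discard the API response and keep the state
    # input unchanged for the following step.
    "ResultPath": None,
    "Next": "SendConfirmation",
}

print(json.dumps(notify_order_shipped, indent=2))
```

Subscribers match on `order.shipped` with their own rules, so adding or removing one does not change the state machine definition.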
Pattern 3: EventBridge for notifications, Step Functions for the critical path
A payment processing workflow uses Step Functions for the authoritative transaction sequence (charge → reserve → confirm), while EventBridge handles all downstream notifications (email confirmation, analytics, CRM update). This separates the transactional guarantee requirement from the loose-coupling requirement.
Decision Framework
| Requirement | Step Functions | EventBridge |
|---|---|---|
| Guaranteed execution order | Yes | No |
| Error compensation (saga pattern) | Yes | No |
| Complete execution audit trail | Yes | No |
| Fan-out to multiple independent consumers | No | Yes |
| Loose coupling between services | No | Yes |
| Sub-second event routing | No | Yes |
| Cost-efficient at millions of events/day | No | Yes |
| Long-running workflows (hours/days) | Yes | No |
| Human approval steps | Yes | No |
If your workflow needs guaranteed order, compensation, or a full audit trail, Step Functions is the right choice regardless of cost. If your use case is event routing, fan-out, or decoupled service integration, EventBridge is simpler and significantly cheaper.
Architecting event-driven systems on AWS requires a clear mental model of where orchestration is necessary and where it creates unnecessary coupling. Our AWS architects can review your workflow architecture and recommend the right combination of Step Functions, EventBridge, and supporting services for your specific throughput, consistency, and cost requirements.
Frequently Asked Questions
What is the difference between Step Functions and EventBridge?
Step Functions is an orchestrator — it owns the workflow state, knows what step is executing, can retry failed steps, compensate on failure, and has a full execution history. EventBridge is a choreographer — it routes events from producers to consumers, but no single service owns the overall workflow state. Step Functions is the right tool when you need guaranteed execution order, compensating transactions, or a complete audit trail of every workflow step. EventBridge is the right tool when you want services to react independently to events without being coupled to a central coordinator.
When should I use Step Functions vs EventBridge?
Use Step Functions when: the workflow involves multiple dependent steps that must run in sequence, you need retry logic and error compensation (saga pattern), you need a complete execution history for compliance, or the workflow must either complete fully or be rolled back through compensating steps. Use EventBridge when: services should react independently to things that happened (order placed, user registered, file uploaded), producers should not care who consumes their events, you need fan-out to multiple consumers, or you are building an event-driven microservices architecture where decoupling is more important than coordination guarantees.
Can Step Functions and EventBridge work together?
Yes — combining them is a common and powerful pattern. EventBridge fires an event (e.g., "order.placed"), which triggers a Step Functions execution to orchestrate the multi-step fulfillment workflow. Within the Step Functions workflow, individual steps can publish EventBridge events to notify other services of progress without coupling them to the workflow directly. This gives you the coordination guarantees of Step Functions for the critical path and the loose coupling of EventBridge for notifications and side effects.
What does Step Functions cost?
Step Functions Standard Workflows charge $0.025 per 1,000 state transitions. A 10-step workflow that executes 100,000 times per month costs $25/month in state transitions (10 steps × 100,000 executions × $0.025/1,000 = $25). Express Workflows are cheaper for high-volume, short-duration workflows: $1.00 per million workflow requests plus $0.00001667 per GB-second of duration — better suited for workflows executing millions of times per day. The key cost driver for Standard Workflows is state transition count, so deeply nested parallel branches and Map states can multiply costs significantly.
Is EventBridge suitable for workflow orchestration?
EventBridge is not designed for workflow orchestration and should not be used as a substitute for Step Functions in that role. EventBridge has no concept of workflow state: it does not know if a multi-step process succeeded or failed as a whole, retries only the delivery of an individual event to a target rather than a failed business step, and provides no compensation mechanism. You can simulate simple sequences with EventBridge by chaining events, but you lose visibility into overall workflow state, error handling becomes fragmented across services, and debugging failures requires reconstructing the event chain from CloudWatch Logs. Use EventBridge for event routing and fan-out; use Step Functions for coordinated multi-step workflows.
