AWS Glossary
AWS Step Functions
Serverless workflow orchestration service for coordinating distributed applications and multi-step processes using visual state machines.
Definition
AWS Step Functions is a serverless workflow orchestration service that lets you coordinate multiple AWS services into visual state machines. Instead of writing complex retry logic, error handling, and state management into your Lambda functions, Step Functions handles orchestration — your code focuses on business logic, and Step Functions manages the workflow. It is used for order processing, data pipelines, ML training jobs, microservice coordination, and increasingly as the orchestration layer for AI agents.
How Step Functions Works
A Step Functions workflow is a state machine defined in Amazon States Language (ASL), a JSON-based language; some tooling also accepts YAML. Each state represents a step in your workflow:
- Task: Call an AWS service or Lambda function
- Choice: Branch based on input conditions
- Wait: Pause for a fixed duration or until a timestamp
- Parallel: Execute multiple branches simultaneously
- Map: Iterate over an array, processing each item in parallel
- Pass: Transform or relay data without calling a service
- Succeed / Fail: Terminal states
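The state types above can be combined as in the following minimal sketch, which builds an ASL definition as a Python dictionary and serializes it to JSON. The state names, the Lambda ARN, and the `$.total` field are hypothetical placeholders chosen for illustration; the small `dangling_targets` helper is likewise an assumption, not part of any AWS tooling.

```python
import json

# Hypothetical Lambda ARN used only for illustration.
VALIDATE_ORDER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"


def build_definition() -> dict:
    """Return a small ASL state machine using Task, Choice, Wait, and Succeed."""
    return {
        "Comment": "Illustrative order workflow",
        "StartAt": "ValidateOrder",
        "States": {
            "ValidateOrder": {
                "Type": "Task",
                "Resource": VALIDATE_ORDER_ARN,
                "Next": "CheckTotal",
            },
            "CheckTotal": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.total",
                        "NumericGreaterThan": 1000,
                        "Next": "HoldForReview",
                    }
                ],
                "Default": "Done",
            },
            "HoldForReview": {"Type": "Wait", "Seconds": 60, "Next": "Done"},
            "Done": {"Type": "Succeed"},
        },
    }


def dangling_targets(definition: dict) -> set:
    """Return names referenced by StartAt/Next/Default that are not defined states."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                referenced.add(state[key])
        for rule in state.get("Choices", []):
            referenced.add(rule["Next"])
    return referenced - set(states)


if __name__ == "__main__":
    definition = build_definition()
    assert not dangling_targets(definition)
    print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document you would supply as the state machine definition; the transition check is only a quick local sanity test and does not replace validation by the service.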
Workflow Types
Standard Workflows
- Each execution is auditable: the complete execution history is retained for 90 days after the execution ends
- Exactly-once execution semantics
- Maximum duration: 1 year
- Supports waitForTaskToken (human approval, external callbacks)
- Best for: long-running workflows, processes requiring audit trails, human-in-the-loop
Express Workflows
- High-throughput, short-duration workflows (up to 5 minutes)
- At-least-once execution for asynchronous Express Workflows (synchronous Express Workflows run at-most-once), so steps should be idempotent
- Up to 100,000 executions/second
- Logs to CloudWatch Logs (not stored in Step Functions)
- Best for: high-volume event processing, streaming data processing, IoT pipelines
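The trade-offs between the two workflow types can be summarized as a rule of thumb. The sketch below encodes only the limits listed above (5 minutes for Express, 1 year for Standard, callbacks only on Standard); the `Workload` fields and the `choose_workflow_type` function are hypothetical names, and a real decision would also weigh cost and throughput.

```python
from dataclasses import dataclass

EXPRESS_MAX_SECONDS = 5 * 60              # Express duration limit listed above
STANDARD_MAX_SECONDS = 365 * 24 * 3600    # Standard duration limit listed above


@dataclass
class Workload:
    max_duration_seconds: int
    needs_callback: bool       # e.g. waitForTaskToken for human approval
    needs_exactly_once: bool   # steps that are not safe to repeat


def choose_workflow_type(workload: Workload) -> str:
    """Return "STANDARD" or "EXPRESS" based on the limits described above."""
    if workload.max_duration_seconds > STANDARD_MAX_SECONDS:
        raise ValueError("workload exceeds the one-year Standard limit")
    if (
        workload.needs_callback
        or workload.needs_exactly_once
        or workload.max_duration_seconds > EXPRESS_MAX_SECONDS
    ):
        return "STANDARD"
    return "EXPRESS"


if __name__ == "__main__":
    iot_event = Workload(max_duration_seconds=10, needs_callback=False, needs_exactly_once=False)
    approval = Workload(max_duration_seconds=3 * 24 * 3600, needs_callback=True, needs_exactly_once=True)
    print(choose_workflow_type(iot_event))  # EXPRESS
    print(choose_workflow_type(approval))   # STANDARD
```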
Service Integrations (AWS SDK and Optimized)
Step Functions integrates directly with 220+ AWS services without writing Lambda code. AWS SDK integrations cover most service API actions, while a smaller set of optimized integrations adds workflow-specific conveniences:
- Call Lambda, ECS, Fargate, DynamoDB, S3, SQS, SNS, Glue, EMR, Bedrock, SageMaker, and more
- Request-Response: Call the service and move to the next state as soon as the API call returns
- Sync (.sync): Wait for the job to complete before proceeding (e.g., wait for a Glue job to finish)
- waitForTaskToken: Pause until an external system returns the task token (human approval, third-party integrations), bounded by the state's timeout and the workflow's maximum duration
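In ASL, the integration pattern is selected by a suffix on the Task state's `Resource` ARN. The sketch below shows how the three patterns map to ARNs and how a callback task passes its token via the `$$.Task.Token` context value. The helper names and the queue URL are hypothetical, and not every service supports every pattern, so treat the output as illustrative.

```python
# Suffix appended to the Resource ARN for each integration pattern.
PATTERN_SUFFIX = {
    "request_response": "",
    "sync": ".sync",
    "wait_for_task_token": ".waitForTaskToken",
}


def task_resource(service: str, action: str, pattern: str = "request_response") -> str:
    """Build a service-integration Resource ARN for a Task state."""
    return f"arn:aws:states:::{service}:{action}{PATTERN_SUFFIX[pattern]}"


def callback_task(queue_url: str, timeout_seconds: int) -> dict:
    """Task state that sends the task token to SQS and waits for it to come back."""
    return {
        "Type": "Task",
        "Resource": task_resource("sqs", "sendMessage", "wait_for_task_token"),
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                "input.$": "$",                  # current state input
                "taskToken.$": "$$.Task.Token",  # token the external system must return
            },
        },
        "TimeoutSeconds": timeout_seconds,       # fail the state if no callback arrives in time
        "End": True,
    }


if __name__ == "__main__":
    print(task_resource("dynamodb", "putItem"))
    print(task_resource("glue", "startJobRun", "sync"))
    # Hypothetical queue URL for illustration only.
    print(callback_task("https://sqs.us-east-1.amazonaws.com/123456789012/approvals", 3600))
```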
AI Agent Orchestration
Step Functions is increasingly used as the orchestration layer for multi-step AI agent workflows:
- Coordinate Bedrock model invocations, Knowledge Base queries, Lambda tool calls, and human review steps
- Error handling and retry logic built in — no custom retry code in Lambda
- Audit trail of every agent decision and action
- Combine with Amazon Bedrock Agents for complex multi-agent systems
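As a rough sketch of the pattern described above, the following builds a three-step agent workflow: a Bedrock model invocation, a Lambda tool call, and a human review step that waits for a task token. The function name, state names, the `prompt` field, and the shape of the model request body are assumptions for illustration; real request bodies depend on the model being called.

```python
import json


def build_agent_workflow(model_id: str, tool_function: str, review_function: str) -> dict:
    """Return an illustrative ASL definition for a model -> tool -> human review flow."""
    # Declarative retry shared by the automated steps.
    retry = [
        {"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2}
    ]
    return {
        "StartAt": "InvokeModel",
        "States": {
            "InvokeModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::bedrock:invokeModel",
                # Body shape is model-specific; this one is a placeholder.
                "Parameters": {"ModelId": model_id, "Body": {"prompt.$": "$.prompt"}},
                "ResultPath": "$.model",
                "Retry": retry,
                "Next": "RunTool",
            },
            "RunTool": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": tool_function, "Payload.$": "$"},
                "ResultPath": "$.tool",
                "Retry": retry,
                "Next": "HumanReview",
            },
            "HumanReview": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": review_function,
                    "Payload": {"state.$": "$", "taskToken.$": "$$.Task.Token"},
                },
                "TimeoutSeconds": 86400,  # give reviewers one day before the state times out
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # All three identifiers below are hypothetical placeholders.
    workflow = build_agent_workflow("MODEL_ID_PLACEHOLDER", "agent-tool", "agent-review")
    print(json.dumps(workflow, indent=2))
```

Because each step is a separate state, the execution history records every model call, tool call, and review decision, which is what provides the audit trail mentioned above.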
Error Handling
Step Functions provides declarative error handling:
```json
"Retry": [{
  "ErrorEquals": ["Lambda.ServiceException"],
  "IntervalSeconds": 2,
  "MaxAttempts": 3,
  "BackoffRate": 2
}],
"Catch": [{
  "ErrorEquals": ["States.ALL"],
  "Next": "HandleError"
}]
```
- Retry: Automatic retries with exponential backoff per error type
- Catch: Route to error-handling state on failure
- Reduces try/catch and retry boilerplate in Lambda functions
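To make the backoff arithmetic concrete, the sketch below computes the wait before each retry for the `Retry` settings shown above: the first retry waits `IntervalSeconds`, and each later retry multiplies the previous wait by `BackoffRate`. The function name is hypothetical, and the calculation ignores optional fields such as `MaxDelaySeconds` and `JitterStrategy`.

```python
def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list:
    """Return the wait (in seconds) before each retry attempt."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


if __name__ == "__main__":
    # IntervalSeconds=2, MaxAttempts=3, BackoffRate=2 from the example above.
    delays = retry_delays(2, 3, 2)
    print(delays)       # [2, 4, 8]
    print(sum(delays))  # 14 seconds of waiting before the Catch clause takes over
```

With these settings the task is attempted up to four times in total (the original attempt plus three retries); if the last retry also fails, the `Catch` clause routes the execution to `HandleError`.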
Step Functions vs Lambda Durable Functions
Both solve long-running workflow orchestration, but differ in approach:
| Aspect | Step Functions | Lambda Durable Functions |
|---|---|---|
| Workflow definition | Visual state machine (ASL) | Code-centric (imperative) |
| Visibility | Visual workflow console | CloudWatch logs |
| Ecosystem | 220+ SDK integrations | Lambda-native |
| Team fit | Low-code / ops teams | Developer-centric |
| Max duration | 1 year | 1 year |
Common Mistakes
Mistake 1: Putting all logic inside Lambda functions when Step Functions handles it. Error handling, retries, parallel execution, and wait states should live in the state machine definition — not in Lambda code.
Mistake 2: Using Standard Workflows for high-volume, short-duration workloads. Standard Workflows are billed per state transition, which gets expensive at scale; Express Workflows are billed by request count and duration and can be far cheaper (by as much as 1,000x in some cases) for high-throughput scenarios.
Mistake 3: Not using waitForTaskToken for human approval steps. Without it, you need a polling loop; with it, the workflow pauses until your approval system sends the token back or a configured timeout expires.
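For the approval case in Mistake 3, the external system resumes the workflow by returning the task token. The sketch below shows one way the approval side might do that using the Step Functions client's `send_task_success` and `send_task_failure` calls. The function name, the `ApprovalRejected` error code, and the output shape are assumptions; the client is passed in so the logic can be exercised without AWS credentials.

```python
import json


def complete_approval(sfn_client, task_token: str, approved: bool, reviewer: str) -> None:
    """Resume a workflow paused on waitForTaskToken with the reviewer's decision."""
    if approved:
        # The output becomes the result of the paused Task state.
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True, "reviewer": reviewer}),
        )
    else:
        # A failure can be handled by a Catch clause in the state machine.
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="ApprovalRejected",  # hypothetical error name
            cause=f"Rejected by {reviewer}",
        )
```

In a real deployment the first argument would typically be `boto3.client("stepfunctions")`, and the token would come from wherever the workflow delivered it (for example a queue message or a database record).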
Related AWS Services
- AWS Lambda: Most common task executor in Step Functions workflows
- Amazon Bedrock: Foundation model calls and agent orchestration within state machines
- Amazon EventBridge: Trigger Step Functions executions from events
- AWS Batch: Long-running batch jobs coordinated by Step Functions
