Amazon Bedrock AgentCore Pricing: The 12 Components Behind Your Agent Bill
Quick summary: Bedrock AgentCore is metered across twelve distinct components: Runtime, Browser, Code Interpreter, Gateway, Identity, Memory (two tiers), Observability, Evaluations, Payments, Search, and the underlying model spend. Two of them, Runtime and long-term Memory, drive roughly 80% of the AgentCore-attributable spend.
Key Takeaways
- Two components, Runtime (invocation plus duration) and long-term Memory, drive roughly 80% of the AgentCore-attributable spend
- Amazon Bedrock AgentCore is the production runtime that wraps the Bedrock Agents API with persistent memory, managed tool execution, and observability
- AgentCore went generally available with a metered-component pricing model that AWS has been incrementally adjusting through 2026
- AgentCore bills across twelve distinct lines; the twelfth, the underlying Bedrock model spend, is metered separately
Amazon Bedrock AgentCore is the production runtime that wraps the Bedrock Agents API with persistent memory, managed tool execution, and observability. The architecture story is covered in our AgentCore production guide. This post is the bill story — twelve distinct metered components, what each one charges for, which two of them quietly dominate the invoice, and how to model the spend before the first invocation.
AgentCore went generally available with a metered-component pricing model that AWS has been incrementally adjusting through 2026. The pricing page reflects the current rates per region; the structure below is what stays stable across rate changes.
The 12 Metered Components
Bedrock AgentCore bills across twelve distinct lines. Eleven are AgentCore-specific; the twelfth — the underlying Bedrock model spend — is what your agent calls into and is metered separately on the Bedrock model invocation rate sheet.
The AgentCore-attributable line items (Runtime appears as two rows: invocation and duration)
Prices in us-east-1
Pricing structure as of June 2026. Verify exact unit prices against the AWS Bedrock AgentCore pricing page for your region.
| Component | What to know | Billing unit | Example workload | Relative weight |
|---|---|---|---|---|
| Runtime invocation | Every InvokeAgent call is one Runtime invocation | Per request (per agent turn) | Production agent at 500K invocations/month | Dominant |
| Runtime duration | Browser / Code Interpreter time extends duration | Per second of execution | Tool-heavy agent averaging 4s/turn | Co-dominant |
| Browser tool | Managed Chromium session lifecycle | Per minute of browser session | Research agent, 8 page loads / 90s per turn | Moderate |
| Code Interpreter | Sandboxed Python execution environment | Per second of sandbox execution | Data-analysis agent, 12s sandbox / turn | Moderate |
| Gateway invocation | Only billed when you route tools via Gateway | Per tool call through Gateway | Policy-mediated tool routing, 3 calls/turn | Low–Moderate |
| Identity (OIDC/SAML) | Only billed when AgentCore handles auth | Per authenticated principal-session | B2B agent with SSO, 12K sessions/day | Low |
| Memory: short-term (in-session) | Not a separate AgentCore line item | Included in Bedrock Agents pricing | Conversation history within a session | Bundled |
| Memory: long-term (cross-session) | DynamoDB-backed; grows without retention policy | Per GB-month + RCU/WCU on retrieval | 500K users × 30 facts × monthly recall | Dominant at scale |
| Observability traces | CloudWatch Logs ingestion rate after free tier | Per million trace events (free tier) | Full step traces on every invocation | Low (often free tier) |
| Evaluations | Optional automated quality scoring | Per evaluation run + judge model spend | Nightly regression on 200 prompts | Low (CI/CD usage) |
| Payments mediation | Specialized; only for agents that move money | Per transaction mediated | Commerce agents settling on user behalf | Use-case specific |
| Search (retrieval) | Distinct from Knowledge Bases retrieval | Per query + result ingest | Hybrid retrieval at 200K queries/month | Moderate |
The 12th line — Bedrock model invocation — is billed separately on the per-model token rate sheet and typically dwarfs all AgentCore lines combined.
The Two Lines That Quietly Dominate
Across the production AgentCore deployments we have audited, two components consistently drive 75–85% of the AgentCore-attributable spend. Knowing which two collapses the optimization problem from twelve dimensions to two.
Runtime (invocation + duration)
Every agent turn is at minimum one Runtime invocation. That number is fixed by the product — you cannot reduce the count without reducing user interactions. What you can reduce is the average billed duration per turn. Three factors compound it:
- Tool latency — every Browser, Code Interpreter, or external API call extends the agent’s wall-clock time, which is what AgentCore Runtime charges for.
- Self-reflection loops: agents configured to re-evaluate their own output add a second model round-trip, which can roughly double the turn's duration.
- Verbose action groups — Lambda action groups that return large payloads force the agent to process more tokens before responding, extending the turn.
The leverage is in reducing the duration of each turn, not the number of turns. Streamlined action groups, tighter system prompts, and disabling self-reflection for low-stakes turns are the levers.
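To make the invocation-versus-duration split concrete, here is a minimal sketch of the Runtime arithmetic. The two rates are made-up placeholders (this post deliberately omits real unit prices), and `RuntimeRates` and `runtime_cost` are hypothetical names rather than anything from an AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeRates:
    """Placeholder rates; substitute the values from the AWS pricing page."""
    per_invocation: float  # hypothetical USD per Runtime invocation
    per_second: float      # hypothetical USD per second of billed duration


def runtime_cost(invocations: int, avg_seconds: float, rates: RuntimeRates) -> float:
    """Monthly Runtime cost = invocation charge + duration charge."""
    invocation_charge = invocations * rates.per_invocation
    duration_charge = invocations * avg_seconds * rates.per_second
    return invocation_charge + duration_charge


# Made-up rates, chosen only to keep the arithmetic easy to follow.
rates = RuntimeRates(per_invocation=0.0001, per_second=0.0005)

baseline = runtime_cost(500_000, 4.0, rates)  # tool-heavy agent at 4s/turn
trimmed = runtime_cost(500_000, 2.0, rates)   # same traffic at 2s/turn

print(f"baseline: ${baseline:,.2f}")
print(f"trimmed:  ${trimmed:,.2f}")
print(f"saved:    ${baseline - trimmed:,.2f}")
```

The invocation charge is identical in both runs, so the whole saving comes from the duration term. How large that saving is in practice depends on the real ratio between the two rates.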
Long-term Memory
Long-term Memory looks cheap per item. It compounds because, in the default configuration, every user accumulates facts indefinitely and every new session retrieves the relevant slice. A B2C agent with 500K monthly active users and 30 retained facts per user is storing roughly 15M items by month six, and each session start pays DynamoDB read units to pull that user's facts.
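The growth curve is easy to sketch. The model below assumes a constant user count and a constant accrual rate of 5 facts per user per month, an illustrative figure picked so that each user holds 30 facts at month six; neither number is measured. `stored_items` is a hypothetical helper, not an AgentCore API.

```python
from typing import Optional


def stored_items(users: int, facts_per_user_per_month: float, month: int,
                 ttl_months: Optional[int] = None) -> int:
    """Items held at the end of `month`, assuming constant accrual.

    With no TTL every fact is kept forever; with a TTL only the most
    recent `ttl_months` of facts survive.
    """
    retained_months = month if ttl_months is None else min(month, ttl_months)
    return round(users * facts_per_user_per_month * retained_months)


for month in (3, 6, 12):
    unbounded = stored_items(500_000, 5, month)
    bounded = stored_items(500_000, 5, month, ttl_months=3)
    print(f"month {month:>2}: no TTL {unbounded:>12,}  |  3-month TTL {bounded:>12,}")
```

Under these assumptions the unbounded count keeps climbing in proportion to account age, while a retention window caps it at a steady state once the window has filled.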
How the Bill Compounds: A Worked Example
Consider a B2B operations agent: 250K invocations/month, average 3.2s per turn (one Code Interpreter call + one external API call), 8K active users, 20 facts retained per user with 12-month recall. The shape of the bill — relative weight of each line, not absolute dollars — looks like this:
Cost contribution by line (B2B ops agent — 250K invocations/mo)
Relative cost share, not absolute dollars. Multiply your model spend by these ratios to get a directional estimate.
| Line item | Billing basis | Workload | Cost relative to model spend |
|---|---|---|---|
| Bedrock model spend | Token rate × input+output | Underlying model invocation | 100% baseline |
| AgentCore Runtime | Invocation + duration | 250K × 3.2s | ~15–22% of model spend |
| AgentCore long-term Memory | GB-month + retrieval RCU | 8K users × 20 facts × 12mo | ~6–10% of model spend |
| Code Interpreter | Sandbox-second | ~12s per turn × 250K | ~3–6% of model spend |
| Observability | Trace events past free tier | Full step traces enabled | ~1–2% of model spend |
| Identity, Gateway, Payments, Search | Each metered separately | Off in this deployment | ~0% |
Indicative ratios from production deployments we have reviewed. Always model your own workload — the multipliers shift significantly with chat-style vs research-style agent traffic.
The pattern repeats across deployment shapes. Runtime and long-term Memory together land at roughly 20–30% on top of the model spend, while each remaining line stays at or below about 5%. This is why the first optimization passes always focus on Runtime duration and Memory retention: every other line item is small by comparison.
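As a quick cross-check, the ratio bands in the worked-example table can be summed directly. The sketch below copies the table's low and high percentages and pairs low with low and high with high, which is a simplification; real deployments will not sit at the same end of every band at once.

```python
# (low, high) cost as a percentage of model spend, from the worked-example table.
SHARES_PCT = {
    "runtime": (15, 22),
    "long_term_memory": (6, 10),
    "code_interpreter": (3, 6),
    "observability": (1, 2),
}
DOMINANT = ("runtime", "long_term_memory")


def band(lines):
    """Sum the low ends and the high ends of the named line items."""
    low = sum(SHARES_PCT[name][0] for name in lines)
    high = sum(SHARES_PCT[name][1] for name in lines)
    return low, high


overhead = band(SHARES_PCT)   # all AgentCore lines combined
dominant = band(DOMINANT)     # Runtime + long-term Memory only
dominant_share = (dominant[0] / overhead[0], dominant[1] / overhead[1])

print(f"AgentCore overhead: {overhead[0]}-{overhead[1]}% of model spend")
print(f"Runtime + Memory:   {dominant[0]}-{dominant[1]}% of model spend")
print(f"Their share of AgentCore spend: "
      f"{dominant_share[1]:.0%} to {dominant_share[0]:.0%}")
```

For this workload the two dominant lines account for roughly four fifths of the AgentCore-attributable total, which is consistent with the 75–85% range quoted earlier.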
Common Bill Surprises
When AgentCore Is and Is Not the Right Fit
AgentCore is the right runtime when you need persistent state across sessions, audit-grade observability, or managed tool execution with retry and circuit-breaker semantics. It is not the right runtime when your agent keeps no state across sessions, your tools are already idempotent, and your traffic is intermittent enough that the per-invocation Runtime price exceeds what a Lambda-with-direct-Bedrock-API setup would cost.
Choose AgentCore when state, observability, or tool reliability is non-negotiable; choose direct Bedrock API when the workload is stateless and bursty.
Use when
- Agent needs cross-session memory of user preferences, project context, or prior decisions
- Regulated industry — full agent-reasoning audit trail is mandatory
- Multi-tool agent where tool reliability (retry, circuit breaker, timeout) is operationally critical
- Browser-based or sandboxed-code workflows where managing the underlying infrastructure is not a core competency
- B2B agent with SSO requirements where AgentCore Identity simplifies auth
Avoid when
- Stateless agents that only respond within a single session and never recall prior visits
- Internal-only agents with <20K invocations/month where Lambda + direct Bedrock InvokeModel is cheaper
- Workloads where you have already built robust tool execution infrastructure (Step Functions, retry queues) and would duplicate functionality
- Highly bursty traffic with low monthly volume — Runtime invocation pricing favors steady throughput
- Workloads where the model spend is so small that AgentCore line items would be a high relative percentage
When in doubt, prototype on the direct Bedrock API first, then migrate to AgentCore once persistent state or observability becomes a hard requirement.
Modeling AgentCore Cost Before You Build
The single highest-leverage pre-build exercise: produce a token-volume estimate and use it as the anchor for everything else.
- Anchor on model spend. Estimate input + output tokens per turn × invocations per month × your chosen model’s per-token rate. Use the Bedrock token cost calculator to lock in the number.
- Add a Runtime multiplier. For a lean Runtime-and-Memory deployment, multiply the model spend by ~1.15×. For a full-feature deployment with Browser, Code Interpreter, Gateway, Identity, multiply by ~1.4×.
- Layer in Memory. Estimate active users × retained facts per user × planned retention months. Apply a small per-GB-month rate on the storage and a per-RCU rate on the retrieval. Memory typically lands at 5–10% of model spend at moderate scale, 10–20% at B2C scale without a retention policy.
- Sanity-check observability and evaluations. If you enable full step traces and nightly evaluations, add another 2–4% on top. If you keep traces off and only run weekly evaluations, this rounds to zero.
- Add a 20% contingency. AgentCore line items can be adjusted by AWS between your estimate and your production date. Build in headroom.
The output is a defensible monthly cost estimate accurate within ±25% before the first invocation. That accuracy is enough to make the build/buy decision and to commit to a budget envelope.
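The five steps can be strung together in a few lines. This is a rough sketch under stated assumptions: the token counts and per-1K-token rates are invented placeholders, input and output tokens are priced separately, the Memory and observability shares are added on top of the Runtime multiplier (the steps leave open whether the lean multiplier already includes Memory, so this is the more conservative reading), and the contingency is applied to the whole subtotal. `Workload`, `model_spend`, and `estimate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    invocations_per_month: int
    input_tokens_per_turn: int
    output_tokens_per_turn: int
    usd_per_1k_input: float   # placeholder; take from the Bedrock rate sheet
    usd_per_1k_output: float  # placeholder; take from the Bedrock rate sheet


def model_spend(w: Workload) -> float:
    """Step 1: anchor on monthly model spend."""
    per_turn = (w.input_tokens_per_turn / 1000 * w.usd_per_1k_input
                + w.output_tokens_per_turn / 1000 * w.usd_per_1k_output)
    return per_turn * w.invocations_per_month


def estimate(w: Workload, runtime_multiplier: float = 1.15,
             memory_share: float = 0.05, obs_eval_share: float = 0.0,
             contingency: float = 0.20) -> dict:
    """Steps 2-5: multiplier, Memory, observability/evaluations, contingency."""
    base = model_spend(w)
    subtotal = base * (runtime_multiplier + memory_share + obs_eval_share)
    return {
        "model_spend": base,
        "subtotal": subtotal,
        "with_contingency": subtotal * (1 + contingency),
    }


# Invented workload and rates, for illustration only.
ops_agent = Workload(250_000, 2_000, 500, usd_per_1k_input=0.003,
                     usd_per_1k_output=0.015)
result = estimate(ops_agent, runtime_multiplier=1.15,
                  memory_share=0.08, obs_eval_share=0.02)
for line, usd in result.items():
    print(f"{line:>18}: ${usd:,.2f}")
```

Swapping `runtime_multiplier` between 1.15 and 1.4 gives the lean and full-feature ends of the estimate, which is usually enough to bracket a budget conversation.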
What This Post Doesn’t Cover
- Exact unit prices are intentionally omitted because the AgentCore rate sheet changes more often than any blog post can keep pace with — always cross-check the AWS Bedrock AgentCore pricing page.
- Bedrock model spend itself — see our Bedrock cost optimization guide for token-level optimization.
- Provisioned throughput vs on-demand for the underlying model — covered separately in Bedrock provisioned throughput vs on-demand.
- Multi-region replication of long-term Memory — supported but pricing is still evolving; treat anything we wrote here as us-east-1-anchored.
If You Only Do One Thing This Week
Set a TTL on long-term Memory items. 90 days is a reasonable starting default for B2C agents; 180–365 days suits B2B agents with intermittent user engagement. This is the single change that prevents the most common AgentCore bill-growth pattern: Memory storage and RCU silently compounding month over month with no user-visible benefit. Add a monthly compaction Lambda that consolidates duplicate facts within each memoryId, and per-user storage stays bounded, so the Memory line grows with your user count rather than with account age.
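The core of such a compaction job might look like the sketch below. It is written as plain Python over in-memory records so it stays independent of any storage API; the record shape (`memoryId`, `fact`, `written_at` as epoch seconds) is a hypothetical illustration, and treating two facts as duplicates when their normalized text matches is an assumption that a real deployment would likely replace with something smarter.

```python
DAY = 86_400  # seconds


def expires_at(written_at: int, ttl_days: int = 90) -> int:
    """Epoch second at which an item written at `written_at` expires."""
    return written_at + ttl_days * DAY


def compact(items: list, now: int, ttl_days: int = 90) -> list:
    """Drop expired items, then keep the newest copy of each duplicate fact
    within a memoryId."""
    newest = {}
    for item in items:
        if expires_at(item["written_at"], ttl_days) <= now:
            continue  # past its TTL
        key = (item["memoryId"], item["fact"].strip().lower())
        kept = newest.get(key)
        if kept is None or item["written_at"] > kept["written_at"]:
            newest[key] = item
    return list(newest.values())


now = 100 * DAY
items = [
    {"memoryId": "u1", "fact": "Prefers email", "written_at": 50 * DAY},
    {"memoryId": "u1", "fact": "prefers email ", "written_at": 80 * DAY},
    {"memoryId": "u1", "fact": "Timezone is UTC", "written_at": 5 * DAY},
    {"memoryId": "u2", "fact": "Prefers email", "written_at": 60 * DAY},
]
compacted = compact(items, now)
print(f"{len(items)} items in, {len(compacted)} items out")
```

Running the same function from a scheduled job keeps the retention rule and the deduplication rule in one place, which makes the TTL easy to change per agent type.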
For deeper context on how the Runtime, Memory, and tool execution layers fit together, the AgentCore production architecture guide walks through the same surface from the design side rather than the bill side.