Skip to main content

AI & assistant-friendly summary

This section provides structured content for AI assistants and search engines. You can cite or summarize it when referencing this page.

Summary

Bedrock AgentCore is metered across twelve distinct components — Runtime, Browser, Code Interpreter, Gateway, Identity, Memory (two tiers), Observability, Evaluations, Payments, Search, and the underlying model spend. Two of them drive 80% of the bill.

Key Facts

  • Two of them drive 80% of the bill
  • astro'; Amazon Bedrock AgentCore is the production runtime that wraps the Bedrock Agents API with persistent memory, managed tool execution, and observability
  • 15–1
  • AgentCore went generally available with a metered-component pricing model that AWS has been incrementally adjusting through 2026
  • The 12 Metered Components Bedrock AgentCore bills across twelve distinct lines

Entity Definitions

AWS Bedrock
AWS Bedrock is an AWS service discussed in this article.
Amazon Bedrock
Amazon Bedrock is an AWS service discussed in this article.
Bedrock
Bedrock is an AWS service discussed in this article.
Lambda
Lambda is an AWS service discussed in this article.
DynamoDB
DynamoDB is an AWS service discussed in this article.
CloudWatch
CloudWatch is an AWS service discussed in this article.
Step Functions
Step Functions is an AWS service discussed in this article.
CI/CD
CI/CD is a cloud computing concept discussed in this article.

Amazon Bedrock AgentCore Pricing: The 12 Components Behind Your Agent Bill

Quick summary: Bedrock AgentCore is metered across twelve distinct components — Runtime, Browser, Code Interpreter, Gateway, Identity, Memory (two tiers), Observability, Evaluations, Payments, Search, and the underlying model spend. Two of them drive 80% of the bill.

Key Takeaways

  • Two of them drive 80% of the bill
  • astro'; Amazon Bedrock AgentCore is the production runtime that wraps the Bedrock Agents API with persistent memory, managed tool execution, and observability
  • 15–1
  • AgentCore went generally available with a metered-component pricing model that AWS has been incrementally adjusting through 2026
  • The 12 Metered Components Bedrock AgentCore bills across twelve distinct lines
Amazon Bedrock AgentCore Pricing: The 12 Components Behind Your Agent Bill
Table of Contents

Amazon Bedrock AgentCore is the production runtime that wraps the Bedrock Agents API with persistent memory, managed tool execution, and observability. The architecture story is covered in our AgentCore production guide. This post is the bill story — twelve distinct metered components, what each one charges for, which two of them quietly dominate the invoice, and how to model the spend before the first invocation.

AgentCore went generally available with a metered-component pricing model that AWS has been incrementally adjusting through 2026. The pricing page reflects the current rates per region; the structure below is what stays stable across rate changes.

The 12 Metered Components

Bedrock AgentCore bills across twelve distinct lines. Eleven are AgentCore-specific; the twelfth — the underlying Bedrock model spend — is what your agent calls into and is metered separately on the Bedrock model invocation rate sheet.

The 12 AgentCore-attributable line items

Prices in us-east-1

Pricing structure as of June 2026. Verify exact unit prices against the AWS Bedrock AgentCore pricing page for your region.

Runtime invocation

Dominant

Every InvokeAgent call is one Runtime invocation

Unit price
Per request (per agent turn)
Example workload
Production agent at 500K invocations/month

Runtime duration

Co-dominant

Browser / Code Interpreter time extends duration

Unit price
Per second of execution
Example workload
Tool-heavy agent averaging 4s/turn

Browser tool

Moderate

Managed Chromium session lifecycle

Unit price
Per minute of browser session
Example workload
Research agent, 8 page loads / 90s per turn

Code Interpreter

Moderate

Sandboxed Python execution environment

Unit price
Per second of sandbox execution
Example workload
Data-analysis agent, 12s sandbox / turn

Gateway invocation

Low–Moderate

Only billed when you route tools via Gateway

Unit price
Per tool call through Gateway
Example workload
Policy-mediated tool routing, 3 calls/turn

Identity (OIDC/SAML)

Low

Only billed when AgentCore handles auth

Unit price
Per authenticated principal-session
Example workload
B2B agent with SSO, 12K sessions/day

Memory — short-term (in-session)

Bundled

Not a separate AgentCore line item

Unit price
Included in Bedrock Agents pricing
Example workload
Conversation history within a session

Memory — long-term (cross-session)

Dominant at scale

DynamoDB-backed; grows without retention policy

Unit price
Per GB-month + RCU/WCU on retrieval
Example workload
500K users × 30 facts × monthly recall

Observability traces

Low (often free tier)

CloudWatch Logs ingestion rate after free tier

Unit price
Per million trace events (free tier)
Example workload
Full step traces on every invocation

Evaluations

Low (CI/CD usage)

Optional automated quality scoring

Unit price
Per evaluation run + judge model spend
Example workload
Nightly regression on 200 prompts

Payments mediation

Use-case specific

Specialized; only for agents that move money

Unit price
Per transaction mediated
Example workload
Commerce agents settling on user behalf

Search (retrieval)

Moderate

Distinct from Knowledge Bases retrieval

Unit price
Per query + result ingest
Example workload
Hybrid retrieval at 200K queries/month

The 12th line — Bedrock model invocation — is billed separately on the per-model token rate sheet and typically dwarfs all AgentCore lines combined.

The Two Lines That Quietly Dominate

Across the production AgentCore deployments we have audited, two components consistently drive 75–85% of the AgentCore-attributable spend. Knowing which two collapses the optimization problem from twelve dimensions to two.

Runtime (invocation + duration)

Every agent turn is at minimum one Runtime invocation. That number is fixed by the product — you cannot reduce the count without reducing user interactions. What you can reduce is the average billed duration per turn. Three factors compound it:

  1. Tool latency — every Browser, Code Interpreter, or external API call extends the agent’s wall-clock time, which is what AgentCore Runtime charges for.
  2. Self-reflection loops — agents configured to re-evaluate their own output add a second model round-trip, doubling duration.
  3. Verbose action groups — Lambda action groups that return large payloads force the agent to process more tokens before responding, extending the turn.

The leverage is in reducing per-turn duration, not per-turn count. Streamlined action groups, tighter system prompts, and disabling self-reflection for low-stakes turns are the levers.

Long-term Memory

Long-term Memory looks cheap per item. It compounds because every user accumulates facts forever in the default configuration, and every new session retrieves the relevant slice. A B2C agent at 500K monthly active users with 30 retained facts per user is storing roughly 15M items by month six, all backed by DynamoDB read units at session-start time.

How the Bill Compounds: A Worked Example

Consider a B2B operations agent: 250K invocations/month, average 3.2s per turn (one Code Interpreter call + one external API call), 8K active users, 20 facts retained per user with 12-month recall. The shape of the bill — relative weight of each line, not absolute dollars — looks like this:

Cost contribution by line (B2B ops agent — 250K invocations/mo)

Prices in indicative

Relative cost share, not absolute dollars. Multiply your model spend by these ratios to get a directional estimate.

Bedrock model spend

100% baseline
Unit price
Token rate × input+output
Example workload
Underlying model invocation

AgentCore Runtime

~15–22% of model spend
Unit price
Invocation + duration
Example workload
250K × 3.2s

AgentCore long-term Memory

~6–10% of model spend
Unit price
GB-month + retrieval RCU
Example workload
8K users × 20 facts × 12mo

Code Interpreter

~3–6% of model spend
Unit price
Sandbox-second
Example workload
~12s per turn × 250K

Observability

~1–2% of model spend
Unit price
Trace events past free tier
Example workload
Full step traces enabled

Identity, Gateway, Payments, Search

~0%
Unit price
Each metered separately
Example workload
Off in this deployment

Indicative ratios from production deployments we have reviewed. Always model your own workload — the multipliers shift significantly with chat-style vs research-style agent traffic.

The pattern repeats across deployment shapes. Runtime + long-term Memory together land at roughly 20–30% on top of the model spend, with the remaining lines individually under 5%. This is why the first optimization passes always focus on Runtime duration and Memory retention — every other line item is rounding error by comparison.

Common Bill Surprises

When AgentCore Is and Is Not the Right Fit

AgentCore is the right runtime when you need persistent state across sessions, audit-grade observability, or managed tool execution with retry and circuit-breaker semantics. It is not the right runtime when your agent is stateless within a session, your tools are already idempotent, and your traffic is intermittent enough that the per-invocation Runtime price exceeds what a Lambda-with-direct-Bedrock-API setup would cost.

Choose AgentCore when state, observability, or tool reliability is non-negotiable; choose direct Bedrock API when the workload is stateless and bursty.

Use when

  • Agent needs cross-session memory of user preferences, project context, or prior decisions
  • Regulated industry — full agent-reasoning audit trail is mandatory
  • Multi-tool agent where tool reliability (retry, circuit breaker, timeout) is operationally critical
  • Browser-based or sandboxed-code workflows where managing the underlying infrastructure is not a core competency
  • B2B agent with SSO requirements where AgentCore Identity simplifies auth

Avoid when

  • Stateless agents that only respond within a single session and never recall prior visits
  • Internal-only agents with <20K invocations/month where Lambda + direct Bedrock InvokeModel is cheaper
  • Workloads where you have already built robust tool execution infrastructure (Step Functions, retry queues) and would duplicate functionality
  • Highly bursty traffic with low monthly volume — Runtime invocation pricing favors steady throughput
  • Workloads where the model spend is so small that AgentCore line items would be a high relative percentage

When in doubt, prototype on the direct Bedrock API first, then migrate to AgentCore once persistent state or observability becomes a hard requirement.

Modeling AgentCore Cost Before You Build

The single highest-leverage pre-build exercise: produce a token-volume estimate and use it as the anchor for everything else.

  1. Anchor on model spend. Estimate input + output tokens per turn × invocations per month × your chosen model’s per-token rate. Use the Bedrock token cost calculator to lock in the number.
  2. Add a Runtime multiplier. For a lean Runtime-and-Memory deployment, multiply the model spend by ~1.15×. For a full-feature deployment with Browser, Code Interpreter, Gateway, Identity, multiply by ~1.4×.
  3. Layer in Memory. Estimate active users × retained facts per user × planned retention months. Apply a small per-GB-month rate on the storage and a per-RCU rate on the retrieval. Memory typically lands at 5–10% of model spend at moderate scale, 10–20% at B2C scale without a retention policy.
  4. Sanity-check observability and evaluations. If you enable full step traces and nightly evaluations, add another 2–4% on top. If you keep traces off and only run weekly evaluations, this rounds to zero.
  5. Add a 20% contingency. AgentCore line items can be adjusted by AWS between your estimate and your production date. Build in headroom.

The output is a defensible monthly cost estimate accurate within ±25% before the first invocation. That accuracy is enough to make the build/buy decision and to commit to a budget envelope.

What This Post Doesn’t Cover

  • Exact unit prices are intentionally omitted because the AgentCore rate sheet changes more often than any blog post can keep pace with — always cross-check the AWS Bedrock AgentCore pricing page.
  • Bedrock model spend itself — see our Bedrock cost optimization guide for token-level optimization.
  • Provisioned throughput vs on-demand for the underlying model — covered separately in Bedrock provisioned throughput vs on-demand.
  • Multi-region replication of long-term Memory — supported but pricing is still evolving; treat anything we wrote here as us-east-1-anchored.

If You Only Do One Thing This Week

Set a TTL on long-term Memory items. 90 days is the right starting default for B2C agents; 180–365 days for B2B agents with intermittent user engagement. This is the single change that prevents the most common AgentCore bill-growth pattern — Memory storage and RCU silently compounding month over month with no user-visible benefit. Add a monthly compaction Lambda that consolidates duplicate facts within each memoryId and the bill stays flat regardless of user growth.

For deeper context on how the Runtime, Memory, and tool execution layers fit together, the AgentCore production architecture guide walks through the same surface from the design side rather than the bill side.

PP
Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS ArchitectureCloud MigrationGenAI on AWSCost OptimizationDevOps

Recommended Reading

Explore All Articles »
6 min

AWS CDK Cost Estimation: Shift FinOps Left Into Pull Requests

Most FinOps reviews happen weeks after infrastructure ships, when the bill arrives. CDK cost estimation flips that — synthesize the stack, walk the resource graph, hit the AWS Pricing API per resource, and post a monthly-cost diff on every pull request. The cost feedback loop drops from weeks to minutes; the failure modes (request volume, token usage, data transfer) are documented up front.