AWS SageMaker AI Savings Plans: Up to 64% Off Training and Inference Compute

Quick summary: SageMaker AI Savings Plans deliver up to 64% off SageMaker training, real-time inference, async inference, serverless inference, and processing jobs in exchange for a 1-year or 3-year hourly commitment. Compute Savings Plans do NOT cover SageMaker; this is a separate purchase. For steady ML production workloads, the break-even arrives much faster than with RI-style commits.

Key Takeaways

  • SageMaker AI Savings Plans deliver up to 64% off SageMaker training, real-time inference, async inference, serverless inference, and processing jobs in exchange for a 1-year or 3-year hourly commitment
  • Compute Savings Plans do NOT cover SageMaker; this is a separate purchase
  • AWS SageMaker AI Savings Plans are the product-specific commitment-based discount mechanism for SageMaker compute
  • The most consequential fact about them: Compute Savings Plans do not cover SageMaker workloads
  • Teams with significant SageMaker spend who have only purchased Compute Savings Plans are paying full on-demand rates on every SageMaker line

AWS SageMaker AI Savings Plans are the product-specific commitment-based discount mechanism for SageMaker compute. They deliver up to 64% off on-demand rates for training, inference (real-time, asynchronous, serverless), notebook instances, and processing jobs — in exchange for a 1-year or 3-year hourly-rate commitment. The most consequential fact about them: Compute Savings Plans do not cover SageMaker workloads. Teams with significant SageMaker spend who have only purchased Compute Savings Plans are paying full on-demand rates on every SageMaker line.

This post focuses on the Savings Plan side of SageMaker cost optimization. For the operational side of SageMaker cost — instance selection, training job sizing, spot training — see our SageMaker training cost-efficiency guide.

What SageMaker AI Savings Plans Cover

SageMaker AI Savings Plans: coverage scope (prices in us-east-1)

The SP applies hour-by-hour to qualifying SageMaker compute. Non-compute SageMaker features (model registry storage, feature store, etc.) bill separately at standard rates.

| Component | SP discount | Unit price | Example workload | Notes |
| --- | --- | --- | --- | --- |
| SageMaker Training | Up to 64% off | Per ml.* instance-hour | ml.p5d.24xlarge training | Includes distributed training across instances |
| Real-Time Inference | Up to 64% off | Per ml.* instance-hour | Persistent endpoint | Auto-scaling supported within commit |
| Asynchronous Inference | Up to 64% off | Per ml.* instance-hour | Queue-based inference | Scales to zero when idle |
| Serverless Inference | Up to 64% off | Per memory-hour | On-demand model serving | Cold-start latency trade-off |
| Processing Jobs | Up to 64% off | Per ml.* instance-hour | Data prep, model evaluation | For pre/post-training data work |
| Notebook Instances | Up to 64% off | Per ml.* instance-hour | Persistent dev notebooks | SageMaker Studio Notebooks included |
| Model registry storage | Not covered by SP | Standard S3 rates | Versioned model artifacts | S3 lifecycle for cost control |
| Feature store online storage | Not covered by SP | Per GB-month | Real-time feature serving | Bills separately |

SP coverage spans every compute primitive listed above, so you can shift spend between training and inference without losing the discount.

The Two-Plan Trap

The single most common SageMaker cost mistake: assuming Compute Savings Plans cover SageMaker. They don’t.

The Commitment Mechanism

SageMaker AI Savings Plans commit you to a dollar amount per hour for the chosen term (1-year or 3-year), with three payment options: All-Upfront, Partial-Upfront, and No-Upfront. A higher upfront payment earns a higher discount; No-Upfront preserves cash at a slightly lower discount.

Key flexibility: the commitment is in dollars per hour, not instance types. Commit $10/hour and AWS applies that to any qualifying SageMaker compute usage up to $10/hour, then bills on-demand for usage above. You can shift between training instance types, change inference endpoint configurations, or move between real-time and async inference without losing SP coverage.
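A minimal sketch of how an hourly dollar commitment might be applied to one hour of usage. It assumes a single uniform discount across all usage (real discounts vary by instance type) and assumes the commitment is consumed at the discounted Savings Plan rate, with any remainder billed at on-demand rates. The function name `bill_hour` and its parameters are illustrative, not an AWS API.

```python
def bill_hour(on_demand_usage: float, commit: float, discount: float) -> dict:
    """Split one hour of qualifying SageMaker usage into SP-covered and on-demand parts.

    on_demand_usage: what the hour's usage would cost at on-demand rates (USD)
    commit:          hourly Savings Plan commitment (USD per hour)
    discount:        fractional SP discount, assumed uniform (e.g. 0.30)
    """
    # Price the hour's usage at the discounted Savings Plan rate.
    usage_at_sp_rate = on_demand_usage * (1 - discount)
    # The commitment absorbs usage up to its dollar value.
    covered = min(commit, usage_at_sp_rate)
    # Commitment not consumed this hour is still paid for.
    unused_commit = commit - covered
    # Usage beyond the commitment falls back to on-demand pricing.
    overflow_on_demand = (usage_at_sp_rate - covered) / (1 - discount)
    return {
        "charged": commit + overflow_on_demand,
        "unused_commit": unused_commit,
        "on_demand_overflow": overflow_on_demand,
    }


# A quiet hour leaves part of the commitment unused; a busy hour spills to on-demand.
quiet = bill_hour(on_demand_usage=5.0, commit=7.0, discount=0.30)
busy = bill_hour(on_demand_usage=20.0, commit=7.0, discount=0.30)
```

Because the commitment is a dollar figure rather than an instance reservation, the same function applies whether the hour's usage came from training, real-time endpoints, or async inference.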

Savings Plan tier comparison: illustrative 1-year and 3-year commits (prices in us-east-1)

Higher commitment terms deliver larger discounts. The trade-off is reduced flexibility to abandon the commitment.

Baseline workload: $10/hr of SageMaker spend, or $7,300/month on demand. Every row prices the same workload.

| Commitment option | Monthly cost | Discount | Notes |
| --- | --- | --- | --- |
| No commitment (on-demand) | $7,300 | None | Full flexibility; no discount |
| 1-year No-Upfront | ~$5,110 | ~30% | Preserves cash; lowest discount tier |
| 1-year Partial-Upfront | ~$4,745 | ~35% | Better discount; partial cash commit |
| 1-year All-Upfront | ~$4,526 | ~38% | Maximum 1-year discount |
| 3-year All-Upfront | ~$2,628 | Up to 64% | Maximum discount; longest commitment |

Exact discount percentages vary by instance type. Newer GPU families (ml.p5d, ml.p6e) typically have slightly less aggressive discount tiers.
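The monthly figures in the tier comparison follow from simple arithmetic, sketched below. The 730 hours-per-month constant is an assumption inferred from the $10/hr and $7,300/month baseline; the discount values are the illustrative ones from the comparison, not quoted AWS rates.

```python
HOURS_PER_MONTH = 730       # assumed average month length, matching $10/hr -> $7,300/month
ON_DEMAND_HOURLY = 10.0     # illustrative baseline SageMaker spend in USD per hour

# Illustrative discount tiers taken from the comparison above.
DISCOUNTS = {
    "On-demand": 0.00,
    "1-year No-Upfront": 0.30,
    "1-year Partial-Upfront": 0.35,
    "1-year All-Upfront": 0.38,
    "3-year All-Upfront": 0.64,
}


def monthly_cost(hourly_on_demand: float, discount: float) -> float:
    """Effective monthly cost of a fully covered workload at a given discount."""
    return hourly_on_demand * HOURS_PER_MONTH * (1 - discount)


for option, discount in DISCOUNTS.items():
    cost = monthly_cost(ON_DEMAND_HOURLY, discount)
    print(f"{option:<24} ${cost:,.0f}/month")
```

For the upfront options the monthly figure is an amortized equivalent: the cash leaves at purchase time, but the effective cost is spread over the term for comparison.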

Commit After, Not Before

The right pattern for purchasing SageMaker AI Savings Plans:

  1. Deploy the workload on on-demand for 60–90 days.
  2. Measure steady-state hourly SageMaker spend via Cost Explorer with hourly granularity.
  3. Commit to roughly 80% of the observed steady-state rate — leave 20% headroom for growth and variability.
  4. Re-evaluate quarterly. As the workload grows, layer additional Savings Plans on top of the existing commitment.

The wrong pattern: committing before the workload is stable. SP commitments are obligations to pay for the committed hourly rate whether you use it or not. Over-committing on a workload that turns out to use less than expected wastes the commitment.
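The sizing rule and the cost of over-committing can be sketched in a few lines. This assumes the steady-state rate is estimated as the median of observed hourly spend (one reasonable choice, not something the Savings Plans product prescribes) and that spend is already expressed in the same dollars as the commitment. The helper names are hypothetical.

```python
from statistics import median


def size_commitment(hourly_spend: list[float], headroom: float = 0.20) -> float:
    """Commit to (1 - headroom) of the observed steady-state hourly rate.

    Steady state is approximated by the median, which ignores short spikes.
    """
    steady_state = median(hourly_spend)
    return steady_state * (1 - headroom)


def wasted_commitment(hourly_spend: list[float], commit: float) -> float:
    """Total commitment dollars paid for but not consumed across the observed hours."""
    return sum(max(0.0, commit - spend) for spend in hourly_spend)


# Illustrative observations, in USD per hour.
observed = [10.0, 10.0, 10.0, 12.0, 8.0]

commit = size_commitment(observed)               # 80% of the median
waste_at_80 = wasted_commitment(observed, commit)
waste_at_peak = wasted_commitment(observed, 12.0)  # committing to peak instead
```

With these sample hours, the 80% commitment is fully consumed in every hour, while committing to the peak leaves unused commitment in four of the five hours.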

How the SP Stack Applies

When you have multiple Savings Plans, AWS applies them in priority order:

  1. EC2 Instance Savings Plans: first to apply, for matching EC2 usage.
  2. SageMaker AI Savings Plans: apply to SageMaker usage.
  3. Compute Savings Plans: apply to remaining qualifying EC2, Fargate, and Lambda usage.
  4. On-demand: billing for any usage above the combined plan coverage.

The implication: SageMaker AI SPs and Compute SPs cover different scopes and stack additively. An organization with significant EC2 + Fargate + SageMaker spend should consider both products.
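The priority order above can be expressed as a small lookup, shown here as a simplified sketch. It models scope by service name only; in practice the EC2-specific plan also requires a matching instance family and Region, which this sketch ignores. The plan labels and the `plans_for` helper are illustrative.

```python
# Plans in the order they are applied, per the list above.
PLAN_PRIORITY = ["EC2 Instance SP", "SageMaker AI SP", "Compute SP"]

# Which services each plan can cover (simplified to service level).
PLAN_SCOPE = {
    "EC2 Instance SP": {"EC2"},
    "SageMaker AI SP": {"SageMaker"},
    "Compute SP": {"EC2", "Fargate", "Lambda"},
}


def plans_for(service: str) -> list[str]:
    """Return the plans that could cover a service, in priority order.

    "On-demand" is always last: it bills whatever no plan absorbs.
    """
    eligible = [plan for plan in PLAN_PRIORITY if service in PLAN_SCOPE[plan]]
    return eligible + ["On-demand"]


for service in ("EC2", "SageMaker", "Lambda"):
    print(service, "->", plans_for(service))
```

SageMaker appears in exactly one plan's scope, which is the point of the two-plan trap: without a SageMaker AI Savings Plan, SageMaker usage goes straight to on-demand.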

When to Commit and When to Stay On-Demand

Commit on stable production workloads; stay on-demand for variable, new, or declining workloads.

Use when

  • Steady production inference endpoints with predictable traffic over a 12+ month horizon
  • Stable training pipelines running on a regular cadence (daily, weekly retraining)
  • Long-running production workloads where 1-year commit clearly fits the workload lifecycle
  • Mature ML programs with 12+ months of historical usage data to base commitment sizing on
  • 3-year commitment only on workloads with 18+ months of stable history and clear roadmap continuation

Avoid when

  • New workloads without stable usage history — commit too early and pay for capacity not used
  • Research / experimentation workloads with intermittent or burstable usage
  • Workloads with high peak-to-average ratio (>4×) — commit to base only, on-demand for spikes
  • Workloads expected to decline or migrate to a different platform within the SP term
  • Workloads on instance types being deprecated by AWS — newer families often deliver better economics

Commit to the stable base of your SageMaker workload, not to peak. The remaining usage on on-demand is more flexible and absorbs variability.
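One way to check the peak-to-average criterion is sketched below. The 4× threshold comes from the list above; the function names and sample data are hypothetical.

```python
from statistics import mean

SPIKY_THRESHOLD = 4.0  # peak-to-average ratio above which only the base should be committed


def peak_to_average(hourly_spend: list[float]) -> float:
    """Ratio of the highest observed hourly spend to the mean hourly spend."""
    return max(hourly_spend) / mean(hourly_spend)


def is_spiky(hourly_spend: list[float]) -> bool:
    """True when spikes dominate and the commitment should cover the base only."""
    return peak_to_average(hourly_spend) > SPIKY_THRESHOLD


# Illustrative hourly spend in USD.
steady = [2.0, 2.0, 2.0, 2.0, 12.0]        # one moderate spike
bursty = [1.0] * 9 + [31.0]                # one large spike on a small base

steady_ratio = peak_to_average(steady)
bursty_ratio = peak_to_average(bursty)
```

A workload flagged as spiky is still a candidate for a Savings Plan; the commitment is just sized against the flat base rather than the average.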

A 30-Day SageMaker SP Evaluation Plan

Week 1 — Measure baseline. Pull SageMaker compute spend from Cost Explorer with hourly granularity over the last 90 days. Calculate the average steady-state hourly rate; identify peak-to-average ratio.

Week 2 — Model the savings. For 1-year and 3-year terms with each payment option, calculate the projected savings at 80% commitment of baseline. Compare against the workload’s planned lifecycle (will this workload still run in 12/36 months?).

Week 3 — Stack with existing SPs. Audit existing Compute Savings Plans (which do NOT cover SageMaker). Plan the SageMaker AI SP as an additive purchase; verify the combined coverage scope makes sense.

Week 4 — Purchase and monitor. Purchase the chosen SP. Monitor SP utilization in the Savings Plans console for the first 30 days. Adjust if actual usage diverges from the projected baseline.
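For the Week 4 monitoring step, utilization is the share of the commitment that was actually consumed. The sketch below computes it from per-hour figures; the 95% alert threshold is an arbitrary assumption for illustration, and `sp_utilization` is a hypothetical helper, not a Savings Plans console or API call.

```python
LOW_UTILIZATION_THRESHOLD = 0.95  # assumed alert level; tune to your own tolerance


def sp_utilization(used_per_hour: list[float], commit: float) -> float:
    """Fraction of the total commitment consumed over the observed hours.

    used_per_hour: commitment dollars consumed in each hour
    commit:        hourly commitment in USD
    """
    # Consumption in any hour cannot exceed the commitment itself.
    total_used = sum(min(used, commit) for used in used_per_hour)
    total_committed = commit * len(used_per_hour)
    return total_used / total_committed


# Illustrative: one hour in four ran at half the committed rate.
utilization = sp_utilization([8.0, 8.0, 4.0, 8.0], commit=8.0)
needs_review = utilization < LOW_UTILIZATION_THRESHOLD
```

Sustained utilization below the threshold suggests the baseline was overestimated, and further plan purchases should wait until usage grows into the existing commitment.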

What This Post Doesn’t Cover

If You Only Do One Thing This Week

Audit your SageMaker spend in Cost Explorer. If it is meaningful (above $5K/month) and currently on on-demand, the SageMaker AI Savings Plan break-even is almost always in your favor — typical first-year savings are 30–45% on the steady-state base. Start with a conservative 1-year No-Upfront commitment at 80% of baseline; re-evaluate after 90 days. The SageMaker-specific SP is one of the easiest cost-optimization wins on AWS when SageMaker is a meaningful line — and one of the most commonly missed because teams assume Compute Savings Plans cover it.

For the broader commitment-purchasing strategy across EC2, Fargate, Lambda, RDS, and SageMaker, the Reserved Instances vs Savings Plans decision guide covers the full landscape.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.
