# SageMaker production deployment checklist (MLOps)

Run these **in order** before promoting a model to production traffic. Assumes **SageMaker AI**
real-time inference with **inference components** (recommended) and **capacity-aware instance
pools** (April 2026).

> Reflects **July 2026**: inference components for multi-model endpoints, capacity-aware
> fallback across instance types, Model Monitor for drift, Model Registry approval gates.

## Stage 0 — Model package (required)

- [ ] Model artifact versioned in S3 with immutable hash
- [ ] Model registered in **SageMaker Model Registry** with approval status = `PendingManualApproval`
- [ ] Training dataset lineage linked (pipeline execution ARN or commit SHA)
- [ ] Inference container image URI pinned (not `:latest`)

**Rollback trigger:** No registry entry → stop; you cannot audit what production serves.
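The Stage 0 checks can be automated as a small pre-flight gate. The sketch below is one possible shape, not a prescribed implementation: `sha256_of_file`, `is_pinned_image`, and `stage0_problems` are hypothetical helper names, and `package` is assumed to be the dict returned by a `DescribeModelPackage` call (which includes a `ModelApprovalStatus` field).

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a local model artifact in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_pinned_image(image_uri: str) -> bool:
    """True if the URI is pinned by digest or by an explicit tag other than 'latest'."""
    if "@sha256:" in image_uri:
        return True
    # Inspect only the last path segment so a registry port is not read as a tag.
    last_segment = image_uri.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag resolves to an implicit :latest
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


def stage0_problems(package: dict, image_uri: str,
                    expected_status: str = "PendingManualApproval") -> list[str]:
    """Return human-readable problems; an empty list means Stage 0 passes."""
    problems = []
    status = package.get("ModelApprovalStatus")
    if status != expected_status:
        problems.append(f"approval status is {status!r}, expected {expected_status!r}")
    if not is_pinned_image(image_uri):
        problems.append(f"image URI is not pinned: {image_uri}")
    return problems
```

Storing the `sha256_of_file` result alongside the registry entry (for example as model package metadata) gives a later audit something concrete to compare against the S3 object.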

## Stage 1 — Pre-production endpoint

- [ ] Deploy to **staging endpoint** with inference component (or single-model for dev only)
- [ ] Define **instance pool** priority list (e.g., `ml.g5.2xlarge` → `ml.g5.4xlarge` → `ml.g6.2xlarge`)
- [ ] Run load test at **1.5×** expected peak RPS for 30 minutes
- [ ] Capture p50/p99 latency and GPU utilization per instance type (CloudWatch)

**Rollback trigger:** p99 > SLA at 1× peak → right-size instance or enable auto-scaling before prod.
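To make the Stage 1 gate concrete, the following sketch derives the load-test target from expected peak RPS and checks p99 against the SLA. It assumes latency samples have already been exported from the load-test tool as milliseconds; the function names and the nearest-rank percentile method are illustrative choices, not something the checklist mandates.

```python
import math


def load_test_target_rps(expected_peak_rps: float, multiplier: float = 1.5) -> float:
    """Target request rate for the 30-minute load test (1.5x peak by default)."""
    return expected_peak_rps * multiplier


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank computation free of float drift.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p99_within_sla(latencies_ms: list[float], sla_ms: float) -> bool:
    """False means the rollback trigger fires: p99 exceeds the SLA."""
    return percentile(latencies_ms, 99) <= sla_ms
```

Running `p99_within_sla` once per instance type in the pool shows whether a fallback type would still meet the SLA if the preferred type is unavailable.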

## Stage 2 — Safety and monitoring

- [ ] **SageMaker Model Monitor** baseline statistics uploaded from staging traffic
- [ ] Data quality + model quality monitors scheduled (hourly for fraud, daily for batch-heavy)
- [ ] **SageMaker Clarify** bias config in place if regulated use case (credit, hiring, health)
- [ ] CloudWatch alarms on `ModelLatency`, `Invocation4XXErrors`, `Invocation5XXErrors`

**Rollback trigger:** Monitor fails on staging shadow traffic → fix schema drift before prod.
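The three CloudWatch alarms from the checklist can be generated from one helper. This is a sketch under assumptions: it targets variant-level metrics in the `AWS/SageMaker` namespace (dimensions `EndpointName` and `VariantName`), the thresholds and alarm names are placeholders, and endpoints built on inference components may need different dimensions. `ModelLatency` is reported in microseconds, so the millisecond threshold is converted.

```python
def endpoint_alarm_specs(endpoint_name: str, variant_name: str,
                         p99_latency_ms: float, max_errors_per_minute: int) -> list[dict]:
    """Build keyword arguments for CloudWatch put_metric_alarm, one dict per alarm."""
    common = {
        "Namespace": "AWS/SageMaker",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Period": 60,               # one-minute datapoints
        "EvaluationPeriods": 5,     # assumption: alarm after 5 consecutive breaches
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
    specs = [{
        **common,
        "AlarmName": f"{endpoint_name}-ModelLatency-p99",
        "MetricName": "ModelLatency",
        "ExtendedStatistic": "p99",
        "Threshold": p99_latency_ms * 1000,  # ModelLatency is in microseconds
    }]
    for metric in ("Invocation4XXErrors", "Invocation5XXErrors"):
        specs.append({
            **common,
            "AlarmName": f"{endpoint_name}-{metric}",
            "MetricName": metric,
            "Statistic": "Sum",
            "Threshold": max_errors_per_minute,
        })
    return specs


def create_alarms(specs: list[dict]) -> None:
    """Send the specs to CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # imported here so building specs works without boto3 installed
    cloudwatch = boto3.client("cloudwatch")
    for spec in specs:
        cloudwatch.put_metric_alarm(**spec)
```

Keeping spec construction separate from the API call lets the alarm definitions be reviewed or unit-tested before anything is created in the account.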

## Stage 3 — Production promotion

- [ ] Approve model version in Model Registry
- [ ] Blue/green or canary via **production variant weights** or separate endpoint swap
- [ ] Run a shadow variant (mirrored requests, 0% of responses served to callers) for 24h if high-risk domain
- [ ] Document rollback ARN (previous model version + endpoint config)

**Rollback trigger:** 5xx > 0.1% or latency regression > 20% in canary → revert weights.
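The Stage 3 rollback rule is simple enough to encode directly. The sketch below assumes the latency comparison uses p99 (the checklist does not say which statistic) and that the canary and stable models run as two production variants on one endpoint; the function and variant names are hypothetical. The second helper builds a request body for `UpdateEndpointWeightsAndCapacities` that sends all traffic back to the stable variant.

```python
def should_rollback(canary_5xx: int, canary_invocations: int,
                    canary_p99_ms: float, baseline_p99_ms: float,
                    max_5xx_rate: float = 0.001,
                    max_latency_regression: float = 0.20) -> bool:
    """True if the canary breaches the 5xx rate or the latency regression limit."""
    if canary_invocations == 0:
        return False  # no canary traffic observed yet, so no evidence either way
    error_rate = canary_5xx / canary_invocations
    regression = (canary_p99_ms - baseline_p99_ms) / baseline_p99_ms
    return error_rate > max_5xx_rate or regression > max_latency_regression


def revert_weights_request(endpoint_name: str, stable_variant: str,
                           canary_variant: str) -> dict:
    """Keyword arguments that shift 100% of traffic back to the stable variant."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": stable_variant, "DesiredWeight": 1.0},
            {"VariantName": canary_variant, "DesiredWeight": 0.0},
        ],
    }
```

With credentials configured, the request could be applied as `boto3.client("sagemaker").update_endpoint_weights_and_capacities(**revert_weights_request(...))`. Both thresholds are strict inequalities, matching the "greater than" wording of the trigger.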

## Stage 4 — Cost and operations

- [ ] Tag endpoint with `cost-center`, `model-name`, `environment`
- [ ] Set auto-scaling min/max; enable scale-to-zero only if cold-start SLA allows
- [ ] Schedule weekly review of idle GPU hours ([cost worksheet](./cost-latency-worksheet.csv))
- [ ] Wire pipeline retrain trigger (drift alarm → Pipeline execution)
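The tagging and scaling items above can be checked before deployment. This is a minimal sketch with hypothetical helper names: `missing_tags` expects tags in the `[{"Key": ..., "Value": ...}]` shape used by SageMaker, and `scalable_target_request` builds arguments for Application Auto Scaling's `register_scalable_target`, assuming the model is deployed as an inference component scaled by copy count. The `allow_scale_to_zero` flag is an illustrative guard for the cold-start caveat, not an AWS parameter.

```python
REQUIRED_TAG_KEYS = ("cost-center", "model-name", "environment")


def missing_tags(tags: list[dict]) -> list[str]:
    """Return required tag keys that are absent or have an empty value."""
    present = {t["Key"] for t in tags if t.get("Value")}
    return [key for key in REQUIRED_TAG_KEYS if key not in present]


def scalable_target_request(inference_component_name: str,
                            min_copies: int, max_copies: int,
                            allow_scale_to_zero: bool = False) -> dict:
    """Keyword arguments for register_scalable_target on an inference component."""
    if min_copies < 0 or max_copies < min_copies:
        raise ValueError("need 0 <= min_copies <= max_copies")
    if min_copies == 0 and not allow_scale_to_zero:
        # Scale-to-zero adds cold-start latency; require an explicit opt-in.
        raise ValueError("min_copies=0 requires allow_scale_to_zero=True")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{inference_component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": min_copies,
        "MaxCapacity": max_copies,
    }
```

The resulting dict could be passed to `boto3.client("application-autoscaling").register_scalable_target(**request)`; a scaling policy still has to be attached separately.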

## Related posts

- [SageMaker Unified Studio migration](/blog/amazon-sagemaker-unified-studio/)
- [Blue-green vs canary on AWS](/blog/aws-blue-green-vs-canary-deployment-decision-guide-2026/)
- [SageMaker training cost efficiency](/blog/how-to-run-sagemaker-training-jobs-cost-efficiently/)
- [SageMaker AI Savings Plans](/blog/aws-sagemaker-ai-savings-plans-commitment-flexibility/)
