AWS Data Governance Operating Model (2026): Catalog vs Stewardship on SageMaker Catalog
Quick summary: On a multi-domain retailer (~4,200 Glue tables, 11 AWS accounts), publishing a stewardship RACI plus SageMaker Catalog subscriptions cut mean time-to-data-access from 19 days to 4 days — without replacing Lake Formation enforcement.
Key Takeaways
- Amazon SageMaker Catalog is built on Amazon DataZone, per AWS SageMaker Catalog FAQs — same governance capabilities, unified experience for data and ML assets
- February 11, 2026 Lake Formation cross-account sharing v5 simplified RAM-based grants (see our cross-account sharing guide)
- If you deployed DataZone in 2024 and stopped at “we bought a catalog,” you likely have a tooling layer without a stewardship layer
- This post does not cover DataZone product mechanics, LF-Tags implementation detail, SageMaker Unified Studio migration, or cloud OU guardrails
- Benchmark pattern (not a cited client) — Multi-domain retailer, ~4,200 Glue tables across 11 AWS accounts, standalone DataZone since 2024 with no named stewards

Amazon SageMaker Catalog is built on Amazon DataZone, per AWS SageMaker Catalog FAQs — same governance capabilities, unified experience for data and ML assets. February 11, 2026 Lake Formation cross-account sharing v5 simplified RAM-based grants (see our cross-account sharing guide). If you deployed DataZone in 2024 and stopped at “we bought a catalog,” you likely have a tooling layer without a stewardship layer.
This post is the data governance operating model — catalog vs stewardship RACI, federated council cadence, and how enforcement stays in Lake Formation. It is not DataZone product mechanics, not LF-Tags implementation detail, not SageMaker Unified Studio migration, and not cloud OU guardrails.
Artifacts: stewardship RACI CSV, governance rollout checklist.
Benchmark pattern (not a cited client): a multi-domain retailer with ~4,200 Glue tables across 11 AWS accounts, running standalone DataZone since 2024 with no named stewards. After it published the RACI and a SageMaker Catalog subscription workflow with a 2-business-day SLA, mean time-to-data-access fell from 19 days to 4 days over 60 days. Lake Formation LF-Tags were unchanged; only people, process, and catalog hygiene changed.
Two layers — do not conflate them
| Layer | Question it answers | AWS surface | Owner role |
|---|---|---|---|
| Technical catalog | What tables exist and where? | Glue Data Catalog, crawlers | Data custodian |
| Business catalog | What does this data mean and who may use it? | SageMaker Catalog (DataZone) | Data steward |
| Enforcement | What actually runs at query time? | Lake Formation, IAM | Data custodian + security |
| Classification | Where is sensitive data? | Macie, Security Lake | Security officer |
Opinionated take: Stewardship before catalog expansion. Teams that crawl 500 new tables/month without glossary owners create a discovery landfill. Fix LF-Tags and Macie on landing buckets first — then publish to SageMaker Catalog.
Federated RACI — minimum viable roles
Download and adapt stewardship-raci.csv.
| Role | One-line accountability |
|---|---|
| Data owner | Approves retention and business definition |
| Data steward | Curates glossary, approves subscriptions |
| Data custodian | Runs Glue, LF grants, platform uptime |
| ML engineer | Publishes models/features with lineage |
| Security officer | Macie rules, SoD evidence |
| FinOps lead | Chargeback tags on data platform spend |
Council cadence: monthly, 60 minutes, agenda fixed — (1) subscription SLA breaches, (2) orphan assets without owner tag, (3) Macie high-severity open >14 days.
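The three fixed agenda items can be assembled from exports before each meeting. The following is a minimal sketch, assuming hypothetical record shapes (`business_days_to_decision`, `owner`, `severity`, `opened_on`); none of these field names come from an AWS API, so adapt them to whatever your catalog, Glue, and Macie exports actually contain.

```python
from datetime import date

SLA_BUSINESS_DAYS = 2      # subscription approval SLA used in this post
MACIE_MAX_OPEN_DAYS = 14   # threshold from agenda item 3


def build_council_agenda(subscriptions, assets, macie_findings, today):
    """Return the three fixed agenda sections as lists of identifiers.

    All record shapes are hypothetical exports, not AWS API responses.
    """
    sla_breaches = [
        s["id"] for s in subscriptions
        if s["business_days_to_decision"] > SLA_BUSINESS_DAYS
    ]
    orphan_assets = [a["name"] for a in assets if not a.get("owner")]
    stale_findings = [
        f["id"] for f in macie_findings
        if f["severity"] == "HIGH"
        and (today - f["opened_on"]).days > MACIE_MAX_OPEN_DAYS
    ]
    return {
        "subscription_sla_breaches": sla_breaches,
        "orphan_assets": orphan_assets,
        "macie_high_open_over_14d": stale_findings,
    }


agenda = build_council_agenda(
    subscriptions=[
        {"id": "sub-1", "business_days_to_decision": 1},
        {"id": "sub-2", "business_days_to_decision": 5},
    ],
    assets=[
        {"name": "orders_silver", "owner": "jane.doe"},
        {"name": "clicks_bronze", "owner": ""},
    ],
    macie_findings=[
        {"id": "f-1", "severity": "HIGH", "opened_on": date(2026, 1, 1)},
        {"id": "f-2", "severity": "HIGH", "opened_on": date(2026, 1, 25)},
    ],
    today=date(2026, 2, 1),
)
```

Keeping the thresholds as named constants means the council can change the SLA or the Macie window in one place without touching the filtering logic.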
Stage 1 — Technical foundation (custodian)
Glue + Lake Formation before business catalog publish.
# Context: Lake Formation admin in us-east-1; revoke default IAM catalog access (July 2026)
# Caution: put-data-lake-settings replaces the whole settings object, including the
# DataLakeAdmins list. Fetch current settings with get-data-lake-settings and merge first.
aws lakeformation put-data-lake-settings \
  --data-lake-settings '{"CreateDatabaseDefaultPermissions":[],"CreateTableDefaultPermissions":[]}'

- Register S3 locations per account; scope crawlers to owned prefixes only
- Draft LF-Tags: sensitivity, domain, cost-center (max 5 tags; taxonomy sprawl kills adoption)
- Weekly Macie classification on s3://landing-* buckets
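Because `put-data-lake-settings` replaces the full settings object, a read-modify-write is a safer way to clear the default permissions without dropping existing data lake admins. The sketch below assumes you pass in a boto3 Lake Formation client; the helper names are hypothetical, and the only AWS calls used are `get_data_lake_settings` and `put_data_lake_settings`.

```python
import copy


def without_default_iam_permissions(current_settings):
    """Return a copy of DataLakeSettings with the default permissions cleared.

    Everything else (notably DataLakeAdmins) is carried over unchanged.
    """
    updated = copy.deepcopy(current_settings)
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def revoke_default_catalog_access(lf_client):
    """Read-modify-write so the put call does not wipe existing admins."""
    current = lf_client.get_data_lake_settings()["DataLakeSettings"]
    updated = without_default_iam_permissions(current)
    lf_client.put_data_lake_settings(DataLakeSettings=updated)
    return updated
```

In practice this would be called once per account as `revoke_default_catalog_access(boto3.client("lakeformation"))`, ideally after logging the current settings so the change can be reverted.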
Stage 2 — SageMaker Catalog publish workflow
Per AWS SageMaker + DataZone integration:
- Create domain project per business domain (finance, product, marketing)
- Import glossary terms — each term requires owner + steward names (not DL aliases)
- Publish owned assets from Glue tables and SageMaker feature groups
- Enable subscription approval — steward must act within SLA
Owned assets stay in project inventory until explicitly published to the organization catalog. Do not auto-publish bronze dumps.
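The two publish rules above (named owner and steward, no auto-published bronze data) can be enforced as a pre-publish check. This is a rough sketch under stated assumptions: the alias heuristic (`@` or a `dl-` prefix) and the `layer` and `curated` fields are hypothetical conventions, not SageMaker Catalog attributes.

```python
def is_named_person(value):
    """Heuristic: a non-empty name that does not look like a distribution list."""
    name = (value or "").strip()
    return bool(name) and "@" not in name and not name.lower().startswith("dl-")


def glossary_term_problems(term):
    """List the reasons a glossary term is not ready to import."""
    problems = []
    for role in ("owner", "steward"):
        if not is_named_person(term.get(role)):
            problems.append(f"{role} must be a named person")
    return problems


def publishable(asset):
    """Only explicitly curated, non-bronze assets go to the organization catalog."""
    return bool(asset.get("curated", False)) and asset.get("layer") != "bronze"
```

A check like this fits naturally in whatever script or pipeline step submits assets for publishing, so that rejected items come back with a reason the steward can act on.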
Stage 3 — Wire catalog approval to Lake Formation
What broke: week 3 of catalog rollout. Marketing subscribed to customer_360_silver in SageMaker Catalog; the steward approved in 4 hours. Athena queries still returned AccessDenied because the LF-Tag domain=marketing grant lived in a Step Functions workflow that only fired on manual ServiceNow tickets, not catalog events. Detection: 23 failed queries in CloudWatch Insights. Fix: EventBridge rule on DataZone subscription-approved → Lambda GrantLFTagPermissions. Rollback: disable rule, revert to ticket queue while fixing IAM role trust.
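The EventBridge rule in that fix needs an event pattern. The sketch below shows one possible shape; the `source` and `detail-type` values are assumptions to verify against the events your DataZone domain actually emits, and `matches` is a hypothetical local helper for sanity-checking sample events, not an EventBridge API.

```python
import json

# Assumption: confirm both values against a captured event before deploying.
SUBSCRIPTION_APPROVED_PATTERN = {
    "source": ["aws.datazone"],
    "detail-type": ["Subscription Request Accepted"],
}


def matches(pattern, event):
    """Local check for exact-match patterns on top-level event fields only."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# JSON string suitable for the EventPattern argument of a rule definition.
event_pattern_json = json.dumps(SUBSCRIPTION_APPROVED_PATTERN)
```

The JSON string can be passed as `EventPattern` to the boto3 `events` client's `put_rule`, with the Lambda attached via `put_targets`. The rollback described above corresponds to `disable_rule` on the same rule name.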
# Context: boto3 >= 1.34, us-east-1; illustrative LF grant after catalog approval event
import boto3

lf = boto3.client("lakeformation")


def grant_on_subscription(event, context):
    # The detail keys below are illustrative; map them to the fields of the real event payload.
    principal = event["detail"]["subscriberPrincipalArn"]
    database = event["detail"]["assetDatabase"]
    table = event["detail"]["assetTable"]
    lf.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": principal},
        Resource={"Table": {"DatabaseName": database, "Name": table}},
        Permissions=["SELECT"],
    )

Operating metrics — what good looks like
| Metric | Target (90 days) | Data source |
|---|---|---|
| Mean time to approve subscription | < 2 business days | Catalog audit API |
| % tables with owner tag | > 85% | Glue + Athena inventory |
| Glossary term coverage (critical domains) | > 70% | Steward self-report + spot audit |
| Orphan tables (no queries 90d) | Decreasing MoM | Athena query logs |
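Two of these metrics are simple to compute once the raw records are exported. The following sketch assumes hypothetical inputs: subscription records with `requested_on` and `approved_on` dates, and Glue-style table dicts where the owner tag is stored under `Parameters["owner"]`. The business-day count ignores public holidays.

```python
from datetime import date, timedelta


def business_days_between(start, end):
    """Count weekdays after `start` up to and including `end` (no holiday calendar)."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def mean_business_days_to_approve(requests):
    """Average approval time in business days; 0.0 when there are no requests."""
    durations = [
        business_days_between(r["requested_on"], r["approved_on"]) for r in requests
    ]
    return sum(durations) / len(durations) if durations else 0.0


def owner_tag_coverage(tables):
    """Percentage of tables whose parameters carry a non-empty owner value."""
    if not tables:
        return 0.0
    tagged = sum(1 for t in tables if t.get("Parameters", {}).get("owner"))
    return 100.0 * tagged / len(tables)
```

Comparing the first result against the 2-business-day target and the second against the 85% target gives the council two numbers it can track month over month.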
What to Do This Week
- Download governance-rollout-checklist.md and complete Stage 0 (charter + RACI names).
- Run Macie on top 5 landing buckets; export findings to stewards.
- Pick one domain (not five) for SageMaker Catalog pilot — publish < 50 curated assets.
- Add an EventBridge hook or ticket integration so that an approved subscription results in an actual Lake Formation grant, not approved-but-denied access.
- Schedule first council with subscription SLA on the agenda.
Reproduce this: open stewardship-raci.csv in a spreadsheet and add your domain names in column typical_aws_surface. Walk governance-rollout-checklist.md stage by stage; check off items in your runbook tool.
What This Post Doesn’t Cover
- Full Lake Formation cross-account RAM topology — see secure cross-account sharing
- DataZone blueprint infrastructure provisioning — see enterprise DataZone guide
- EU AI Act model governance — see EU AI Act on AWS
- Security Lake OCSF normalization — see Amazon Security Lake
We have not benchmarked SageMaker Catalog semantic search accuracy against a manual glossary-only program — treat AI-generated metadata as draft until a steward approves.




