AWS Data Governance Operating Model (2026): Catalog vs Stewardship on SageMaker Catalog

Data & Analytics | Palaniappan P | 4 min read

Quick summary: On a multi-domain retailer (~4,200 Glue tables, 11 AWS accounts), publishing a stewardship RACI plus SageMaker Catalog subscriptions cut mean time-to-data-access from 19 days to 4 days — without replacing Lake Formation enforcement.

Key Takeaways

  • Amazon SageMaker Catalog is built on Amazon DataZone, per AWS SageMaker Catalog FAQs — same governance capabilities, unified experience for data and ML assets
  • On February 11, 2026, Lake Formation cross-account sharing version 5 simplified RAM-based grants (see our cross-account sharing guide)
  • If you deployed DataZone in 2024 and stopped at “we bought a catalog,” you likely have a tooling layer without a stewardship layer
  • Scope: this post covers the operating model only, not DataZone product mechanics, LF-Tags implementation detail, SageMaker Unified Studio migration, or cloud OU guardrails
  • Benchmark pattern (not a cited client) — Multi-domain retailer, ~4,200 Glue tables across 11 AWS accounts, standalone DataZone since 2024 with no named stewards

Amazon SageMaker Catalog is built on Amazon DataZone, per the AWS SageMaker Catalog FAQs: the same governance capabilities, with a unified experience for data and ML assets. On February 11, 2026, Lake Formation cross-account sharing version 5 simplified RAM-based grants (see our cross-account sharing guide). If you deployed DataZone in 2024 and stopped at “we bought a catalog,” you likely have a tooling layer without a stewardship layer.

This post is the data governance operating model — catalog vs stewardship RACI, federated council cadence, and how enforcement stays in Lake Formation. It is not DataZone product mechanics, not LF-Tags implementation detail, not SageMaker Unified Studio migration, and not cloud OU guardrails.

Artifacts: stewardship RACI CSV, governance rollout checklist.

Benchmark pattern (not a cited client): a multi-domain retailer with ~4,200 Glue tables across 11 AWS accounts, running standalone DataZone since 2024 with no named stewards. After the team published a RACI and a SageMaker Catalog subscription workflow with a 2-business-day SLA, mean time-to-data-access fell from 19 days to 4 days over a 60-day window. Lake Formation LF-Tags were unchanged; only people/process and catalog hygiene changed.

Two layers — do not conflate them

Layer | Question it answers | AWS surface | Owner role
Technical catalog | What tables exist and where? | Glue Data Catalog, crawlers | Data custodian
Business catalog | What does this data mean and who may use it? | SageMaker Catalog (DataZone) | Data steward
Enforcement | What actually runs at query time? | Lake Formation, IAM | Data custodian + security
Classification | Where is sensitive data? | Macie, Security Lake | Security officer

Opinionated take: Stewardship before catalog expansion. Teams that crawl 500 new tables/month without glossary owners create a discovery landfill. Fix LF-Tags and Macie on landing buckets first — then publish to SageMaker Catalog.

Federated RACI — minimum viable roles

Download and adapt stewardship-raci.csv.

Role | One-line accountability
Data owner | Approves retention and business definition
Data steward | Curates glossary, approves subscriptions
Data custodian | Runs Glue, LF grants, platform uptime
ML engineer | Publishes models/features with lineage
Security officer | Macie rules, SoD evidence
FinOps lead | Chargeback tags on data platform spend

Council cadence: monthly, 60 minutes, agenda fixed — (1) subscription SLA breaches, (2) orphan assets without owner tag, (3) Macie high-severity open >14 days.
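
Agenda item (1) can be prepared mechanically before the meeting. The following is a minimal sketch, not the author's tooling: it assumes subscription requests have been exported as records with hypothetical `requested_on` and `approved_on` fields, and it counts Monday to Friday as business days without accounting for public holidays.

```python
from datetime import date, timedelta

SLA_BUSINESS_DAYS = 2  # the 2-business-day SLA from the benchmark pattern


def business_days_between(start: date, end: date) -> int:
    """Count weekdays (Mon-Fri) after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            days += 1
    return days


def sla_breaches(requests: list, today: date) -> list:
    """Return requests whose approval time (or open age) exceeds the SLA."""
    breaches = []
    for req in requests:
        # A request that is still open is measured up to today.
        closed = req.get("approved_on") or today
        elapsed = business_days_between(req["requested_on"], closed)
        if elapsed > SLA_BUSINESS_DAYS:
            breaches.append({**req, "business_days": elapsed})
    return breaches
```

A request made on a Friday and approved the following Monday counts as one business day, so it stays within the SLA; an open request from Monday is flagged by Friday.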

Stage 1 — Technical foundation (custodian)

Glue + Lake Formation before business catalog publish.

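# Context: Lake Formation admin in us-east-1; revoke default IAM catalog access (July 2026)
# put-data-lake-settings replaces the whole settings object, including the admin list,
# so read the current settings and clear only the two default-permission lists (requires jq).
aws lakeformation get-data-lake-settings --query DataLakeSettings --output json \
  | jq '.CreateDatabaseDefaultPermissions = [] | .CreateTableDefaultPermissions = []' > lf-settings.json
aws lakeformation put-data-lake-settings --data-lake-settings file://lf-settings.json
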
  • Register S3 locations per account; scope crawlers to owned prefixes only
  • Draft LF-Tags: sensitivity, domain, cost-center (max 5 tags — taxonomy sprawl kills adoption)
  • Weekly Macie classification on s3://landing-* buckets
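
The tag cap in the list above can be enforced before anything reaches Lake Formation. This is a sketch under stated assumptions: the tag keys come from the bullet above, but the allowed values in `DRAFT_TAXONOMY` are hypothetical placeholders, and the client is passed in so the validation can be exercised without AWS credentials. `create_lf_tag` is the boto3 Lake Formation call for the CreateLFTag API.

```python
MAX_TAG_KEYS = 5  # cap from the Stage 1 guidance

# Hypothetical draft taxonomy; the allowed values are placeholders.
DRAFT_TAXONOMY = {
    "sensitivity": ["public", "internal", "confidential"],
    "domain": ["finance", "product", "marketing"],
    "cost-center": ["cc-100", "cc-200"],
}


def validate_taxonomy(taxonomy: dict) -> None:
    """Raise ValueError if there are too many tag keys or a key has no values."""
    if len(taxonomy) > MAX_TAG_KEYS:
        raise ValueError(
            f"{len(taxonomy)} tag keys exceeds the cap of {MAX_TAG_KEYS}"
        )
    for key, values in taxonomy.items():
        if not values:
            raise ValueError(f"tag key {key!r} has no allowed values")


def create_lf_tags(lf_client, taxonomy: dict) -> list:
    """Create one LF-Tag per key via CreateLFTag; return the keys created."""
    validate_taxonomy(taxonomy)
    created = []
    for key, values in taxonomy.items():
        lf_client.create_lf_tag(TagKey=key, TagValues=values)
        created.append(key)
    return created
```

In a real account this would be called as `create_lf_tags(boto3.client("lakeformation"), DRAFT_TAXONOMY)` by a Lake Formation admin; rerunning it for keys that already exist will raise an error from the service, so treat it as a one-time bootstrap.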

Stage 2 — SageMaker Catalog publish workflow

Per AWS SageMaker + DataZone integration:

  1. Create domain project per business domain (finance, product, marketing)
  2. Import glossary terms — each term requires owner + steward names (not DL aliases)
  3. Publish owned assets from Glue tables and SageMaker feature groups
  4. Enable subscription approval — steward must act within SLA

Owned assets stay in project inventory until explicitly published to the organization catalog. Do not auto-publish bronze dumps.
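
The two publishing rules above (named owner and steward, no bronze dumps) can be expressed as a pre-publish check. The sketch below is illustrative only: the asset record shape (`layer`, `owner`, `steward`) is hypothetical, and the alias heuristic assumes distribution lists look like email addresses or start with `dl-`, which will differ between organizations.

```python
import re
from typing import Optional

# Assumption: group aliases contain "@" or start with "dl-" / "dl_".
ALIAS_PATTERN = re.compile(r"@|^dl[-_]", re.IGNORECASE)


def is_named_person(value: Optional[str]) -> bool:
    """True when the value is non-empty and does not look like a group alias."""
    if not value or not value.strip():
        return False
    return ALIAS_PATTERN.search(value.strip()) is None


def publish_blockers(asset: dict) -> list:
    """Return reasons to keep an asset in project inventory (empty = publishable)."""
    blockers = []
    if asset.get("layer") == "bronze":
        blockers.append("bronze assets are not published to the organization catalog")
    for role in ("owner", "steward"):
        if not is_named_person(asset.get(role)):
            blockers.append(f"{role} must be a named person, not an alias")
    return blockers
```

Returning a list of reasons rather than a boolean gives the steward something concrete to fix before the asset is resubmitted.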

Stage 3 — Wire catalog approval to Lake Formation

What broke: week 3 of the catalog rollout. Marketing subscribed to customer_360_silver in SageMaker Catalog, and the steward approved in 4 hours. Athena queries still returned AccessDenied, because the LF-Tag grant for domain=marketing lived in a Step Functions workflow that fired only on manual ServiceNow tickets, not on catalog events.

Detection: 23 failed queries in CloudWatch Insights.
Fix: an EventBridge rule on the DataZone subscription-approved event that invokes a Lambda function (GrantLFTagPermissions).
Rollback: disable the rule and revert to the ticket queue while fixing the IAM role trust.

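# Context: boto3 >= 1.34, us-east-1. Illustrative Lake Formation grant after a catalog approval event.
# The event["detail"] keys below are placeholders: map them to the fields of the
# real subscription event (or to your EventBridge input transformer) before use.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

def grant_on_subscription(event, context):
    detail = event["detail"]
    principal = detail["subscriberPrincipalArn"]
    database = detail["assetDatabase"]
    table = detail["assetTable"]
    # Table-level SELECT for the approved subscriber.
    lf.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": principal},
        Resource={"Table": {"DatabaseName": database, "Name": table}},
        Permissions=["SELECT"],
    )
    # Return a small summary so the invocation log shows what was granted.
    return {"granted": "SELECT", "principal": principal, "table": f"{database}.{table}"}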
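
The incident describes an LF-Tag grant (domain=marketing), while the handler above grants on a single named table. A tag-based variant might look like the sketch below. It assumes the principal ARN and tag value have already been extracted from the event, and the function names are hypothetical. The `LFTagPolicy` resource shape follows the Lake Formation GrantPermissions API.

```python
def lf_tag_grant_request(principal_arn: str, tag_key: str, tag_value: str) -> dict:
    """Build GrantPermissions kwargs: SELECT on every table matching one LF-Tag."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": [tag_value]}],
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_by_tag(lf_client, principal_arn: str, tag_key: str, tag_value: str) -> dict:
    """Call Lake Formation GrantPermissions with an LF-Tag policy resource."""
    request = lf_tag_grant_request(principal_arn, tag_key, tag_value)
    lf_client.grant_permissions(**request)
    return request
```

Building the request separately from the call keeps the payload easy to log and review. Depending on how the databases are tagged, the subscriber may also need a DESCRIBE grant at the database level, which this sketch does not issue.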

Operating metrics — what good looks like

Metric | Target (90 days) | Data source
Mean time to approve subscription | < 2 business days | Catalog audit API
% tables with owner tag | > 85% | Glue + Athena inventory
Glossary term coverage (critical domains) | > 70% | Steward self-report + spot audit
Orphan tables (no queries 90d) | Decreasing MoM | Athena query logs
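
The owner-tag metric can be computed from a Glue table listing. This is a sketch only: it assumes each record is shaped like an entry in the `TableList` of a Glue `get_tables` response, and that ownership is recorded under a table parameter named `owner`, which is a hypothetical convention rather than anything the article specifies.

```python
OWNER_KEY = "owner"  # assumption: owner is stored under this table parameter


def _has_owner(table: dict) -> bool:
    """True when the table's Parameters carry a non-blank owner value."""
    params = table.get("Parameters") or {}
    return bool(str(params.get(OWNER_KEY, "")).strip())


def owner_tag_coverage(tables: list) -> float:
    """Percentage of tables with an owner value (0.0 for an empty list)."""
    if not tables:
        return 0.0
    tagged = sum(1 for t in tables if _has_owner(t))
    return 100.0 * tagged / len(tables)


def tables_without_owner(tables: list) -> list:
    """Names of tables lacking an owner, for the council's agenda."""
    return [t["Name"] for t in tables if not _has_owner(t)]
```

Feeding the second function's output to the monthly council gives agenda item (2) a concrete list instead of a percentage.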

What to Do This Week

  1. Download governance-rollout-checklist.md and complete Stage 0 (charter + RACI names).
  2. Run Macie on top 5 landing buckets; export findings to stewards.
  3. Pick one domain (not five) for SageMaker Catalog pilot — publish < 50 curated assets.
  4. Add an EventBridge hook or ticket integration so that a catalog approval always produces a working Lake Formation grant, not an approved status with no access.
  5. Schedule first council with subscription SLA on the agenda.

Reproduce this — Open stewardship-raci.csv in a spreadsheet; add your domain names in column typical_aws_surface. Walk governance-rollout-checklist.md stage by stage; check off items in your runbook tool.

What This Post Doesn’t Cover

We have not benchmarked SageMaker Catalog semantic search accuracy against a manual glossary-only program — treat AI-generated metadata as draft until a steward approves.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture, Cloud Migration, GenAI on AWS, Cost Optimization, DevOps
