Cloud Data Platform

Snowflake on AWS

Snowflake on AWS with Iceberg interop, Cortex Analyst for natural-language BI, Hybrid Tables for OLTP, and Snowpark Container Services — governed by Horizon and Polaris.

Last updated: April 29, 2026 | Data Platforms | Author: FactualMinds Cloud Integration Team | Reviewed by: FactualMinds AWS-certified architects (Solutions Architect – Professional)

Snowflake on AWS

Snowflake is a SaaS data platform that runs on AWS infrastructure. It separates compute and storage, auto-scales, and has steadily expanded from a pure data warehouse into a broader developer platform — Snowpark for Python/Java/Scala in-database, Cortex AI for in-SQL LLM functions, Snowpark Container Services for long-lived containers near the data, Iceberg Tables for lakehouse interop, Hybrid Tables for OLTP-style workloads, and Horizon for governance.

In 2026 the question is less “Snowflake vs Redshift” and more “how do I compose Snowflake alongside my AWS-native analytics stack without duplicating data or governance”. The answer almost always involves Apache Iceberg as the interop layer.

What’s new for Snowflake on AWS in 2026

Iceberg Tables GA — native read/write of Iceberg tables sitting on your S3 / S3 Tables bucket. Your data, your catalog.
Cortex Analyst — natural-language-to-SQL over a governed semantic model.
Cortex Search — managed hybrid search (BM25 + vector) over unstructured columns.
Hybrid Tables — row-level transactional access alongside analytical queries.
Snowpark Container Services — run long-lived containers inside Snowflake, close to the data.
Polaris Catalog — open-source Iceberg REST catalog; interop with Athena, Trino, Flink, Spark.
Horizon governance — classification, access policies, lineage, quality, and audit in one console.
Document AI maturity — ML-powered extraction from PDFs and images stored in Snowflake stages (S3).
Prompt caching for Cortex and model-routing between Snowflake Arctic and third-party models via Cortex COMPLETE.

Why Snowflake on AWS

Operational simplicity — no infrastructure, automatic scaling, minimal tuning.
Compute / storage separation — add data without adding compute; spin warehouses up/down per team.
Data sharing — live sharing across Snowflake accounts and the Snowflake Marketplace without ETL.
Multi-cloud consumption — same dataset accessible from Snowflake accounts in AWS, Azure, and GCP.
AI-native features — Cortex puts LLM capabilities and NL-to-SQL one SQL function call away from the data.

Architectural building blocks

Virtual warehouses

XS to 6XL, plus multi-cluster for concurrency.
Auto-suspend / auto-resume; set suspend to 1-5 min for interactive workloads.
Separate warehouses per team prevent contention; governance via Horizon.
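As a rough sketch of the settings above, a per-team warehouse might be declared as follows. The warehouse name bi_team_wh and the specific size and cluster limits are illustrative assumptions, and the multi-cluster properties only apply on editions that support multi-cluster warehouses.

```sql
-- Hypothetical per-team warehouse; the name and limits are placeholders.
CREATE WAREHOUSE IF NOT EXISTS bi_team_wh
  WAREHOUSE_SIZE      = 'XSMALL'
  AUTO_SUSPEND        = 120        -- seconds; inside the 1-5 min range suggested above
  AUTO_RESUME         = TRUE       -- wake up on the first incoming query
  MIN_CLUSTER_COUNT   = 1
  MAX_CLUSTER_COUNT   = 3          -- scale out for concurrency instead of sizing up
  SCALING_POLICY      = 'STANDARD'
  INITIALLY_SUSPENDED = TRUE;      -- do not burn credits until first use
```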

Storage options

Snowflake-managed FDN tables — proprietary columnar format; best latency and full feature coverage.
Iceberg Tables — Apache Iceberg on your S3 / S3 Tables bucket; interop with Athena, EMR, Redshift, Glue, SageMaker. Preferred default for new analytics data in 2026.
External Tables — read-only over S3; useful for one-off or ad-hoc queries on lake data.

Governance

Snowflake Horizon for classification, access policies, lineage, quality, audit.
Amazon DataZone for cross-tool catalog and marketplace.
AWS Lake Formation for Iceberg permissions on S3.
Security Hub + Security Lake (OCSF) for aggregated security evidence.

Developer layer

Snowpark Python/Java/Scala dataframes; runs inside Snowflake, no data movement.
Snowpark ML for scikit-learn/XGBoost training on Snowflake data.
Streamlit in Snowflake for data apps without a separate hosting layer.
Snowpark Container Services for APIs, Streamlit, custom ML inference, or third-party software.

Cortex AI in practice

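-- NL-to-SQL via Cortex Analyst over a semantic model.
-- Cortex Analyst is called through its REST API (POST /api/v2/cortex/analyst/message),
-- not through a SQL function. The request carries the question, for example
-- 'What was EMEA revenue last quarter by product line?', plus a reference to the
-- semantic model (here: sales_analytics); the response includes the generated SQL.

-- Hybrid search (BM25 + vector) over support tickets via a Cortex Search service.
-- SEARCH_PREVIEW takes the service name and a JSON request; column names are placeholders.
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'support_search',
    '{"query": "checkout failing with 402 on EU cards", "columns": ["ticket_id", "snippet"], "limit": 5}'
  )
)['results'] AS results;

-- Summarize + classify inline (CLASSIFY_TEXT returns an object; extract its label)
SELECT order_id,
       SNOWFLAKE.CORTEX.SUMMARIZE(notes) AS summary,
       SNOWFLAKE.CORTEX.CLASSIFY_TEXT(notes, ['refund', 'retention', 'new_sale'])['label']::STRING AS intent
FROM sales_notes;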

Models: Snowflake Arctic, Llama 3 / 4, Mistral Large 2, Claude Sonnet 4 (where region-available), and others. Per-token billing — monitor monthly.
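To illustrate the per-token functions with an explicit model choice, here is a minimal sketch that reuses the sales_notes table from the examples above. The model names mistral-large2 and snowflake-arctic-embed-m are assumptions about what is enabled in your region; substitute whatever your account lists as available.

```sql
-- Sketch only: model availability varies by region and account.
SELECT order_id,
       SNOWFLAKE.CORTEX.COMPLETE(
         'mistral-large2',
         'Extract the main customer complaint in one sentence: ' || notes
       ) AS complaint,
       SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', notes) AS notes_embedding
FROM sales_notes
LIMIT 100;  -- cap the row count while testing, since billing is per token
```

The LIMIT is deliberate: running an LLM function over a whole table is the easiest way to produce an unexpected token bill.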

Iceberg Tables: the 2026 default

Iceberg Tables put the data in your S3 / S3 Tables bucket, with Snowflake as one compute engine among several.

Shared with Amazon Athena for pay-per-query exploration.
Shared with AWS Glue / EMR / Spark for ETL and ML feature engineering.
Shared with Amazon Redshift via Spectrum or Lake Formation.
Shared with SageMaker Lakehouse for ML training and inference pipelines.
Catalog via AWS Glue Data Catalog (preferred for AWS-native engines) or Polaris (for multi-engine, cross-vendor).

Reserve Snowflake-native FDN tables for:

Hot, concurrency-heavy BI workloads with strict latency targets.
Features not yet available on Iceberg (Hybrid Tables, search optimization service, some materialized views).
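A minimal sketch of a Snowflake-managed Iceberg table on your own bucket is shown below. It assumes an external volume named iceberg_vol has already been created and granted access to the S3 location; the table name, columns, and base location are hypothetical.

```sql
-- Assumes the external volume iceberg_vol already exists and points at your S3 bucket.
CREATE ICEBERG TABLE IF NOT EXISTS orders_iceberg (
  order_id   NUMBER(38, 0),
  order_date DATE,
  region     STRING,
  amount     NUMBER(12, 2)
)
  CATALOG         = 'SNOWFLAKE'      -- Snowflake acts as the Iceberg catalog for this table
  EXTERNAL_VOLUME = 'iceberg_vol'
  BASE_LOCATION   = 'orders/';       -- path under the volume's storage location

-- Reads and writes then look like any other table.
INSERT INTO orders_iceberg VALUES (1, '2026-01-15', 'EMEA', 129.90);
SELECT region, SUM(amount) AS revenue FROM orders_iceberg GROUP BY region;
```

For tables whose catalog is AWS Glue or an Iceberg REST catalog, the CATALOG value would instead name a catalog integration, and write support depends on the integration type.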

Loading data from AWS

COPY INTO from S3 for one-time or batch loads.
Snowpipe + S3 event notifications for continuous ingestion.
AWS Glue / Fivetran / Airbyte for source-system ETL (RDS, DynamoDB, SaaS).
Snowflake Kafka Connector for Amazon MSK streaming.
Streams + Tasks for change-data-capture-style incremental processing inside Snowflake.
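The batch, continuous, and incremental paths listed above can be sketched together. Everything named here is a placeholder: the storage integration s3_int, the bucket path, the warehouse etl_wh, and the tables raw_orders and orders_clean are assumed to exist already.

```sql
-- Stage over an S3 prefix (assumes storage integration s3_int is configured).
CREATE STAGE IF NOT EXISTS raw_orders_stage
  URL = 's3://example-bucket/orders/'
  STORAGE_INTEGRATION = s3_int
  FILE_FORMAT = (TYPE = PARQUET);

-- One-time or batch load.
COPY INTO raw_orders
  FROM @raw_orders_stage
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Continuous ingestion: Snowpipe triggered by S3 event notifications.
CREATE PIPE IF NOT EXISTS raw_orders_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_orders
    FROM @raw_orders_stage
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Incremental processing: a stream tracks new rows, a task consumes them.
CREATE STREAM IF NOT EXISTS raw_orders_stream ON TABLE raw_orders;

CREATE TASK IF NOT EXISTS load_orders_clean
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  INSERT INTO orders_clean (order_id, order_date, region, amount)
  SELECT order_id, order_date, region, amount
  FROM raw_orders_stream
  WHERE METADATA$ACTION = 'INSERT';

ALTER TASK load_orders_clean RESUME;  -- tasks are created suspended
```

After creating the pipe, the S3 bucket still needs an event notification pointing at the queue shown in the pipe's notification_channel (visible via SHOW PIPES).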

When Snowflake is NOT the right call

Small team, AWS-only stack, predictable analytics workload — Amazon Redshift Serverless or Athena + Iceberg usually wins on cost and governance simplicity.
You need strict AWS-native IAM on every row: Redshift + Lake Formation integrates more tightly than Snowflake's external IAM.
Pure ML training workloads over lake data — SageMaker + S3 + Glue can skip the Snowflake bill entirely.
Compliance team requires data to never leave AWS-managed storage — reduce Snowflake surface by preferring Iceberg Tables on S3 (data stays in S3) or choose a fully AWS-native stack.

Snowflake vs Redshift vs Athena vs SageMaker Lakehouse

Dimension	Snowflake	Redshift Serverless	Athena + Iceberg	SageMaker Lakehouse
Operating model	SaaS	AWS-managed	Serverless, pay-per-query	Unified lakehouse
Storage	Managed or Iceberg on S3	Managed or Spectrum on S3	S3 / S3 Tables	S3 + governance via DataZone
Concurrency	Strong (multi-cluster)	Strong	Moderate	Varies by engine
IAM integration	External; Horizon + IdC SAML	Native IAM	Native IAM	Native IAM
NL-to-SQL	Cortex Analyst	Amazon Q in QuickSight	Amazon Q via Athena	Amazon Q + DataZone
In-DB ML	Snowpark ML	Redshift ML + Bedrock	SageMaker external	SageMaker native
Data sharing	Snowflake Marketplace	Cross-account, Redshift Data Sharing	S3 + Lake Formation grants	DataZone projects
Multi-cloud	Yes	No	No	No
Typical cost profile	Premium; elastic	Moderate; predictable	Pay-per-TB scanned	Workload-dependent

Failure modes & resilience

1. Warehouse cold-start latency. Auto-suspend at 1 min is great for cost but adds 1–10 s on the first query against a cold warehouse. For latency-sensitive dashboards, set auto-suspend to 60 min on the BI warehouse, or use a “keep-warm” scheduled task that runs a lightweight query every 30 s on the dashboard warehouse during business hours. Pick a query that reads a real table: a bare SELECT 1 may be answered by the cloud services layer without touching the warehouse.

2. Multi-cluster scaling lag. When concurrency exceeds a single cluster, Snowflake adds clusters, but provisioning takes 10–30 s. During a flash-load (campaign launch, scheduled job convoy), early queries queue. Mitigation: set MIN_CLUSTER_COUNT = 2 for predictable peak warehouses; use SCALING_POLICY = STANDARD to start extra clusters as soon as queries begin to queue, or ECONOMY to conserve credits at the cost of longer queuing.

3. Iceberg metadata stale-read across engines. When Athena, Glue, or EMR writes to an Iceberg table also read by Snowflake, the catalog must be refreshed. Snowflake Iceberg tables managed by Glue Data Catalog auto-refresh at query time; externally-cataloged tables require explicit ALTER ICEBERG TABLE … REFRESH. Symptom: missing rows that exist on S3. Mitigation: prefer single-writer-per-table or use the Polaris REST catalog for multi-engine writes.

4. Cortex token quota cliffs. Per-account daily token budgets exist for Cortex Analyst, Search, and COMPLETE; hitting them returns function evaluation failed. Track usage via SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY. Mitigation: rate-limit at the application tier; set Resource Monitors on warehouses running Cortex queries.

5. Time Travel + Fail-safe boundaries. Time Travel (default 1 day, up to 90 on Enterprise) covers operational restore. Fail-safe (7 days, automatic) is Snowflake-only recovery — you cannot self-serve from it. For accidental DROP TABLE recovery, use UNDROP TABLE within Time Travel; beyond Time Travel, open a Snowflake support ticket within 7 days.

6. Iceberg storage on S3 gets expensive without compaction. Frequent small writes create thousands of tiny files. S3 Tables auto-compacts; self-managed S3 buckets need scheduled OPTIMIZE runs (Spark/Glue). Symptom: queries scan many small files; per-query cost climbs.

7. Long-running query timeout. Default STATEMENT_TIMEOUT_IN_SECONDS = 172800 (48h). For interactive workloads, lower to 600 s on the BI warehouse so runaway queries don’t accumulate credits. Combine with STATEMENT_QUEUED_TIMEOUT_IN_SECONDS to fail fast under contention.
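Several of the mitigations above are one-line settings. The following sketch applies them to the bi_prod warehouse used elsewhere in this guide; the specific values, and the table names orders and lake_orders, are assumptions to adapt rather than recommendations.

```sql
-- Items 1 and 2: keep the BI warehouse warm longer and pre-provision a second cluster.
ALTER WAREHOUSE bi_prod SET
  AUTO_SUSPEND      = 3600          -- seconds (60 min)
  MIN_CLUSTER_COUNT = 2
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'STANDARD';

-- Item 7: fail fast on runaway or long-queued statements.
ALTER WAREHOUSE bi_prod SET
  STATEMENT_TIMEOUT_IN_SECONDS        = 600
  STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 60;

-- Item 3: pick up metadata written by another engine (externally cataloged tables only).
ALTER ICEBERG TABLE lake_orders REFRESH;

-- Item 5: extend Time Travel on a critical table, and recover an accidental drop.
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;
UNDROP TABLE orders;                 -- only works inside the Time Travel window
```

Retention above 1 day requires an edition that supports extended Time Travel, and longer retention increases storage cost.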

Observability runbook

ACCOUNT_USAGE views to monitor:

View	Use for
`WAREHOUSE_METERING_HISTORY`	Daily credit spend per warehouse; trending overruns
`QUERY_HISTORY`	Slowest queries, queue time, partitions scanned
`QUERY_ACCELERATION_HISTORY`	Whether QAS is helping the workloads it was enabled for
`LOGIN_HISTORY`	Auth anomalies; pair with CloudTrail for cross-evidence
`ACCESS_HISTORY`	Lineage; which roles read which columns (governance)
`CORTEX_FUNCTIONS_USAGE_HISTORY`	Per-function token spend; budget overruns
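Two starter queries against these views, as a sketch: daily credit spend per warehouse, and token spend per Cortex function. The column names follow the ACCOUNT_USAGE schema as we understand it; verify them in your account, and note that ACCOUNT_USAGE views lag real time.

```sql
-- Daily credits per warehouse over the last 30 days.
SELECT DATE_TRUNC('day', start_time) AS usage_day,
       warehouse_name,
       SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY usage_day DESC, credits DESC;

-- Token and credit spend per Cortex function and model over the last 30 days.
SELECT DATE_TRUNC('day', start_time) AS usage_day,
       function_name,
       model_name,
       SUM(tokens)                   AS tokens,
       SUM(token_credits)            AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY usage_day DESC, credits DESC;
```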

Resource monitors with hard stops:

CREATE OR REPLACE RESOURCE MONITOR rm_prod_bi
  WITH CREDIT_QUOTA = 4000
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 50  PERCENT DO NOTIFY
    ON 80  PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND
    ON 110 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE bi_prod SET RESOURCE_MONITOR = rm_prod_bi;

SUSPEND_IMMEDIATE kills running queries; reserve for hard caps. Pair with a CloudWatch metric alarm fed by a scheduled Lambda querying WAREHOUSE_METERING_HISTORY.

Debug path: “dashboard suddenly slow”:

SHOW WAREHOUSES → confirm warehouse is STARTED. If SUSPENDED and a query is queued, that’s normal cold-start.
QUERY_HISTORY for the dashboard’s user/role over the last hour: filter on EXECUTION_TIME > p95 baseline.
Check QUEUED_OVERLOAD_TIME and QUEUED_PROVISIONING_TIME — high values indicate concurrency or scaling lag; bump MIN_CLUSTER_COUNT or warehouse size.
If a specific query slowed: run EXPLAIN USING TEXT on it and compare partitionsAssigned with partitionsTotal (or check Bytes scanned / Partitions scanned in the Query Profile). Add or improve a clustering key on the most-filtered columns.
Iceberg-only: confirm catalog refresh ran; cross-engine writers can leave the metadata stale.
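Steps 2 and 3 of the debug path can be sketched as a single query. The warehouse name BI_PROD and the 24-hour window are assumptions; because ACCOUNT_USAGE.QUERY_HISTORY can lag by up to about 45 minutes, use the INFORMATION_SCHEMA query history table function when you need the most recent queries.

```sql
-- Slowest recent queries on the dashboard warehouse, with queue-time breakdown.
SELECT query_id,
       user_name,
       role_name,
       start_time,
       total_elapsed_time       / 1000 AS elapsed_s,
       execution_time           / 1000 AS execution_s,
       queued_overload_time     / 1000 AS queued_overload_s,      -- concurrency pressure
       queued_provisioning_time / 1000 AS queued_provisioning_s,  -- cold start or scale-out lag
       partitions_scanned,
       partitions_total,
       bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE warehouse_name = 'BI_PROD'
  AND start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```

A high ratio of partitions_scanned to partitions_total on a selective filter is the usual hint that clustering would help.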

Clustering-key validation:

-- Before declaring a clustering key, validate selectivity
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
-- Look at average_overlaps and average_depth; aim for low values

Pricing notes (always verify)

Compute credits: ~$2-$4 each (Standard to Business Critical).
Storage on Snowflake-managed: ~$23-$40/TB/month.
Iceberg Tables: you pay S3 + S3 Tables management fee; Snowflake compute charges as normal.
Cortex functions: per-token.
Cortex Analyst / Search: metered separately.
Set Resource Monitors with hard caps per warehouse; alert on 50% / 80% of monthly credit budget.
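For a back-of-the-envelope view of the month so far, the ballpark rates above can be applied to your own usage. This is a sketch under stated assumptions: $3.00 per credit and $23 per TB per month are placeholders taken from the ranges quoted here, not your contract rates.

```sql
WITH compute_mtd AS (
  SELECT SUM(credits_used) AS credits
  FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
  WHERE start_time >= DATE_TRUNC('month', CURRENT_TIMESTAMP())
),
storage_mtd AS (
  -- Average daily footprint this month, converted from bytes to TB.
  SELECT AVG(storage_bytes + stage_bytes + failsafe_bytes) / POWER(1024, 4) AS avg_tb
  FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
  WHERE usage_date >= DATE_TRUNC('month', CURRENT_DATE())
)
SELECT c.credits,
       c.credits * 3.00 AS est_compute_usd,   -- assumed price per credit
       s.avg_tb,
       s.avg_tb * 23.00 AS est_storage_usd    -- assumed price per TB-month
FROM compute_mtd c
CROSS JOIN storage_mtd s;
```

As a worked example with the same assumed rate: an XS warehouse (1 credit per hour) running 8 hours a day for 22 working days uses 176 credits, or about $528.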

Best practices

Cost

Aggressive auto-suspend (1-5 min).
Multi-cluster for concurrency, not always-on size-up.
Iceberg on S3 for cold and semi-cold data.
Streams + Tasks for incremental processing.
Quarterly review of Cortex token usage.

Performance

Clustering keys on frequent WHERE / JOIN columns.
Result cache (free for 24 hours) — design dashboards to benefit.
Search Optimization Service for selective point queries on internal tables.
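The first and third items translate into short DDL statements. The table and column names below are hypothetical, and both features add ongoing maintenance cost, so check SYSTEM$CLUSTERING_INFORMATION before committing to a key.

```sql
-- Cluster on the columns most often used in WHERE / JOIN predicates.
ALTER TABLE orders CLUSTER BY (order_date, region);

-- Enable search optimization for selective equality lookups on one column.
ALTER TABLE orders ADD SEARCH OPTIMIZATION ON EQUALITY(order_id);
```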

Governance

Horizon classification tags + dynamic data masking for PII.
Lake Formation for Iceberg permissions on S3.
DataZone for cross-tool catalog and marketplace.
Audit logs to CloudWatch and Security Lake; retain per compliance regime.
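Dynamic data masking for PII, mentioned in the first item, can be sketched as follows. The policy name, the PII_READER role, and the customers.email column are illustrative assumptions, and masking policies require an edition that includes them.

```sql
-- Show the real value only to an approved role; everyone else sees a masked string.
CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_READER') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the column it should protect.
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
```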

Reliability

Time Travel: retain 7-30 days; use for accidental-delete recovery.
Fail-safe: 7 days; do not depend on it for operational recovery.
Replication for DR across regions or clouds where justified.

Iceberg: native read/write on S3 + S3 Tables; no Snowflake lock-in for the data
Cortex: in-SQL LLM functions (Analyst, Search, COMPLETE, EMBED_TEXT)
24 hrs: result-cache window for free repeat queries
Frequently Asked Questions

How does Snowflake run on AWS in 2026?

Snowflake is a managed SaaS data platform that runs on AWS infrastructure. You pick an AWS region during account creation; Snowflake operates the compute (virtual warehouses on EC2), object storage (S3), and services (metadata, security, optimizer) for you. In 2026 the noteworthy architectural shifts are: native Iceberg Tables so your actual data sits in your S3 / S3 Tables bucket (not inside Snowflake-managed storage), Snowpark Container Services for running arbitrary long-lived containers near the data, and Hybrid Tables for row-level transactional workloads on the same platform as analytical queries.

When should I pick Snowflake vs Redshift vs Athena vs SageMaker Lakehouse on AWS?

Four rules of thumb. (1) All-AWS, tight IAM integration, Redshift ML in mind, predictable workloads: Redshift Serverless or Redshift provisioned is usually the cleaner choice and keeps data under native AWS governance. (2) Pay-per-query on data already in S3, modest concurrency, cost-sensitive: Amazon Athena + Iceberg on S3 Tables. (3) Multi-cloud consumption, data sharing with external partners, many teams with wildly different SLAs, or an existing Snowflake estate: Snowflake on AWS is usually worth the premium. (4) You want a unified governance surface across lakehouse + ML + GenAI: SageMaker Lakehouse + Amazon DataZone + Athena/Redshift is the 2026 AWS-native answer. In practice many enterprises run Snowflake AND Redshift/Athena side by side, with Iceberg as the interop layer.

What is Cortex Analyst and how is it different from Cortex Search or Amazon Q?

Cortex Analyst (GA 2024) turns natural-language questions into safe, governed SQL over a semantic model you define — think "what was revenue in EMEA last quarter broken down by product line?" returning an answer and the SQL that produced it. Cortex Search (GA 2024) is a managed RAG/hybrid-search service over unstructured text columns. Cortex COMPLETE / EMBED_TEXT / CLASSIFY_TEXT / SUMMARIZE are per-token in-SQL LLM functions. Amazon Q in QuickSight / Amazon Q for Business is the AWS-native equivalent for NL-to-BI questions, grounded in QuickSight datasets and enterprise connectors. Both approaches converge: keep the answer engine close to the data, govern inputs and outputs, and never paste raw data into an external prompt.

What are Iceberg Tables in Snowflake and why do they matter?

Iceberg Tables let Snowflake read and write Apache Iceberg datasets that live in your S3 bucket (or S3 Tables bucket) — you own the files, the schema, and the metadata. Benefits: (a) no data lock-in — Amazon Athena, Glue, EMR, Redshift, SageMaker, Flink, Spark, Trino, and Presto can read the same tables; (b) one governance boundary via AWS Lake Formation and/or Snowflake Horizon; (c) cheaper storage — S3/S3 Tables pricing instead of Snowflake-managed storage. Trade-offs: query latency on very hot workloads is slightly higher than Snowflake-native FDN tables, and some advanced Snowflake features (hybrid tables, search optimization) are only available on internal tables. The 2026 default we recommend for new analytics workloads is Iceberg first, Snowflake-native only where you need a specific Snowflake-only feature.

What is the Polaris Catalog and how does it relate to AWS Glue Data Catalog?

Polaris Catalog (open-sourced by Snowflake in 2024 and now developed at the Apache Software Foundation as Apache Polaris) is an Iceberg REST Catalog that any Iceberg-compatible engine can read: Snowflake, Trino, Presto, Flink, Spark, Athena. It lets multiple engines agree on table metadata without forcing everyone through a Snowflake-specific API. AWS Glue Data Catalog already speaks Iceberg; in most AWS-centric shops the pragmatic 2026 pattern is Glue Data Catalog for AWS-native engines (Athena, EMR, Redshift) plus Snowflake via Iceberg REST / Polaris federation. Avoid running two catalogs with divergent metadata for the same tables; pick one authoritative catalog per dataset.

What are Snowpark Container Services and Hybrid Tables?

Snowpark Container Services (GA 2024) lets you run long-lived containerized workloads (APIs, Streamlit apps, custom ML inference, third-party software) inside Snowflake, close to the data, with Snowflake handling scheduling and networking. Useful when data gravity makes moving data to an external runtime expensive or compliance-awkward. Hybrid Tables (GA 2024) combine row-level transactional access (OLTP-style) with analytical access in the same table. They are useful for unified metric layers, sharing state between an app and analytics, or small OLTP needs you would rather not run a separate RDS for. Compare with Amazon Aurora DSQL for strongly-consistent distributed SQL: a different tool for a similar gap.

What is Snowflake Horizon and how does it intersect with AWS governance?

Snowflake Horizon is Snowflake's governance and discovery surface: data classification (PII/PCI), access policies (row-level, column-level, dynamic data masking), audit, lineage, and quality. On AWS, layer it with Amazon DataZone for multi-domain federated governance, AWS Lake Formation for Iceberg-table permissions on S3/S3 Tables, and CloudTrail + Security Lake (OCSF) for unified evidence. The 2026 pattern is Horizon inside Snowflake + Lake Formation on the S3 side + DataZone as the cross-tool catalog and marketplace.

What does Snowflake cost on AWS in 2026?

Ballparks — always verify at snowflake.com/pricing. Compute: ~$2-$4 per credit (Standard to Business Critical); 1 credit ≈ 1 XS node running for an hour. Storage (Snowflake-managed): ~$23-$40/TB/month — or pay S3 prices if you use Iceberg Tables on your bucket. Cortex AI functions: per-token. Cortex Analyst / Search: metered separately. Biggest 2026 optimization levers: (a) put cold and semi-cold data on Iceberg Tables (S3/S3 Tables) instead of Snowflake-managed; (b) aggressive auto-suspend (1-5 min for interactive warehouses); (c) multi-cluster warehouses for concurrency rather than one always-on XL; (d) Resource Monitors with budget alarms; (e) use Streams + Tasks for incremental, not full-table, processing; (f) review Cortex AI token usage monthly — it is easy to accidentally run SUMMARIZE over an entire table.
