Cloud Data Platform

Snowflake on AWS

Snowflake on AWS with Iceberg interop, Cortex Analyst for natural-language BI, Hybrid Tables for OLTP, and Snowpark Container Services — governed by Horizon and Polaris.

Last updated: April 29, 2026
Author: FactualMinds Cloud Integration Team
Reviewed by: FactualMinds AWS-certified architects (Solutions Architect – Professional)

Snowflake is a SaaS data platform that runs on AWS infrastructure. It separates compute and storage, auto-scales, and has steadily expanded from a pure data warehouse into a broader developer platform — Snowpark for Python/Java/Scala in-database, Cortex AI for in-SQL LLM functions, Snowpark Container Services for long-lived containers near the data, Iceberg Tables for lakehouse interop, Hybrid Tables for OLTP-style workloads, and Horizon for governance.

In 2026 the question is less “Snowflake vs Redshift” and more “how do I compose Snowflake alongside my AWS-native analytics stack without duplicating data or governance”. The answer almost always involves Apache Iceberg as the interop layer.

What’s new for Snowflake on AWS in 2026

Why Snowflake on AWS

Architectural building blocks

Virtual warehouses

Storage options

Governance

Developer layer

Cortex AI in practice

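-- NL-to-SQL via Cortex Analyst over a semantic model.
-- Illustrative call shape only: Cortex Analyst is normally invoked through its REST API,
-- so confirm the current interface in the Snowflake docs before relying on this.
SELECT SNOWFLAKE.CORTEX.ANALYST(
  'What was EMEA revenue last quarter by product line?',
  semantic_model => 'sales_analytics'
);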

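-- Hybrid search (keyword + vector) over support tickets.
-- SEARCH_PREVIEW takes the service name plus a JSON query spec and returns a JSON string.
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'support_search',
    '{"query": "checkout failing with 402 on EU cards", "columns": ["ticket_id", "snippet"], "limit": 5}'
  )
)['results'] AS results;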

-- Summarize + classify inline
SELECT order_id,
       SNOWFLAKE.CORTEX.SUMMARIZE(notes) AS summary,
       -- CLASSIFY_TEXT returns an object; pull out the winning label
       SNOWFLAKE.CORTEX.CLASSIFY_TEXT(notes, ['refund', 'retention', 'new_sale'])['label']::STRING AS intent
FROM sales_notes;

Models: Snowflake Arctic, Llama 3 / 4, Mistral Large 2, Claude Sonnet 4 (where region-available), and others. Per-token billing — monitor monthly.
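To make the "monitor monthly" advice concrete, here is a minimal sketch of a month-to-date token report. It assumes the ACCOUNT_USAGE view CORTEX_FUNCTIONS_USAGE_HISTORY exposes function_name, model_name, tokens, and token_credits columns; verify the column names against your account before scheduling it.

```sql
-- Month-to-date Cortex spend, grouped by function and model.
-- Column names are assumptions to verify in your account.
SELECT function_name,
       model_name,
       SUM(tokens)        AS tokens_mtd,
       SUM(token_credits) AS credits_mtd
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY
WHERE start_time >= DATE_TRUNC('month', CURRENT_TIMESTAMP())
GROUP BY function_name, model_name
ORDER BY credits_mtd DESC;
```

ACCOUNT_USAGE views lag real time, so treat the output as a trend report rather than a live meter.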

Iceberg Tables: the 2026 default

Iceberg Tables put the data in your S3 / S3 Tables bucket, with Snowflake as one compute engine among several.
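A rough sketch of what that looks like for a Snowflake-managed Iceberg table follows. The volume name, bucket, IAM role ARN, and table definition are all hypothetical placeholders, and the IAM trust setup between Snowflake and the role is not shown.

```sql
-- Hypothetical external volume pointing at a bucket you own.
CREATE EXTERNAL VOLUME iceberg_vol
  STORAGE_LOCATIONS = ((
    NAME = 'analytics-s3'
    STORAGE_PROVIDER = 'S3'
    STORAGE_BASE_URL = 's3://example-analytics-bucket/iceberg/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/example-snowflake-iceberg'
  ));

-- Iceberg table whose data and metadata files land under the volume above.
-- CATALOG = 'SNOWFLAKE' means Snowflake acts as the Iceberg catalog for this table.
CREATE ICEBERG TABLE orders_iceberg (
  order_id   INT,
  order_date DATE,
  region     STRING,
  amount     NUMBER(12, 2)
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_vol'
  BASE_LOCATION = 'orders/';
```

Tables cataloged elsewhere (for example in Glue) are created against a catalog integration instead, which this sketch does not cover.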

Reserve Snowflake-native FDN tables for:

Loading data from AWS

When Snowflake is NOT the right call

Snowflake vs Redshift vs Athena vs SageMaker Lakehouse

| Dimension | Snowflake | Redshift Serverless | Athena + Iceberg | SageMaker Lakehouse |
| --- | --- | --- | --- | --- |
| Operating model | SaaS | AWS-managed | Serverless, pay-per-query | Unified lakehouse |
| Storage | Managed or Iceberg on S3 | Managed or Spectrum on S3 | S3 / S3 Tables | S3 + governance via DataZone |
| Concurrency | Strong (multi-cluster) | Strong | Moderate | Varies by engine |
| IAM integration | External; Horizon + IdC SAML | Native IAM | Native IAM | Native IAM |
| NL-to-SQL | Cortex Analyst | Amazon Q in QuickSight | Amazon Q via Athena | Amazon Q + DataZone |
| In-DB ML | Snowpark ML | Redshift ML + Bedrock | SageMaker external | SageMaker native |
| Data sharing | Snowflake Marketplace | Cross-account, Redshift Data Sharing | S3 + Lake Formation grants | DataZone projects |
| Multi-cloud | Yes | No | No | No |
| Typical cost profile | Premium; elastic | Moderate; predictable | Pay-per-TB scanned | Workload-dependent |

Failure modes & resilience

1. Warehouse cold-start latency. Auto-suspend at 1 min is great for cost but adds 1–10 s on the first query against a cold warehouse. For latency-sensitive dashboards, set auto-suspend to 60 min on the BI warehouse, or use a “keep-warm” scheduled task that runs SELECT 1 every 30 s on the dashboard warehouse during business hours.

2. Multi-cluster scaling lag. When concurrency exceeds a single cluster, Snowflake adds clusters, but provisioning takes 10–30 s. During a flash-load (campaign launch, scheduled job convoy), early queries queue. Mitigation: set MIN_CLUSTER_COUNT = 2 for warehouses with predictable peaks; use SCALING_POLICY = STANDARD to add clusters as soon as queries start queuing, or ECONOMY to favor credit savings at the cost of longer queue times.

3. Iceberg metadata stale-read across engines. When Athena, Glue, or EMR writes to an Iceberg table also read by Snowflake, the catalog must be refreshed. Snowflake Iceberg tables managed by Glue Data Catalog auto-refresh at query time; externally-cataloged tables require explicit ALTER ICEBERG TABLE … REFRESH. Symptom: missing rows that exist on S3. Mitigation: prefer single-writer-per-table or use the Polaris REST catalog for multi-engine writes.

4. Cortex token quota cliffs. Per-account daily token budgets exist for Cortex Analyst, Search, and COMPLETE; hitting them returns a “function evaluation failed” error. Track usage via SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY. Mitigation: rate-limit at the application tier; set Resource Monitors on warehouses running Cortex queries.

5. Time Travel + Fail-safe boundaries. Time Travel (default 1 day, up to 90 on Enterprise) covers operational restore. Fail-safe (7 days, automatic) is Snowflake-only recovery — you cannot self-serve from it. For accidental DROP TABLE recovery, use UNDROP TABLE within Time Travel; beyond Time Travel, open a Snowflake support ticket within 7 days.

6. Iceberg storage on S3 gets expensive without compaction. Frequent small writes create thousands of tiny files. S3 Tables auto-compacts; self-managed S3 buckets need scheduled compaction (Athena OPTIMIZE, or Iceberg’s rewrite_data_files procedure from Spark/Glue). Symptom: queries scan many small files; per-query cost climbs.

7. Long-running query timeout. Default STATEMENT_TIMEOUT_IN_SECONDS = 172800 (48h). For interactive workloads, lower to 600 s on the BI warehouse so runaway queries don’t accumulate credits. Combine with STATEMENT_QUEUED_TIMEOUT_IN_SECONDS to fail fast under contention.
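Several of the mitigations above come down to a handful of warehouse and table settings. The statements below are a sketch, meant to be run individually rather than as one script. bi_prod and orders are the names used elsewhere on this page; the queue timeout, cluster ceiling, and retention values are assumed for illustration.

```sql
-- Items 1 and 7: keep the BI warehouse warm and cap runaway statements.
ALTER WAREHOUSE bi_prod SET
  AUTO_SUSPEND = 3600                         -- seconds (60 min)
  AUTO_RESUME = TRUE
  STATEMENT_TIMEOUT_IN_SECONDS = 600
  STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 120;  -- assumed value; tune to your SLA

-- Item 2: pre-provision a second cluster for predictable peaks.
-- Multi-cluster warehouses require Enterprise edition or higher.
ALTER WAREHOUSE bi_prod SET
  MIN_CLUSTER_COUNT = 2
  MAX_CLUSTER_COUNT = 4                       -- assumed ceiling
  SCALING_POLICY = STANDARD;

-- Item 5: extend Time Travel on a critical table (assumed 7 days).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Item 5: recover a dropped table while it is still inside Time Travel.
UNDROP TABLE orders;

-- Item 5: or clone the table as it looked one hour ago.
CREATE TABLE orders_restored CLONE orders AT (OFFSET => -3600);
```

A longer AUTO_SUSPEND trades idle credits for latency, so it usually belongs only on the dashboard warehouse rather than account-wide.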

Observability runbook

ACCOUNT_USAGE views to monitor:

| View | Use for |
| --- | --- |
| WAREHOUSE_METERING_HISTORY | Daily credit spend per warehouse; trending overruns |
| QUERY_HISTORY | Slowest queries, queue time, partitions scanned |
| QUERY_ACCELERATION_HISTORY | Whether QAS is helping the workloads it was enabled for |
| LOGIN_HISTORY | Auth anomalies; pair with CloudTrail for cross-evidence |
| ACCESS_HISTORY | Lineage; which roles read which columns (governance) |
| CORTEX_FUNCTIONS_USAGE_HISTORY | Per-function token spend; budget overruns |
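As one example of how these views are typically queried, the sketch below trends daily credit spend per warehouse over the last 30 days. The 30-day window is an arbitrary choice.

```sql
-- Daily credits per warehouse, most recent day first.
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_day
ORDER BY usage_day DESC, credits DESC;
```

Comparing each day against the trailing average for the same warehouse is a simple way to spot overruns before a resource monitor trips.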

Resource monitors with hard stops:

CREATE OR REPLACE RESOURCE MONITOR rm_prod_bi
  WITH CREDIT_QUOTA = 4000
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 50  PERCENT DO NOTIFY
    ON 80  PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND
    ON 110 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE bi_prod SET RESOURCE_MONITOR = rm_prod_bi;

SUSPEND_IMMEDIATE kills running queries; reserve for hard caps. Pair with a CloudWatch metric alarm fed by a scheduled Lambda querying WAREHOUSE_METERING_HISTORY.

Debug path: “dashboard suddenly slow”:

  1. SHOW WAREHOUSES → confirm warehouse is STARTED. If SUSPENDED and a query is queued, that’s normal cold-start.
  2. QUERY_HISTORY for the dashboard’s user/role over the last hour: filter on EXECUTION_TIME > p95 baseline.
  3. Check QUEUED_OVERLOAD_TIME and QUEUED_PROVISIONING_TIME — high values indicate concurrency or scaling lag; bump MIN_CLUSTER_COUNT or warehouse size.
  4. If a specific query slowed: run EXPLAIN USING TEXT and compare partitionsAssigned with partitionsTotal, then check Bytes scanned / Partitions scanned in the query profile. Add or improve a clustering key on the most-filtered columns.
  5. Iceberg-only: confirm catalog refresh ran; cross-engine writers can leave the metadata stale.
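Steps 2 and 3 can be sketched as a single query over ACCOUNT_USAGE.QUERY_HISTORY. The warehouse and user names are hypothetical, and the 24-hour window is used because ACCOUNT_USAGE views lag real time and may miss the most recent queries.

```sql
-- Slowest recent dashboard queries with their queue-time breakdown.
-- Times in QUERY_HISTORY are in milliseconds; convert to seconds for readability.
SELECT query_id,
       start_time,
       execution_time / 1000           AS exec_s,
       queued_overload_time / 1000     AS queued_overload_s,
       queued_provisioning_time / 1000 AS queued_provisioning_s,
       partitions_scanned,
       partitions_total,
       bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE warehouse_name = 'BI_PROD'          -- hypothetical warehouse
  AND user_name = 'DASHBOARD_SVC'         -- hypothetical service user
  AND start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY execution_time DESC
LIMIT 50;
```

High queue columns with modest exec_s point at concurrency or scaling lag. High exec_s with partitions_scanned close to partitions_total points at pruning, which is where the clustering-key check comes in.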

Clustering-key validation:

-- Before declaring a clustering key, validate selectivity
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
-- Look at average_overlaps and average_depth; aim for low values

Pricing notes (always verify)

Best practices

Cost

Performance

Governance

Reliability

  • Iceberg: native read/write on S3 + S3 Tables; no Snowflake lock-in for the data
  • Cortex: in-SQL LLM functions (Analyst, Search, COMPLETE, EMBED_TEXT)
  • 24 hrs: result-cache window for free repeat queries

Frequently Asked Questions

How does Snowflake run on AWS in 2026?
Snowflake is a managed SaaS data platform that runs on AWS infrastructure. You pick an AWS region during account creation; Snowflake operates the compute (virtual warehouses on EC2), object storage (S3), and services (metadata, security, optimizer) for you. In 2026 the noteworthy architectural shifts are: native Iceberg Tables so your actual data sits in your S3 / S3 Tables bucket (not inside Snowflake-managed storage), Snowpark Container Services for running arbitrary long-lived containers near the data, and Hybrid Tables for row-level transactional workloads on the same platform as analytical queries.
When should I pick Snowflake vs Redshift vs Athena vs SageMaker Lakehouse on AWS?
Four rules of thumb. (1) **All-AWS, tight IAM integration, Redshift ML in mind, predictable workloads** — Redshift Serverless or Redshift provisioned is usually the cleaner choice and keeps data under native AWS governance. (2) **Pay-per-query on data already in S3, modest concurrency, cost-sensitive** — Amazon Athena + Iceberg on S3 Tables. (3) **Multi-cloud consumption, data sharing with external partners, many teams with wildly different SLAs, or an existing Snowflake estate** — Snowflake on AWS is usually worth the premium. (4) **You want a unified governance surface across lakehouse + ML + GenAI** — SageMaker Lakehouse + Amazon DataZone + Athena/Redshift is the 2026 AWS-native answer. In practice many enterprises run Snowflake AND Redshift/Athena side by side, with Iceberg as the interop layer.
What is Cortex Analyst and how is it different from Cortex Search or Amazon Q?
Cortex Analyst (GA 2024) turns natural-language questions into safe, governed SQL over a semantic model you define. For example, "what was revenue in EMEA last quarter broken down by product line?" returns an answer and the SQL that produced it. Cortex Search (GA 2024) is a managed RAG/hybrid-search service over unstructured text columns. Cortex COMPLETE / EMBED_TEXT / CLASSIFY_TEXT / SUMMARIZE are per-token in-SQL LLM functions. Amazon Q in QuickSight and Amazon Q Business are the AWS-native equivalents for NL-to-BI questions, grounded in QuickSight datasets and enterprise connectors. Both approaches converge: keep the answer engine close to the data, govern inputs and outputs, and never paste raw data into an external prompt.
What are Iceberg Tables in Snowflake and why do they matter?
Iceberg Tables let Snowflake read and write Apache Iceberg datasets that live in your S3 bucket (or S3 Tables bucket) — you own the files, the schema, and the metadata. Benefits: (a) no data lock-in — Amazon Athena, Glue, EMR, Redshift, SageMaker, Flink, Spark, Trino, and Presto can read the same tables; (b) one governance boundary via AWS Lake Formation and/or Snowflake Horizon; (c) cheaper storage — S3/S3 Tables pricing instead of Snowflake-managed storage. Trade-offs: query latency on very hot workloads is slightly higher than Snowflake-native FDN tables, and some advanced Snowflake features (hybrid tables, search optimization) are only available on internal tables. The 2026 default we recommend for new analytics workloads is Iceberg first, Snowflake-native only where you need a specific Snowflake-only feature.
What is the Polaris Catalog and how does it relate to AWS Glue Data Catalog?
Polaris Catalog (open-sourced by Snowflake in 2024 and since donated to the Apache Software Foundation as Apache Polaris) is an Iceberg REST Catalog that any Iceberg-compatible engine can use: Snowflake, Trino, Presto, Flink, Spark, Athena. It lets multiple engines agree on table metadata without forcing everyone through a Snowflake-specific API. AWS Glue Data Catalog already speaks Iceberg; in most AWS-centric shops the pragmatic 2026 pattern is Glue Data Catalog for AWS-native engines (Athena, EMR, Redshift) plus Snowflake via Iceberg REST / Polaris federation. Avoid running two catalogs with divergent metadata for the same tables; pick one authoritative catalog per dataset.
What are Snowpark Container Services and Hybrid Tables?
**Snowpark Container Services** (GA 2024) lets you run long-lived containerized workloads (APIs, Streamlit apps, custom ML inference, third-party software) inside Snowflake, close to the data, with Snowflake handling scheduling and networking. Useful when data gravity makes moving data to an external runtime expensive or compliance-awkward. **Hybrid Tables** (GA 2024) combine row-level transactional access (OLTP-style) with analytical access in the same table — useful for unified metric layers, sharing state between an app and analytics, or small OLTP needs you would rather not run a separate RDS for. Compare with Amazon Aurora DSQL for strongly-consistent distributed SQL — different tool for a similar gap.
What is Snowflake Horizon and how does it intersect with AWS governance?
Snowflake Horizon is Snowflake's governance and discovery surface: data classification (PII/PCI), access policies (row-level, column-level, dynamic data masking), audit, lineage, and quality. On AWS, layer it with Amazon DataZone for multi-domain federated governance, AWS Lake Formation for Iceberg-table permissions on S3/S3 Tables, and CloudTrail + Security Lake (OCSF) for unified evidence. The 2026 pattern is Horizon inside Snowflake + Lake Formation on the S3 side + DataZone as the cross-tool catalog and marketplace.
What does Snowflake cost on AWS in 2026?
Ballparks — always verify at snowflake.com/pricing. Compute: ~$2-$4 per credit (Standard to Business Critical); 1 credit ≈ 1 XS node running for an hour. Storage (Snowflake-managed): ~$23-$40/TB/month — or pay S3 prices if you use Iceberg Tables on your bucket. Cortex AI functions: per-token. Cortex Analyst / Search: metered separately. Biggest 2026 optimization levers: (a) put cold and semi-cold data on Iceberg Tables (S3/S3 Tables) instead of Snowflake-managed; (b) aggressive auto-suspend (1-5 min for interactive warehouses); (c) multi-cluster warehouses for concurrency rather than one always-on XL; (d) Resource Monitors with budget alarms; (e) use Streams + Tasks for incremental, not full-table, processing; (f) review Cortex AI token usage monthly — it is easy to accidentally run SUMMARIZE over an entire table.
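To show how the credit arithmetic plays out, here is a back-of-the-envelope sketch. The price per credit (3.00 USD), 6 active hours per day, and 22 working days per month are all assumed inputs. The credits-per-hour figures are the standard doubling by warehouse size, starting from 1 credit per hour for XS.

```sql
-- Estimated monthly compute cost per warehouse size under the assumed inputs.
SELECT size,
       credits_per_hour,
       credits_per_hour * 6 * 22        AS credits_per_month,  -- 6 h/day x 22 days (assumed)
       credits_per_hour * 6 * 22 * 3.00 AS est_usd_per_month   -- 3.00 USD/credit (assumed)
FROM (VALUES ('XS', 1), ('S', 2), ('M', 4), ('L', 8)) AS sizes (size, credits_per_hour);
```

Because cost scales with active hours, not wall-clock hours, the auto-suspend setting often moves the total more than the warehouse size does.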
