---
title: Snowflake on AWS
description: Snowflake + AWS in 2026: Cortex Analyst, Iceberg Tables on S3, Hybrid Tables, Snowpark, Polaris Catalog — vs Redshift, Athena, SageMaker Lakehouse.
url: https://www.factualminds.com/integrations/snowflake-aws/
category: data
updated: 2026-04-29
---

# Snowflake on AWS

> Snowflake on AWS with Iceberg interop, Cortex Analyst for natural-language BI, Hybrid Tables for OLTP, and Snowpark Container Services — governed by Horizon and Polaris.

## Snowflake on AWS

Snowflake is a SaaS data platform that runs on AWS infrastructure. It separates compute and storage, auto-scales, and has steadily expanded from a pure data warehouse into a broader developer platform — Snowpark for Python/Java/Scala in-database, Cortex AI for in-SQL LLM functions, Snowpark Container Services for long-lived containers near the data, Iceberg Tables for lakehouse interop, Hybrid Tables for OLTP-style workloads, and Horizon for governance.

In 2026 the question is less "Snowflake vs Redshift" and more "how do I compose Snowflake alongside my AWS-native analytics stack without duplicating data or governance". The answer almost always involves Apache Iceberg as the interop layer.

## What's new for Snowflake on AWS in 2026

- **Iceberg Tables GA** — native read/write of Iceberg tables sitting on your S3 / S3 Tables bucket. Your data, your catalog.
- **Cortex Analyst** — natural-language-to-SQL over a governed semantic model.
- **Cortex Search** — managed hybrid search (BM25 + vector) over unstructured columns.
- **Hybrid Tables** — row-level transactional access alongside analytical queries.
- **Snowpark Container Services** — run long-lived containers inside Snowflake, close to the data.
- **Polaris Catalog** — open-source Iceberg REST catalog; interop with Athena, Trino, Flink, Spark.
- **Horizon governance** — classification, access policies, lineage, quality, and audit in one console.
- **Document AI maturity** — ML-powered extraction from PDFs and images stored in Snowflake stages (S3).
- **Prompt caching for Cortex** and model-routing between Snowflake Arctic and third-party models via Cortex COMPLETE.

## Why Snowflake on AWS

- **Operational simplicity** — no infrastructure, automatic scaling, minimal tuning.
- **Compute / storage separation** — add data without adding compute; spin warehouses up/down per team.
- **Data sharing** — live sharing across Snowflake accounts and the Snowflake Marketplace without ETL.
- **Multi-cloud consumption** — same dataset accessible from Snowflake accounts in AWS, Azure, and GCP.
- **AI-native features** — Cortex puts LLM capabilities and NL-to-SQL one SQL function call away from the data.

## Architectural building blocks

### Virtual warehouses

- XS to 6XL, plus multi-cluster for concurrency.
- Auto-suspend / auto-resume; set suspend to 1-5 min for interactive workloads.
- Separate warehouses per team prevent contention and make per-team credit spend easy to attribute; control who can use each one with role grants.
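A minimal sketch of a per-team warehouse that follows these defaults. The warehouse name `marketing_wh` and the role `marketing_analyst` are hypothetical. Note that `AUTO_SUSPEND` is given in seconds, so 120 means two minutes.

```sql
-- Hypothetical per-team warehouse: small, suspends after 2 idle minutes,
-- resumes on the next query, and stays suspended (no credits) until first used.
CREATE WAREHOUSE IF NOT EXISTS marketing_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 120          -- seconds
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Only the team's role can run queries on this warehouse.
GRANT USAGE ON WAREHOUSE marketing_wh TO ROLE marketing_analyst;
```

With one warehouse per team, `WAREHOUSE_METERING_HISTORY` doubles as a per-team cost report.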

### Storage options

- **Snowflake-managed FDN tables** — proprietary columnar format; best latency and full feature coverage.
- **Iceberg Tables** — Apache Iceberg on your S3 / S3 Tables bucket; interop with Athena, EMR, Redshift, Glue, SageMaker. Preferred default for new analytics data in 2026.
- **External Tables** — read-only over S3; useful for one-off or ad-hoc queries on lake data.

### Governance

- **Snowflake Horizon** for classification, access policies, lineage, quality, audit.
- **Amazon DataZone** for cross-tool catalog and marketplace.
- **AWS Lake Formation** for Iceberg permissions on S3.
- **Security Hub + Security Lake (OCSF)** for aggregated security evidence.

### Developer layer

- **Snowpark** Python/Java/Scala dataframes; runs inside Snowflake, no data movement.
- **Snowpark ML** for scikit-learn/XGBoost training on Snowflake data.
- **Streamlit in Snowflake** for data apps without a separate hosting layer.
- **Snowpark Container Services** for APIs, Streamlit, custom ML inference, or third-party software.

## Cortex AI in practice

```sql
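-- NL-to-SQL via Cortex Analyst: there is no SQL function for it. Applications send the
-- question (e.g. 'What was EMEA revenue last quarter by product line?') and a reference
-- to the semantic model ('sales_analytics') to the REST endpoint
-- POST /api/v2/cortex/analyst/message, which returns generated SQL to execute.

-- Hybrid search (BM25 + vector) over support tickets. SEARCH_PREVIEW (meant for testing;
-- apps usually call the REST or Python API) takes the fully qualified service name
-- (database and schema here are placeholders) plus a JSON query, and returns JSON text.
-- FLATTEN turns its "results" array into rows.
SELECT r.value:ticket_id::STRING AS ticket_id,
       r.value:snippet::STRING   AS snippet
FROM TABLE(FLATTEN(input => PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'support_db.public.support_search',
    '{"query": "checkout failing with 402 on EU cards", "columns": ["ticket_id", "snippet"], "limit": 5}'
  )
)['results'])) r;

-- Embed + summarize + classify inline
SELECT order_id,
       SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m-v1.5', notes) AS embedding,
       SNOWFLAKE.CORTEX.SUMMARIZE(notes) AS summary,
       SNOWFLAKE.CORTEX.CLASSIFY_TEXT(notes, ['refund', 'retention', 'new_sale'])['label']::STRING AS intent
FROM sales_notes;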
```

Models: **Snowflake Arctic**, **Llama 3 / 4**, **Mistral Large 2**, **Claude Sonnet 4** (where region-available), and others. Cortex functions bill per token, on top of the warehouse time the query uses; review usage monthly.
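Because billing is per token, it can help to estimate volume before pointing an LLM function at a whole table. A sketch under these assumptions: `sales_notes` has a hypothetical `created_at` column, and `mistral-large2` is available in your region. `COUNT_TOKENS` counts input tokens only; generated output tokens are billed as well.

```sql
-- Estimate input tokens for the last day of notes before paying for a full run.
SELECT COUNT(*) AS rows_to_process,
       SUM(SNOWFLAKE.CORTEX.COUNT_TOKENS('mistral-large2', notes)) AS est_input_tokens
FROM sales_notes
WHERE created_at >= DATEADD('day', -1, CURRENT_TIMESTAMP());

-- Then call COMPLETE with an explicit model, limiting rows in a subquery
-- so the function only runs on the bounded slice.
SELECT order_id,
       SNOWFLAKE.CORTEX.COMPLETE(
         'mistral-large2',
         'In one sentence, what is the customer asking for? ' || notes
       ) AS customer_ask
FROM (
  SELECT order_id, notes
  FROM sales_notes
  WHERE created_at >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  LIMIT 100
);
```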

## Iceberg Tables: the 2026 default

Iceberg Tables put the data in your S3 / S3 Tables bucket, with Snowflake as one compute engine among several.

- Shared with **Amazon Athena** for pay-per-query exploration.
- Shared with **AWS Glue / EMR / Spark** for ETL and ML feature engineering.
- Shared with **Amazon Redshift** via Spectrum or Lake Formation.
- Shared with **SageMaker Lakehouse** for ML training and inference pipelines.
- Catalog via **AWS Glue Data Catalog** (preferred for AWS-native engines) or **Polaris** (for multi-engine, cross-vendor).
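On the Snowflake side, a minimal sketch of an Iceberg table whose files live in your bucket. It assumes a hypothetical bucket `example-lake-bucket`, an IAM role Snowflake can assume, and Snowflake acting as the catalog (`CATALOG = 'SNOWFLAKE'`). Tables registered in Glue or Polaris use a catalog integration instead.

```sql
-- External volume: the S3 location where Snowflake writes Iceberg data and metadata files.
CREATE EXTERNAL VOLUME s3_iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'lake-us-east-1'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://example-lake-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-iceberg-rw'
    )
  );

-- DESC EXTERNAL VOLUME s3_iceberg_vol returns the IAM user ARN and external ID
-- to add to the role's trust policy.

-- Snowflake-managed Iceberg table: Snowflake is the catalog; files are written
-- under the base location inside the external volume.
CREATE ICEBERG TABLE sales_events (
  event_id   STRING,
  event_time TIMESTAMP_NTZ(6),
  amount     NUMBER(12, 2)
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 's3_iceberg_vol'
  BASE_LOCATION = 'sales_events/';
```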

Reserve Snowflake-native FDN tables for:

- Hot, concurrency-heavy BI workloads with strict latency targets.
- Features not yet available on Iceberg (Hybrid Tables, search optimization service, some materialized views).

## Loading data from AWS

- **COPY INTO from S3** for one-time or batch loads.
- **Snowpipe + S3 event notifications** for continuous ingestion.
- **AWS Glue / Fivetran / Airbyte** for source-system ETL (RDS, DynamoDB, SaaS).
- **Snowflake Kafka Connector** for Amazon MSK streaming.
- **Streams + Tasks** for change-data-capture-style incremental processing inside Snowflake.
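A minimal sketch of the Snowpipe path, consisting of a storage integration, a stage, and an auto-ingest pipe. The bucket, prefix, IAM role ARN, and `raw_orders` table are placeholders. The sketch assumes Parquet input, and creating the integration needs ACCOUNTADMIN or the CREATE INTEGRATION privilege.

```sql
-- 1. Trust relationship: Snowflake assumes this IAM role to read the landing prefix.
CREATE STORAGE INTEGRATION s3_landing_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-s3-read'
  STORAGE_ALLOWED_LOCATIONS = ('s3://example-landing-bucket/orders/');

-- DESC INTEGRATION s3_landing_int returns the IAM user ARN and external ID
-- to put in the role's trust policy.

-- 2. Stage over the landing prefix.
CREATE STAGE orders_stage
  URL = 's3://example-landing-bucket/orders/'
  STORAGE_INTEGRATION = s3_landing_int
  FILE_FORMAT = (TYPE = PARQUET);

-- 3. Pipe: runs this COPY for each new file announced by an S3 event notification.
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_orders
  FROM @orders_stage
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- The notification_channel column holds the SQS queue ARN that the bucket's
-- s3:ObjectCreated:* event notification should target.
SHOW PIPES LIKE 'orders_pipe';
```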

## When Snowflake is NOT the right call

- Small team, AWS-only stack, predictable analytics workload — **Amazon Redshift Serverless** or **Athena + Iceberg** usually wins on cost and governance simplicity.
- You need every row- and column-level permission enforced natively in AWS (IAM plus Lake Formation): Redshift integrates with Lake Formation more tightly than Snowflake, whose identity model sits outside IAM.
- Pure ML training workloads over lake data — **SageMaker + S3 + Glue** can skip the Snowflake bill entirely.
- Compliance requires data to stay in AWS storage you own and govern: reduce Snowflake's footprint by preferring Iceberg Tables on your own S3 bucket (the data files stay there), or choose a fully AWS-native stack.

## Snowflake vs Redshift vs Athena vs SageMaker Lakehouse

| Dimension            | Snowflake                    | Redshift Serverless                  | Athena + Iceberg           | SageMaker Lakehouse          |
| -------------------- | ---------------------------- | ------------------------------------ | -------------------------- | ---------------------------- |
| Operating model      | SaaS                         | AWS-managed                          | Serverless, pay-per-query  | Unified lakehouse            |
| Storage              | Managed or Iceberg on S3     | Managed or Spectrum on S3            | S3 / S3 Tables             | S3 + governance via DataZone |
| Concurrency          | Strong (multi-cluster)       | Strong                               | Moderate                   | Varies by engine             |
| IAM integration      | External; Horizon + IdC SAML | Native IAM                           | Native IAM                 | Native IAM                   |
| NL-to-SQL            | Cortex Analyst               | Amazon Q in QuickSight               | Amazon Q via Athena        | Amazon Q + DataZone          |
| In-DB ML             | Snowpark ML                  | Redshift ML + Bedrock                | SageMaker external         | SageMaker native             |
| Data sharing         | Snowflake Marketplace        | Cross-account, Redshift Data Sharing | S3 + Lake Formation grants | DataZone projects            |
| Multi-cloud          | Yes                          | No                                   | No                         | No                           |
| Typical cost profile | Premium; elastic             | Moderate; predictable                | Pay-per-TB scanned         | Workload-dependent           |

## Failure modes & resilience

**1. Warehouse cold-start latency.** Auto-suspend at 1 min is great for cost but adds 1–10 s on the first query against a cold warehouse. That query also starts without the warehouse's local disk cache, because Snowflake drops the cache on suspend. For latency-sensitive dashboards, raise auto-suspend on the BI warehouse (for example to 60 min) during business hours and lower it again afterwards. A keep-warm query loop achieves the same effect but has more moving parts.
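One way to get a long auto-suspend only while dashboards are in use is a pair of scheduled tasks. A sketch assuming a dashboard warehouse `bi_prod`, a small utility warehouse `ops_wh` that runs the tasks, and New York business hours; all three are placeholders. The task owner role needs MODIFY on `bi_prod` and the account-level EXECUTE TASK privilege.

```sql
-- 08:00 on weekdays: keep the BI warehouse (and its cache) up through an hour of idle time.
CREATE OR REPLACE TASK bi_prod_business_hours
  WAREHOUSE = ops_wh
  SCHEDULE = 'USING CRON 0 8 * * MON-FRI America/New_York'
AS
  ALTER WAREHOUSE bi_prod SET AUTO_SUSPEND = 3600;

-- 19:00 on weekdays: return to aggressive suspension for evenings and weekends.
CREATE OR REPLACE TASK bi_prod_after_hours
  WAREHOUSE = ops_wh
  SCHEDULE = 'USING CRON 0 19 * * MON-FRI America/New_York'
AS
  ALTER WAREHOUSE bi_prod SET AUTO_SUSPEND = 60;

-- Tasks are created suspended; resume them to start the schedules.
ALTER TASK bi_prod_business_hours RESUME;
ALTER TASK bi_prod_after_hours RESUME;
```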

**2. Multi-cluster scaling lag.** When concurrency exceeds a single cluster, Snowflake adds clusters, but provisioning takes 10–30 s. During a flash load (campaign launch, scheduled job convoy), early queries queue. Mitigation: set `MIN_CLUSTER_COUNT = 2` for warehouses with predictable peaks. `SCALING_POLICY = STANDARD` (the default) starts an extra cluster as soon as queries queue. `ECONOMY` starts one only when it estimates enough load to keep it busy for about 6 minutes, saving credits at the cost of longer queues.
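Applied to a hypothetical `bi_prod` warehouse, the mitigation looks like this. Multi-cluster warehouses require Enterprise edition or higher.

```sql
-- Keep two clusters running through predictable peaks, allow bursts up to six,
-- and start extra clusters as soon as queries begin to queue.
ALTER WAREHOUSE bi_prod SET
  MIN_CLUSTER_COUNT = 2
  MAX_CLUSTER_COUNT = 6
  SCALING_POLICY = STANDARD;
```

With a minimum of two, both clusters bill whenever the warehouse is running, so this pairs naturally with a short auto-suspend outside peak hours.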

**3. Iceberg metadata stale-read across engines.** When Athena, Glue, or EMR writes to an Iceberg table that Snowflake also reads, Snowflake sees the new snapshot only after it refreshes its view of the table metadata. For tables that use an external catalog (such as Glue Data Catalog or an Iceberg REST catalog like Polaris), that means either automatic refresh (`AUTO_REFRESH = TRUE` on the table, polled on the catalog integration's refresh interval) or an explicit `ALTER ICEBERG TABLE … REFRESH`. Symptom: missing rows that exist on S3. Mitigation: prefer a single writer per table, or use the Polaris REST catalog for multi-engine writes.
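A sketch of a Glue-cataloged table with automatic refresh. Every name is hypothetical: the Glue database `sales_db`, the AWS account ID, the IAM role, and an existing external volume `s3_iceberg_vol` covering the table's S3 location. After creating the integration, `DESCRIBE CATALOG INTEGRATION` returns the values for the role's trust policy.

```sql
-- Catalog integration pointing Snowflake at one Glue Data Catalog database.
CREATE CATALOG INTEGRATION glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'sales_db'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-glue-read'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-east-1'
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = 60;   -- how often auto-refresh polls Glue

-- Iceberg table whose metadata Glue owns; Snowflake polls for new snapshots.
CREATE ICEBERG TABLE orders_iceberg
  EXTERNAL_VOLUME = 's3_iceberg_vol'
  CATALOG = 'glue_catalog_int'
  CATALOG_TABLE_NAME = 'orders'
  AUTO_REFRESH = TRUE;

-- A table created without AUTO_REFRESH needs an explicit refresh after external commits:
-- ALTER ICEBERG TABLE orders_iceberg REFRESH;
```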

**4. Cortex token quota cliffs.** Per-account daily token budgets exist for Cortex Analyst, Search, and COMPLETE; hitting them returns `function evaluation failed`. Track spend via `SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY`. Mitigation: rate-limit at the application tier and alert on that view. Resource Monitors on the warehouses running Cortex queries cap only warehouse credits, not the per-token Cortex charges.
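A sketch of a daily rollup from that view for the last 30 days. The column names (`FUNCTION_NAME`, `MODEL_NAME`, `TOKENS`, `TOKEN_CREDITS`) reflect the view as documented at the time of writing, so verify them in your account. Also keep in mind that ACCOUNT_USAGE views are not real time.

```sql
-- Daily Cortex token volume and token credits per function and model.
SELECT DATE_TRUNC('day', start_time) AS usage_day,
       function_name,
       model_name,
       SUM(tokens)        AS tokens,
       SUM(token_credits) AS token_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY usage_day DESC, token_credits DESC;
```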

**5. Time Travel + Fail-safe boundaries.** Time Travel (default 1 day, up to 90 on Enterprise) covers operational restore. Fail-safe (7 days, automatic) is Snowflake-only recovery — you cannot self-serve from it. For accidental DROP TABLE recovery, use `UNDROP TABLE` within Time Travel; beyond Time Travel, open a Snowflake support ticket within 7 days.
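A sketch of the retention setting and both recovery paths, using a hypothetical `orders` table. Run only the statement that matches your situation, because `UNDROP` fails if a table with the same name already exists.

```sql
-- Prevention: keep 30 days of Time Travel on a critical table
-- (retention above 1 day requires Enterprise edition or higher).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Scenario A: the table was dropped. Restore it while it is still in Time Travel.
UNDROP TABLE orders;

-- Scenario B: bad rows were written. Clone the table as it was one hour
-- (3600 seconds) ago and compare before swapping anything into place.
CREATE TABLE orders_restored CLONE orders AT(OFFSET => -3600);
```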

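**6. Iceberg storage on S3 gets expensive without compaction.** Frequent small writes create thousands of tiny files, which raise S3 request counts and slow every engine that scans the table. S3 Tables compacts automatically. Iceberg tables in general-purpose S3 buckets need scheduled compaction, such as Athena's `OPTIMIZE ... REWRITE DATA USING BIN_PACK`, Spark's `rewrite_data_files` procedure, or the Glue Data Catalog's automatic compaction. Symptom: queries scan many small files; per-query cost climbs.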
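For a general-purpose bucket, one option is to compact from Athena. The sketch below is in Athena's SQL dialect, not Snowflake's, and assumes a hypothetical Glue database `sales_db` with an `order_date` partition column.

```sql
-- Run in Amazon Athena, not Snowflake.
-- Rewrite small data files into larger ones, limited to recent partitions.
OPTIMIZE sales_db.orders REWRITE DATA USING BIN_PACK
WHERE order_date >= DATE '2026-04-01';

-- Expire old snapshots and remove files that no remaining snapshot references.
VACUUM sales_db.orders;
```

Compaction commits a new snapshot. If Snowflake reads the table through an external catalog without auto-refresh, it needs a refresh before it sees the rewritten files.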

**7. Long-running query timeout.** Default `STATEMENT_TIMEOUT_IN_SECONDS = 172800` (48h). For interactive workloads, lower to 600 s on the BI warehouse so runaway queries don't accumulate credits. Combine with `STATEMENT_QUEUED_TIMEOUT_IN_SECONDS` to fail fast under contention.
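Applied to a hypothetical `bi_prod` warehouse, the two timeouts look like the sketch below. When `STATEMENT_TIMEOUT_IN_SECONDS` is also set for the session or user, Snowflake enforces the lowest non-zero value.

```sql
-- Cancel BI queries that run longer than 10 minutes,
-- and those that wait in the queue longer than 2 minutes.
ALTER WAREHOUSE bi_prod SET
  STATEMENT_TIMEOUT_IN_SECONDS = 600
  STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 120;

-- Confirm the values now in effect for the warehouse.
SHOW PARAMETERS LIKE 'STATEMENT%' IN WAREHOUSE bi_prod;
```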

## Observability runbook

**ACCOUNT_USAGE views to monitor:**

| View                             | Use for                                                 |
| -------------------------------- | ------------------------------------------------------- |
| `WAREHOUSE_METERING_HISTORY`     | Daily credit spend per warehouse; trending overruns     |
| `QUERY_HISTORY`                  | Slowest queries, queue time, partitions scanned         |
| `QUERY_ACCELERATION_HISTORY`     | Whether QAS is helping the workloads it was enabled for |
| `LOGIN_HISTORY`                  | Auth anomalies; pair with CloudTrail for cross-evidence |
| `ACCESS_HISTORY`                 | Lineage; which roles read which columns (governance)    |
| `CORTEX_FUNCTIONS_USAGE_HISTORY` | Per-function token spend; budget overruns               |

**Resource monitors with hard stops:**

```sql
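-- Only account administrators can create resource monitors.
USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE RESOURCE MONITOR rm_prod_bi
  WITH CREDIT_QUOTA = 4000            -- credits per monthly cycle
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 50  PERCENT DO NOTIFY
    ON 80  PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND            -- running queries finish; new ones are blocked
    ON 110 PERCENT DO SUSPEND_IMMEDIATE; -- running queries are cancelled as well

ALTER WAREHOUSE bi_prod SET RESOURCE_MONITOR = rm_prod_bi;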
```

`SUSPEND_IMMEDIATE` kills running queries; reserve for hard caps. Pair with a CloudWatch metric alarm fed by a scheduled Lambda querying `WAREHOUSE_METERING_HISTORY`.

**Debug path: "dashboard suddenly slow":**

1. `SHOW WAREHOUSES` → confirm warehouse is `STARTED`. If `SUSPENDED` and a query is queued, that's normal cold-start.
2. `QUERY_HISTORY` for the dashboard's user/role over the last hour: filter on `EXECUTION_TIME > p95 baseline`.
3. Check `QUEUED_OVERLOAD_TIME` and `QUEUED_PROVISIONING_TIME` — high values indicate concurrency or scaling lag; bump `MIN_CLUSTER_COUNT` or warehouse size.
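4. If a specific query slowed: run `EXPLAIN USING TEXT` and compare `partitionsAssigned` with `partitionsTotal` (and check `bytesAssigned`), or open the Query Profile for actual partitions and bytes scanned. If pruning is poor, add or improve a clustering key on the most-filtered columns.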
5. Iceberg-only: confirm catalog refresh ran; cross-engine writers can leave the metadata stale.
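Steps 2 and 3 can be combined in one query. The sketch below assumes the dashboard runs under a hypothetical role `DASHBOARD_ROLE`; the time columns in QUERY_HISTORY are in milliseconds. `ACCOUNT_USAGE.QUERY_HISTORY` can lag by up to 45 minutes, so for the last few minutes the `INFORMATION_SCHEMA.QUERY_HISTORY` table functions are a better fit.

```sql
-- Slowest queries from the dashboard role over the last hour, with queue breakdown.
SELECT query_id,
       warehouse_name,
       execution_time / 1000           AS execution_s,
       queued_provisioning_time / 1000 AS queued_provisioning_s,
       queued_overload_time / 1000     AS queued_overload_s,
       partitions_scanned,
       partitions_total,
       bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE role_name = 'DASHBOARD_ROLE'
  AND start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
ORDER BY execution_time DESC
LIMIT 20;
```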

**Clustering-key validation:**

```sql
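-- Check how well the table is already clustered on candidate key columns; this works
-- even if the table has no clustering key defined yet.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
-- In the returned JSON, low average_overlaps and average_depth mean good pruning.
-- If both are high on columns your queries filter on, declare the key
-- (Automatic Clustering then maintains it, which consumes credits):
ALTER TABLE orders CLUSTER BY (order_date, region);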
```

## Pricing notes (always verify)

- Compute credits: ~$2-$4 each (Standard to Business Critical).
- Storage on Snowflake-managed: ~$23-$40/TB/month.
- Iceberg Tables: you pay AWS directly for S3 storage (plus S3 Tables fees if you use table buckets); Snowflake compute charges as normal.
- Cortex functions: per-token.
- Cortex Analyst / Search: metered separately.
- Set Resource Monitors with hard caps per warehouse; alert on 50% / 80% of monthly credit budget.
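As a worked example of how credit price and warehouse size combine, assume an Enterprise-edition account at about $3 per credit and a Medium warehouse (4 credits per hour) that runs 8 hours a day on 22 working days. These figures are illustrative, not a quote.

$$
4\ \tfrac{\text{credits}}{\text{h}} \times 8\ \tfrac{\text{h}}{\text{day}} \times 22\ \text{days} = 704\ \text{credits}, \qquad 704 \times \$3 = \$2{,}112\ \text{per month}
$$

Dropping one size (Small, 2 credits per hour) halves that figure, provided the workload still meets its latency target.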

## Best practices

**Cost**

- Aggressive auto-suspend (1-5 min).
- Multi-cluster for concurrency, not always-on size-up.
- Iceberg on S3 for cold and semi-cold data.
- Streams + Tasks for incremental processing.
- Monthly review of Cortex token usage.
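A minimal sketch of the Streams + Tasks pattern for incremental processing. A stream records changes to a hypothetical `raw_orders` table, and a task merges only those changes into `orders` every five minutes, skipping runs when nothing changed. Table, column, and warehouse names are placeholders, and this version ignores deletes.

```sql
-- Track row-level changes on the landing table since the stream was last consumed.
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

-- Every 5 minutes, if the stream has data, merge the net changes into the target.
-- Reading the stream in a committed DML statement advances its offset.
CREATE OR REPLACE TASK merge_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')
AS
  MERGE INTO orders t
  USING (
    -- New rows and the new image of updated rows; deleted rows are ignored here.
    SELECT order_id, status, amount
    FROM raw_orders_stream
    WHERE METADATA$ACTION = 'INSERT'
  ) s
  ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET t.status = s.status, t.amount = s.amount
  WHEN NOT MATCHED THEN INSERT (order_id, status, amount)
                        VALUES (s.order_id, s.status, s.amount);

-- Tasks are created suspended.
ALTER TASK merge_orders RESUME;
```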

**Performance**

- Clustering keys on frequent WHERE / JOIN columns.
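- Result cache: an identical query over unchanged data returns the persisted result without warehouse credits for 24 hours, a window that resets each time the result is reused (up to 31 days). Design dashboards to issue identical SQL so they benefit.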
- Search Optimization Service for selective point queries on internal tables.
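Search Optimization Service (Enterprise edition or higher) is enabled per table and per search method. A sketch on a hypothetical `orders` table where dashboards look up individual customers:

```sql
-- Estimate build and maintenance cost before enabling.
SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('orders', 'EQUALITY(customer_id)');

-- Enable search optimization only for equality lookups on customer_id.
ALTER TABLE orders ADD SEARCH OPTIMIZATION ON EQUALITY(customer_id);

-- A selective point query of the kind that benefits.
SELECT * FROM orders WHERE customer_id = 'C-10042';
```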

**Governance**

- Horizon classification tags + dynamic data masking for PII.
- Lake Formation for Iceberg permissions on S3.
- DataZone for cross-tool catalog and marketplace.
- Audit logs to CloudWatch and Security Lake; retain per compliance regime.
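Dynamic masking is often attached through a tag, so any column tagged as PII inherits the policy. A minimal sketch, assuming Enterprise edition, a hypothetical `PII_READER` role, and a `customers.email` column:

```sql
-- Roles with PII_READER active see raw values; everyone else sees a fixed mask.
CREATE OR REPLACE MASKING POLICY pii_string_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN IS_ROLE_IN_SESSION('PII_READER') THEN val
    ELSE '***MASKED***'
  END;

-- Bind the policy to a tag, then tag the column; STRING columns carrying the tag are masked.
CREATE TAG IF NOT EXISTS pii;
ALTER TAG pii SET MASKING POLICY pii_string_mask;
ALTER TABLE customers MODIFY COLUMN email SET TAG pii = 'email';
```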

**Reliability**

- Time Travel: retain 7-30 days; use for accidental-delete recovery.
- Fail-safe: 7 days; do not depend on it for operational recovery.
- Replication for DR across regions or clouds where justified.
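For DR, a failover group is the usual unit of replication. A sketch with placeholder organization and account names (`myorg`, `prod_account`, `dr_account`). Failover groups need Business Critical edition or higher, while plain replication groups are available on lower editions.

```sql
-- On the primary account: replicate the database, roles, and warehouses every 10 minutes.
CREATE FAILOVER GROUP prod_fg
  OBJECT_TYPES = DATABASES, ROLES, WAREHOUSES
  ALLOWED_DATABASES = sales_db
  ALLOWED_ACCOUNTS = myorg.dr_account
  REPLICATION_SCHEDULE = '10 MINUTE';

-- On the DR account (another region or cloud): create the secondary group.
CREATE FAILOVER GROUP prod_fg
  AS REPLICA OF myorg.prod_account.prod_fg;

-- During a failover, run this on the DR account to promote it to primary:
-- ALTER FAILOVER GROUP prod_fg PRIMARY;
```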

## Related reading

- [`AWS Glue 5 + Apache Iceberg modern ETL`](/blog/aws-glue-5-apache-iceberg-modern-etl/)
- [`Athena query cost optimization: partition, compress, cache, Iceberg`](/blog/athena-query-cost-optimization-partition-compress-cache-iceberg/)
- [`Amazon Redshift Serverless vs provisioned: when to use each`](/blog/amazon-redshift-serverless-vs-provisioned-when-to-use-each/)

## Related services

- [AWS Data Analytics](/services/aws-data-analytics/)
- [Generative AI on AWS](/services/generative-ai-on-aws/)
- [AWS Architecture Review](/services/aws-architecture-review/)

## FAQ

### How does Snowflake run on AWS in 2026?
Snowflake is a managed SaaS data platform that runs on AWS infrastructure. You pick an AWS region during account creation; Snowflake operates the compute (virtual warehouses on EC2), object storage (S3), and services (metadata, security, optimizer) for you. In 2026 the noteworthy architectural shifts are: native Iceberg Tables so your actual data sits in your S3 / S3 Tables bucket (not inside Snowflake-managed storage), Snowpark Container Services for running arbitrary long-lived containers near the data, and Hybrid Tables for row-level transactional workloads on the same platform as analytical queries.

### When should I pick Snowflake vs Redshift vs Athena vs SageMaker Lakehouse on AWS?
Four rules of thumb. (1) **All-AWS, tight IAM integration, Redshift ML in mind, predictable workloads** — Redshift Serverless or Redshift provisioned is usually the cleaner choice and keeps data under native AWS governance. (2) **Pay-per-query on data already in S3, modest concurrency, cost-sensitive** — Amazon Athena + Iceberg on S3 Tables. (3) **Multi-cloud consumption, data sharing with external partners, many teams with wildly different SLAs, or an existing Snowflake estate** — Snowflake on AWS is usually worth the premium. (4) **You want a unified governance surface across lakehouse + ML + GenAI** — SageMaker Lakehouse + Amazon DataZone + Athena/Redshift is the 2026 AWS-native answer. In practice many enterprises run Snowflake AND Redshift/Athena side by side, with Iceberg as the interop layer.

### What is Cortex Analyst and how is it different from Cortex Search or Amazon Q?
Cortex Analyst (GA 2024) turns natural-language questions into safe, governed SQL over a semantic model you define, and applications call it through a REST API. Think "what was revenue in EMEA last quarter broken down by product line?" returning an answer and the SQL that produced it. Cortex Search (GA 2024) is a managed RAG/hybrid-search service over unstructured text columns. Cortex COMPLETE / EMBED_TEXT_768 / EMBED_TEXT_1024 / CLASSIFY_TEXT / SUMMARIZE are per-token in-SQL LLM functions. Amazon Q in QuickSight / Amazon Q Business is the AWS-native equivalent for NL-to-BI questions, grounded in QuickSight datasets and enterprise connectors. Both approaches converge: keep the answer engine close to the data, govern inputs and outputs, and never paste raw data into an external prompt.

### What are Iceberg Tables in Snowflake and why do they matter?
Iceberg Tables let Snowflake read and write Apache Iceberg datasets that live in your S3 bucket (or S3 Tables bucket) — you own the files, the schema, and the metadata. Benefits: (a) no data lock-in — Amazon Athena, Glue, EMR, Redshift, SageMaker, Flink, Spark, Trino, and Presto can read the same tables; (b) one governance boundary via AWS Lake Formation and/or Snowflake Horizon; (c) cheaper storage — S3/S3 Tables pricing instead of Snowflake-managed storage. Trade-offs: query latency on very hot workloads is slightly higher than Snowflake-native FDN tables, and some advanced Snowflake features (hybrid tables, search optimization) are only available on internal tables. The 2026 default we recommend for new analytics workloads is Iceberg first, Snowflake-native only where you need a specific Snowflake-only feature.

### What is the Polaris Catalog and how does it relate to AWS Glue Data Catalog?
Polaris Catalog (open-sourced by Snowflake in 2024 and since donated to the Apache Software Foundation, where it is incubating as Apache Polaris) is an Iceberg REST catalog that engines supporting the Iceberg REST protocol can use: Snowflake, Trino, Presto, Flink, Spark, Athena. It lets multiple engines agree on table metadata without forcing everyone through a Snowflake-specific API. AWS Glue Data Catalog already speaks Iceberg; in most AWS-centric shops the pragmatic 2026 pattern is Glue Data Catalog for AWS-native engines (Athena, EMR, Redshift) plus Snowflake via Iceberg REST / Polaris federation. Avoid running two catalogs with divergent metadata for the same tables; pick one authoritative catalog per dataset.

### What are Snowpark Container Services and Hybrid Tables?
**Snowpark Container Services** (GA 2024) lets you run long-lived containerized workloads (APIs, Streamlit apps, custom ML inference, third-party software) inside Snowflake, close to the data, with Snowflake handling scheduling and networking. Useful when data gravity makes moving data to an external runtime expensive or compliance-awkward. **Hybrid Tables** (GA 2024) combine row-level transactional access (OLTP-style) with analytical access in the same table — useful for unified metric layers, sharing state between an app and analytics, or small OLTP needs you would rather not run a separate RDS for. Compare with Amazon Aurora DSQL for strongly-consistent distributed SQL — different tool for a similar gap.

### What is Snowflake Horizon and how does it intersect with AWS governance?
Snowflake Horizon is Snowflake's governance and discovery surface: data classification (PII/PCI), access policies (row-level, column-level, dynamic data masking), audit, lineage, and quality. On AWS, layer it with Amazon DataZone for multi-domain federated governance, AWS Lake Formation for Iceberg-table permissions on S3/S3 Tables, and CloudTrail + Security Lake (OCSF) for unified evidence. The 2026 pattern is Horizon inside Snowflake + Lake Formation on the S3 side + DataZone as the cross-tool catalog and marketplace.

### What does Snowflake cost on AWS in 2026?
Ballparks; always verify at snowflake.com/pricing. Compute: ~$2-$4 per credit (Standard to Business Critical). An X-Small warehouse consumes 1 credit per hour and each size up doubles that rate. Running time is billed per second, with a 60-second minimum each time a warehouse resumes. Storage (Snowflake-managed): ~$23-$40/TB/month, or S3 prices if you use Iceberg Tables on your own bucket. Cortex AI functions: per-token. Cortex Analyst / Search: metered separately. Biggest 2026 optimization levers: (a) put cold and semi-cold data on Iceberg Tables (S3/S3 Tables) instead of Snowflake-managed storage; (b) aggressive auto-suspend (1-5 min for interactive warehouses); (c) multi-cluster warehouses for concurrency rather than one always-on XL; (d) Resource Monitors with budget alarms; (e) Streams + Tasks for incremental, not full-table, processing; (f) review Cortex AI token usage monthly, since it is easy to accidentally run SUMMARIZE over an entire table.

---

*Source: https://www.factualminds.com/integrations/snowflake-aws/*
