Cloud Data Platform
Snowflake on AWS
Snowflake on AWS with Iceberg interop, Cortex Analyst for natural-language BI, Hybrid Tables for OLTP, and Snowpark Container Services — governed by Horizon and Polaris.
Last updated: April 29, 2026 | Author: FactualMinds Cloud Integration Team | Reviewed by: FactualMinds AWS-certified architects (Solutions Architect – Professional)
When you create an account you pick an AWS region; Snowflake then operates the compute (virtual warehouses on EC2), object storage (S3), and services layer (metadata, security, optimizer) for you.
## Snowflake on AWS
Snowflake is a SaaS data platform that runs on AWS infrastructure. It separates compute and storage, auto-scales, and has steadily expanded from a pure data warehouse into a broader developer platform — Snowpark for Python/Java/Scala in-database, Cortex AI for in-SQL LLM functions, Snowpark Container Services for long-lived containers near the data, Iceberg Tables for lakehouse interop, Hybrid Tables for OLTP-style workloads, and Horizon for governance.
In 2026 the question is less "Snowflake vs Redshift" and more "how do I compose Snowflake alongside my AWS-native analytics stack without duplicating data or governance". The answer almost always involves Apache Iceberg as the interop layer.
## What's new for Snowflake on AWS in 2026
- **Iceberg Tables GA** — native read/write of Iceberg tables sitting on your S3 / S3 Tables bucket. Your data, your catalog.
- **Cortex Analyst** — natural-language-to-SQL over a governed semantic model.
- **Cortex Search** — managed hybrid search (BM25 + vector) over unstructured columns.
- **Hybrid Tables** — row-level transactional access alongside analytical queries.
- **Snowpark Container Services** — run long-lived containers inside Snowflake, close to the data.
- **Polaris Catalog** — open-source Iceberg REST catalog; interop with Athena, Trino, Flink, Spark.
- **Horizon governance** — classification, access policies, lineage, quality, and audit in one console.
- **Document AI maturity** — ML-powered extraction from PDFs and images stored in Snowflake stages (S3).
- **Prompt caching for Cortex** and model-routing between Snowflake Arctic and third-party models via Cortex COMPLETE.
## Why Snowflake on AWS
- **Operational simplicity** — no infrastructure, automatic scaling, minimal tuning.
- **Compute / storage separation** — add data without adding compute; spin warehouses up/down per team.
- **Data sharing** — live sharing across Snowflake accounts and the Snowflake Marketplace without ETL.
- **Multi-cloud consumption** — same dataset accessible from Snowflake accounts in AWS, Azure, and GCP.
- **AI-native features** — Cortex puts LLM capabilities and NL-to-SQL one SQL function call away from the data.
## Architectural building blocks
### Virtual warehouses
- XS to 6XL, plus multi-cluster for concurrency.
- Auto-suspend / auto-resume; set suspend to 1-5 min for interactive workloads.
- Separate warehouses per team prevent contention; governance via Horizon.
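
As a rough sketch of the settings above, a per-team warehouse might be declared as follows. The warehouse name `bi_team_wh` and the specific values are illustrative assumptions, and the multi-cluster settings require Enterprise Edition or higher.

```sql
-- Hypothetical per-team warehouse; the name and sizes are placeholders.
CREATE WAREHOUSE IF NOT EXISTS bi_team_wh
  WAREHOUSE_SIZE      = 'XSMALL'
  AUTO_SUSPEND        = 120    -- seconds; inside the 1-5 min range suggested above
  AUTO_RESUME         = TRUE   -- the first query after a suspend restarts it
  MIN_CLUSTER_COUNT   = 1
  MAX_CLUSTER_COUNT   = 3      -- extra clusters absorb concurrency spikes
  INITIALLY_SUSPENDED = TRUE;  -- no credits are consumed until first use
```

Giving each team its own warehouse keeps one team's heavy queries from queuing behind another's, and makes per-team credit attribution straightforward.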
### Storage options
- **Snowflake-managed FDN tables** — proprietary columnar format; best latency and full feature coverage.
- **Iceberg Tables** — Apache Iceberg on your S3 / S3 Tables bucket; interop with Athena, EMR, Redshift, Glue, SageMaker. Preferred default for new analytics data in 2026.
- **External Tables** — read-only over S3; useful for one-off or ad-hoc queries on lake data.
### Governance
- **Snowflake Horizon** for classification, access policies, lineage, quality, audit.
- **Amazon DataZone** for cross-tool catalog and marketplace.
- **AWS Lake Formation** for Iceberg permissions on S3.
- **Security Hub + Security Lake (OCSF)** for aggregated security evidence.
### Developer layer
- **Snowpark** Python/Java/Scala dataframes; runs inside Snowflake, no data movement.
- **Snowpark ML** for scikit-learn/XGBoost training on Snowflake data.
- **Streamlit in Snowflake** for data apps without a separate hosting layer.
- **Snowpark Container Services** for APIs, Streamlit, custom ML inference, or third-party software.
## Cortex AI in practice
```sql
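-- NL-to-SQL: Cortex Analyst is called through its REST API
-- (POST /api/v2/cortex/analyst/message) with a semantic model or
-- semantic view; it is not invoked as a SQL function. Example question:
--   'What was EMEA revenue last quarter by product line?'

-- Hybrid search (BM25 + vector) over support tickets.
-- SEARCH_PREVIEW is intended for testing; applications normally use the REST or Python API.
-- Assumes a Cortex Search service named support_search that indexes these columns.
SELECT r.value['ticket_id']::STRING AS ticket_id,
       r.value['snippet']::STRING   AS snippet
FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
       'support_search',
       '{"query": "checkout failing with 402 on EU cards", "columns": ["ticket_id", "snippet"], "limit": 5}'
     ))['results'])) AS r;

-- Summarize + classify inline
SELECT order_id,
       SNOWFLAKE.CORTEX.SUMMARIZE(notes) AS summary,
       -- CLASSIFY_TEXT returns an object; pull out the chosen label
       SNOWFLAKE.CORTEX.CLASSIFY_TEXT(notes, ['refund', 'retention', 'new_sale'])['label']::STRING AS intent
FROM sales_notes;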
```
Models: **Snowflake Arctic**, **Llama 3 / 4**, **Mistral Large 2**, **Claude Sonnet 4** (where region-available), and others. Per-token billing — monitor monthly.
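
For free-form generation, a minimal sketch with `COMPLETE` might look like the following. It reuses the `sales_notes` table from the example above; the model identifier `mistral-large2` is an assumption about what is enabled in your region, so check the available models before relying on it.

```sql
-- Assumes the sales_notes table from the earlier example and that the
-- 'mistral-large2' model is available in the account's region.
SELECT order_id,
       SNOWFLAKE.CORTEX.COMPLETE(
         'mistral-large2',
         'Draft a one-sentence follow-up for this sales note: ' || notes
       ) AS follow_up
FROM sales_notes
LIMIT 10;  -- keep test runs small, since billing is per token
```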
## Iceberg Tables: the 2026 default
Iceberg Tables put the data in your S3 / S3 Tables bucket, with Snowflake as one compute engine among several.
- Shared with **Amazon Athena** for pay-per-query exploration.
- Shared with **AWS Glue / EMR / Spark** for ETL and ML feature engineering.
- Shared with **Amazon Redshift** via Spectrum or Lake Formation.
- Shared with **SageMaker Lakehouse** for ML training and inference pipelines.
- Catalog via **AWS Glue Data Catalog** (preferred for AWS-native engines) or **Polaris** (for multi-engine, cross-vendor).
Reserve Snowflake-native FDN tables for:
- Hot, concurrency-heavy BI workloads with strict latency targets.
- Features not yet available on Iceberg (Hybrid Tables, search optimization service, some materialized views).
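
A minimal sketch of a Snowflake-managed Iceberg table on your own bucket is shown below. The bucket URL, IAM role ARN, volume name, and table definition are all placeholders, and the IAM trust-policy setup that the external volume needs on the AWS side is omitted.

```sql
-- Hypothetical external volume pointing at a customer-owned S3 prefix.
CREATE EXTERNAL VOLUME IF NOT EXISTS lake_s3_volume
  STORAGE_LOCATIONS = ((
    NAME                 = 'lake-us-east-1'
    STORAGE_PROVIDER     = 'S3'
    STORAGE_BASE_URL     = 's3://example-lake-bucket/iceberg/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/example-snowflake-iceberg'
  ));

-- Iceberg table whose data and metadata files live under that volume.
-- CATALOG = 'SNOWFLAKE' makes Snowflake the Iceberg catalog for this table.
CREATE ICEBERG TABLE IF NOT EXISTS orders_iceberg (
  order_id   NUMBER(38, 0),
  order_date DATE,
  region     STRING,
  amount     NUMBER(12, 2)
)
  CATALOG         = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'lake_s3_volume'
  BASE_LOCATION   = 'orders_iceberg/';
```

Tables cataloged in AWS Glue instead are created against a catalog integration rather than `CATALOG = 'SNOWFLAKE'`.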
## Loading data from AWS
- **COPY INTO from S3** for one-time or batch loads.
- **Snowpipe + S3 event notifications** for continuous ingestion.
- **AWS Glue / Fivetran / Airbyte** for source-system ETL (RDS, DynamoDB, SaaS).
- **Snowflake Kafka Connector** for Amazon MSK streaming.
- **Streams + Tasks** for change-data-capture-style incremental processing inside Snowflake.
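
The batch, continuous, and incremental paths above can be sketched roughly as follows. Every object name here is hypothetical, and the sketch assumes a storage integration called `s3_int` and a warehouse called `etl_wh` already exist.

```sql
-- Landing and cleaned tables (placeholder schemas).
CREATE TABLE IF NOT EXISTS raw_orders   (order_id NUMBER, order_date DATE, region STRING, amount NUMBER(12, 2));
CREATE TABLE IF NOT EXISTS orders_clean (order_id NUMBER, order_date DATE, region STRING, amount NUMBER(12, 2));

-- External stage over the S3 prefix holding Parquet files.
CREATE STAGE IF NOT EXISTS raw_orders_stage
  URL = 's3://example-bucket/orders/'
  STORAGE_INTEGRATION = s3_int
  FILE_FORMAT = (TYPE = PARQUET);

-- Batch load: one-time or scheduled.
COPY INTO raw_orders
  FROM @raw_orders_stage
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Continuous load: Snowpipe runs the same COPY when S3 event notifications arrive.
-- The bucket must send notifications to the pipe's queue (see SHOW PIPES).
CREATE PIPE IF NOT EXISTS raw_orders_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_orders
    FROM @raw_orders_stage
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Incremental processing: a stream tracks new rows, a task drains it.
CREATE STREAM IF NOT EXISTS raw_orders_stream ON TABLE raw_orders;

CREATE TASK IF NOT EXISTS load_orders_clean
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')  -- skip runs when nothing changed
AS
  INSERT INTO orders_clean
  SELECT order_id, order_date, region, amount
  FROM raw_orders_stream
  WHERE METADATA$ACTION = 'INSERT';

ALTER TASK load_orders_clean RESUME;  -- tasks are created suspended
```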
## When Snowflake is NOT the right call
- Small team, AWS-only stack, predictable analytics workload — **Amazon Redshift Serverless** or **Athena + Iceberg** usually wins on cost and governance simplicity.
- You need strict AWS-native IAM on every row — Redshift + Lake Formation integrates more tightly than Snowflake's external IAM.
- Pure ML training workloads over lake data — **SageMaker + S3 + Glue** can skip the Snowflake bill entirely.
- Compliance team requires data to never leave AWS-managed storage — reduce Snowflake surface by preferring Iceberg Tables on S3 (data stays in S3) or choose a fully AWS-native stack.
## Snowflake vs Redshift vs Athena vs SageMaker Lakehouse
| Dimension | Snowflake | Redshift Serverless | Athena + Iceberg | SageMaker Lakehouse |
| -------------------- | ---------------------------- | ------------------------------------ | -------------------------- | ---------------------------- |
| Operating model | SaaS | AWS-managed | Serverless, pay-per-query | Unified lakehouse |
| Storage | Managed or Iceberg on S3 | Managed or Spectrum on S3 | S3 / S3 Tables | S3 + governance via DataZone |
| Concurrency | Strong (multi-cluster) | Strong | Moderate | Varies by engine |
| IAM integration | External; Horizon + IdC SAML | Native IAM | Native IAM | Native IAM |
| NL-to-SQL | Cortex Analyst | Amazon Q in QuickSight | Amazon Q via Athena | Amazon Q + DataZone |
| In-DB ML | Snowpark ML | Redshift ML + Bedrock | SageMaker external | SageMaker native |
| Data sharing | Snowflake Marketplace | Cross-account, Redshift Data Sharing | S3 + Lake Formation grants | DataZone projects |
| Multi-cloud | Yes | No | No | No |
| Typical cost profile | Premium; elastic | Moderate; predictable | Pay-per-TB scanned | Workload-dependent |
## Failure modes & resilience
**1. Warehouse cold-start latency.** Auto-suspend at 1 min is great for cost but adds 1–10 s on the first query against a cold warehouse. For latency-sensitive dashboards, set auto-suspend to 60 min on the BI warehouse, or use a "keep-warm" scheduled task that runs `SELECT 1` every 30 s on the dashboard warehouse during business hours.
**2. Multi-cluster scaling lag.** When concurrency exceeds a single cluster, Snowflake adds clusters, but provisioning takes 10–30 s. During a flash-load (campaign launch, scheduled job convoy), early queries queue. Mitigation: set `MIN_CLUSTER_COUNT = 2` for predictable peak warehouses; use `SCALING_POLICY = STANDARD` to add clusters as soon as queries start queuing, or `ECONOMY` to save credits at the cost of longer queues.
**3. Iceberg metadata stale-read across engines.** When Athena, Glue, or EMR writes to an Iceberg table that Snowflake also reads, Snowflake keeps serving the last snapshot it knows about until its metadata is refreshed. For externally cataloged tables (Glue Data Catalog included), either enable automated refresh on the table or run an explicit `ALTER ICEBERG TABLE … REFRESH` after external writes. Symptom: missing rows that exist on S3. Mitigation: prefer single-writer-per-table or use the Polaris REST catalog for multi-engine writes.
**4. Cortex token quota cliffs.** Per-account daily token budgets exist for Cortex Analyst, Search, and COMPLETE; hitting them returns `function evaluation failed`. Track via `SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY`. Mitigation: rate-limit at the application tier; set Resource Monitors on warehouses running Cortex queries, keeping in mind that they cap warehouse credits rather than token spend.
**5. Time Travel + Fail-safe boundaries.** Time Travel (default 1 day, up to 90 on Enterprise) covers operational restore. Fail-safe (7 days, automatic) is Snowflake-only recovery — you cannot self-serve from it. For accidental DROP TABLE recovery, use `UNDROP TABLE` within Time Travel; beyond Time Travel, open a Snowflake support ticket within 7 days.
**6. Iceberg storage on S3 gets expensive without compaction.** Frequent small writes create thousands of tiny files. S3 Tables auto-compacts; self-managed S3 buckets need scheduled compaction, for example Athena's `OPTIMIZE <table> REWRITE DATA USING BIN_PACK` or the Iceberg `rewrite_data_files` procedure in Spark/Glue. Symptom: queries scan many small files; per-query cost climbs.
**7. Long-running query timeout.** Default `STATEMENT_TIMEOUT_IN_SECONDS = 172800` (48h). For interactive workloads, lower to 600 s on the BI warehouse so runaway queries don't accumulate credits. Combine with `STATEMENT_QUEUED_TIMEOUT_IN_SECONDS` to fail fast under contention.
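
Several of the mitigations above come down to a handful of statements. The sketch below uses the `bi_prod` warehouse name from the resource-monitor example elsewhere on this page; the table names `orders` and `orders_glue` and all numeric values are assumptions to adapt.

```sql
-- Items 1, 2, 7: latency-sensitive BI warehouse settings.
ALTER WAREHOUSE bi_prod SET
  AUTO_SUSPEND      = 3600        -- 60 min, trades idle credits for fewer cold starts
  MIN_CLUSTER_COUNT = 2           -- second cluster is already running at peak
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'STANDARD'
  STATEMENT_TIMEOUT_IN_SECONDS        = 600   -- stop runaway queries
  STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 120;  -- fail fast under contention

-- Item 3: pick up snapshots written by another engine
-- (orders_glue is a hypothetical Glue-cataloged Iceberg table).
ALTER ICEBERG TABLE orders_glue REFRESH;

-- Item 5: extend Time Travel, query the past, and recover a dropped table.
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 14;
SELECT COUNT(*) FROM orders AT (OFFSET => -3600);  -- table state one hour ago
UNDROP TABLE orders;  -- only meaningful after an accidental DROP TABLE orders
```

The statements are independent of one another; run only the ones that match the failure mode you are addressing.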
## Observability runbook
**ACCOUNT_USAGE views to monitor:**
| View | Use for |
| -------------------------------- | ------------------------------------------------------- |
| `WAREHOUSE_METERING_HISTORY` | Daily credit spend per warehouse; trending overruns |
| `QUERY_HISTORY` | Slowest queries, queue time, partitions scanned |
| `QUERY_ACCELERATION_HISTORY` | Whether QAS is helping the workloads it was enabled for |
| `LOGIN_HISTORY` | Auth anomalies; pair with CloudTrail for cross-evidence |
| `ACCESS_HISTORY` | Lineage; which roles read which columns (governance) |
| `CORTEX_FUNCTIONS_USAGE_HISTORY` | Per-function token spend; budget overruns |
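
As a starting point, the first and last views in the table can be queried along these lines. The 30-day window is arbitrary, and the Cortex column names (`function_name`, `model_name`, `tokens`, `token_credits`) reflect my understanding of the view and should be checked against your account.

```sql
-- Daily credit spend per warehouse over the last 30 days.
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_day
ORDER BY usage_day DESC, credits DESC;

-- Token spend per Cortex function and model (column names assumed; verify).
SELECT function_name,
       model_name,
       SUM(tokens)        AS tokens,
       SUM(token_credits) AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY function_name, model_name
ORDER BY credits DESC;
```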
**Resource monitors with hard stops:**
```sql
CREATE OR REPLACE RESOURCE MONITOR rm_prod_bi
WITH CREDIT_QUOTA = 4000
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
TRIGGERS
ON 50 PERCENT DO NOTIFY
ON 80 PERCENT DO NOTIFY
ON 100 PERCENT DO SUSPEND
ON 110 PERCENT DO SUSPEND_IMMEDIATE;
ALTER WAREHOUSE bi_prod SET RESOURCE_MONITOR = rm_prod_bi;
```
`SUSPEND_IMMEDIATE` kills running queries; reserve for hard caps. Pair with a CloudWatch metric alarm fed by a scheduled Lambda querying `WAREHOUSE_METERING_HISTORY`.
**Debug path: "dashboard suddenly slow":**
1. `SHOW WAREHOUSES` → confirm warehouse is `STARTED`. If `SUSPENDED` and a query is queued, that's normal cold-start.
2. `QUERY_HISTORY` for the dashboard's user/role over the last hour: filter on `EXECUTION_TIME > p95 baseline`.
3. Check `QUEUED_OVERLOAD_TIME` and `QUEUED_PROVISIONING_TIME` — high values indicate concurrency or scaling lag; bump `MIN_CLUSTER_COUNT` or warehouse size.
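4. If a specific query slowed: open its Query Profile and inspect `Bytes scanned` / `Partitions scanned` (`EXPLAIN USING TEXT` shows the planned `partitionsAssigned` out of `partitionsTotal` before execution). Add or improve a clustering key on the most-filtered columns.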
5. Iceberg-only: confirm catalog refresh ran; cross-engine writers can leave the metadata stale.
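
Steps 2 and 3 can be approximated with a single query like the one below. The warehouse name `BI_PROD` is carried over from the resource-monitor example, and the 24-hour window is an assumption: `ACCOUNT_USAGE.QUERY_HISTORY` can lag by up to about 45 minutes, so very recent queries may not appear yet.

```sql
-- Slowest recent queries on the dashboard warehouse, with queue-time breakdown.
SELECT query_id,
       user_name,
       start_time,
       total_elapsed_time       / 1000 AS elapsed_s,
       execution_time           / 1000 AS execution_s,
       queued_overload_time     / 1000 AS queued_overload_s,      -- concurrency pressure
       queued_provisioning_time / 1000 AS queued_provisioning_s,  -- resume or scale-out lag
       partitions_scanned,
       partitions_total,
       bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE warehouse_name = 'BI_PROD'
  AND start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```

A high ratio of `partitions_scanned` to `partitions_total` on a filtered query suggests poor pruning, which is where step 4 picks up.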
**Clustering-key validation:**
```sql
-- Before declaring a clustering key, validate selectivity
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
-- Look at average_overlaps and average_depth; aim for low values
```
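
If the clustering information looks favorable, declaring the key and re-checking depth afterwards might look like this. It reuses the `orders` table and column pair from the check above; note that Automatic Clustering consumes credits in the background once a key is set.

```sql
-- Declare the clustering key on the columns validated above.
ALTER TABLE orders CLUSTER BY (order_date, region);

-- Re-check after reclustering has had time to run; lower depth means better pruning.
SELECT SYSTEM$CLUSTERING_DEPTH('orders', '(order_date, region)');
```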
## Pricing notes (always verify)
- Compute credits: ~$2-$4 each (Standard to Business Critical).
- Storage on Snowflake-managed: ~$23-$40/TB/month.
- Iceberg Tables: you pay S3 + S3 Tables management fee; Snowflake compute charges as normal.
- Cortex functions: per-token.
- Cortex Analyst / Search: metered separately.
- Set Resource Monitors with hard caps per warehouse; alert on 50% / 80% of monthly credit budget.
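
A back-of-the-envelope estimate shows how these figures combine. The usage pattern (one Medium warehouse, 8 hours a day, 22 days a month), the $3.00 credit price, and the 10 TB storage volume are assumptions for illustration; a Medium warehouse consumes 4 credits per hour.

```sql
-- Illustrative monthly estimate; replace the literals with your own numbers.
SELECT 4 * 8 * 22        AS monthly_credits,     -- 4 credits/h x 8 h x 22 days = 704
       4 * 8 * 22 * 3.00 AS compute_usd,         -- 704 credits x $3.00 = 2112.00
       10 * 23           AS storage_usd_low,     -- 10 TB at $23/TB/month = 230
       10 * 40           AS storage_usd_high;    -- 10 TB at $40/TB/month = 400
```

In this example compute dominates storage by roughly an order of magnitude, which is why auto-suspend and warehouse sizing usually matter more than storage tuning.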
## Best practices
**Cost**
- Aggressive auto-suspend (1-5 min).
- Multi-cluster for concurrency, not always-on size-up.
- Iceberg on S3 for cold and semi-cold data.
- Streams + Tasks for incremental processing.
- Quarterly review of Cortex token usage.
**Performance**
- Clustering keys on frequent WHERE / JOIN columns.
- Result cache (free for 24 hours) — design dashboards to benefit.
- Search Optimization Service for selective point queries on internal tables.
**Governance**
- Horizon classification tags + dynamic data masking for PII.
- Lake Formation for Iceberg permissions on S3.
- DataZone for cross-tool catalog and marketplace.
- Audit logs to CloudWatch and Security Lake; retain per compliance regime.
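
A minimal sketch of tag-based masking for a PII column is shown below. The tag, policy, role, table, and column names are all hypothetical, and masking policies require Enterprise Edition or higher.

```sql
-- Tag used to mark PII columns (hypothetical name).
CREATE TAG IF NOT EXISTS pii_class;

-- Policy: only sessions with the PII_READER role see the raw value.
CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN IS_ROLE_IN_SESSION('PII_READER') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the tag, then tag the column.
-- Any STRING column carrying this tag is masked by the same policy.
ALTER TAG pii_class SET MASKING POLICY email_mask;
ALTER TABLE customers MODIFY COLUMN email SET TAG pii_class = 'email';
```

Attaching the policy to the tag rather than to each column means newly classified columns are protected as soon as they are tagged.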
**Reliability**
- Time Travel: retain 7-30 days; use for accidental-delete recovery.
- Fail-safe: 7 days; do not depend on it for operational recovery.
- Replication for DR across regions or clouds where justified.
## Related reading
- [AWS Glue 5 + Apache Iceberg modern ETL](/blog/aws-glue-5-apache-iceberg-modern-etl/)
- [Athena query cost optimization: partition, compress, cache, Iceberg](/blog/athena-query-cost-optimization-partition-compress-cache-iceberg/)
- [Amazon Redshift Serverless vs provisioned: when to use each](/blog/amazon-redshift-serverless-vs-provisioned-when-to-use-each/)
## Related services
- [AWS Data Analytics](/services/aws-data-analytics/)
- [Generative AI on AWS](/services/generative-ai-on-aws/)
- [AWS Architecture Review](/services/aws-architecture-review/)