AWS Glossary
Amazon Redshift
Fully managed cloud data warehouse for running fast SQL analytics on petabyte-scale datasets.
Definition
Amazon Redshift is a fully managed cloud data warehouse optimized for running complex SQL analytics on datasets ranging from gigabytes to petabytes. Redshift uses columnar storage, massively parallel processing (MPP), and automatic compression to execute analytical queries dramatically faster than row-oriented databases for typical analytical workloads. It is used for business intelligence, reporting, and large-scale data analysis on AWS.
How Redshift Works
Columnar Storage:
- Data is stored by column, not by row: a query that reads 3 of 100 columns touches only roughly 3% of the stored data
- Redshift automatically selects a compression encoding suited to each column's data type
- Dramatically reduces I/O for typical analytical queries (SUM, AVG, GROUP BY, JOINs)
Massively Parallel Processing (MPP):
- Data is distributed across multiple compute nodes
- Each node processes its slice of data in parallel
- Leader node receives queries, compiles an execution plan, and coordinates worker nodes
- Scale out by adding nodes; query throughput scales near-linearly when data is evenly distributed
Sort Keys:
- Define how data is physically sorted on disk
- Queries filtering on sort key columns skip unsorted blocks (zone maps)
- Choose sort key based on most common WHERE clause columns
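Sort and distribution keys are declared when the table is created. A minimal sketch (the `sales` table and its columns are hypothetical):

```sql
-- Hypothetical fact table: distribute on the common join key,
-- sort on the column most queries filter by.
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- collocates rows sharing a customer on one node
SORTKEY (sale_date);    -- zone maps can skip blocks on date filters
```

A query such as `SELECT SUM(amount) FROM sales WHERE sale_date >= '2024-01-01'` can then skip every block whose zone map falls entirely outside the date range.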
Redshift Serverless
Redshift Serverless automatically provisions and scales capacity based on query demand:
- No cluster to manage — just connect and query
- Pay per RPU-hour (Redshift Processing Unit) for actual query time
- Scales from 8 to 512 RPUs in seconds
- Compute automatically pauses when idle, so cost drops to storage-only
- Best for: intermittent analytics, variable workloads, teams without DBA resources
RA3 Instances (provisioned alternative):
- Compute and storage scale independently; data lives in Redshift Managed Storage (RMS), backed by Amazon S3
- Best for: high-frequency queries where predictable performance and reserved-node pricing discounts matter
Redshift Spectrum
Query data directly in S3 without loading it into Redshift:
- Define external tables pointing to S3 (Parquet, ORC, CSV, JSON, Avro)
- Query S3 data and Redshift native tables in the same SQL statement
- Compute scales automatically (Spectrum nodes separate from cluster nodes)
- Used for: data lake queries, historical data not worth loading into Redshift, raw log analysis
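A Spectrum setup defines an external schema backed by the AWS Glue Data Catalog, then external tables over S3 objects. A sketch, with the bucket, IAM role ARN, database, and table names as placeholder assumptions:

```sql
-- One-time: external schema pointing at the Glue Data Catalog
-- (IAM role and database name are hypothetical).
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'analytics_lake'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- External table over Parquet files in S3; no data is loaded.
CREATE EXTERNAL TABLE spectrum.access_logs (
    request_time TIMESTAMP,
    status_code  INT,
    bytes_sent   BIGINT
)
STORED AS PARQUET
LOCATION 's3://my-example-bucket/access-logs/';

-- Query the S3 data like any other table.
SELECT status_code, COUNT(*) AS hits
FROM spectrum.access_logs
WHERE request_time >= '2024-01-01'
GROUP BY status_code;
```

The same statement can also join `spectrum.access_logs` against native Redshift tables, which is how data lake and warehouse data are combined in one query.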
Integrations
Amazon S3 / Data Lake:
- COPY command loads data from S3 into Redshift in parallel (fastest ingestion method)
- UNLOAD command exports query results to S3 as Parquet, CSV, etc.
- Redshift Spectrum queries S3 without loading
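The COPY/UNLOAD round trip might look like this (bucket paths and the IAM role ARN are placeholder assumptions):

```sql
-- Parallel load from S3 into a native table.
COPY sales
FROM 's3://my-example-bucket/incoming/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;

-- Export query results back to S3 as Parquet.
UNLOAD ('SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id')
TO 's3://my-example-bucket/exports/sales_by_customer_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;
```

COPY splits the load across node slices automatically, which is why many smaller input files usually load faster than one large file.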
Amazon QuickSight:
- Direct Redshift integration for BI dashboards and reports
- SPICE in-memory caching for fast dashboard loads without hitting Redshift on every query
AWS Glue / dbt:
- AWS Glue Crawlers catalog Redshift tables in the AWS Glue Data Catalog
- dbt (data build tool) is widely used for Redshift transformation pipelines
Streaming Ingestion:
- Stream data from Amazon Kinesis Data Streams or MSK (Kafka) directly into Redshift
- Near-real-time analytics without a landing zone in S3
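Streaming ingestion is configured with an external schema over the stream plus an auto-refreshing materialized view; the stream name and IAM role below are hypothetical:

```sql
-- External schema mapped to Kinesis Data Streams.
CREATE EXTERNAL SCHEMA kinesis
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/MyStreamingRole';

-- Auto-refreshing materialized view over the stream;
-- kinesis_data holds the raw record payload.
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis."my-click-stream";
```

Queries against `clickstream_mv` then see near-real-time data with no S3 landing step in between.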
Common Mistakes
Mistake 1: Not choosing the right distribution style. If large tables are not distributed to collocate join keys, Redshift redistributes data across nodes at query time — causing significant performance overhead. Use DISTKEY on frequently joined columns.
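One way to spot this problem, assuming hypothetical `sales` and `customers` tables, is to check the join strategy in the query plan:

```sql
EXPLAIN
SELECT c.region, SUM(s.amount)
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
GROUP BY c.region;
-- In the plan output, DS_DIST_NONE means the join is collocated;
-- DS_BCAST_INNER or DS_DIST_BOTH means Redshift is broadcasting or
-- redistributing data across nodes at query time.
```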
Mistake 2: Using Redshift for OLTP workloads. Redshift is optimized for read-heavy analytics, not transactional operations. Use RDS, Aurora, or DynamoDB for application transactions; use Redshift for analytical queries over that data.
Mistake 3: Skipping VACUUM and ANALYZE. Redshift uses soft deletes (deleted rows marked, not removed). Run VACUUM to reclaim space and re-sort data; run ANALYZE to update statistics. Enable automatic VACUUM and ANALYZE to avoid manual maintenance.
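Manual maintenance, again with a hypothetical `sales` table, looks like:

```sql
VACUUM FULL sales;   -- reclaim space from deleted rows and re-sort
ANALYZE sales;       -- refresh planner statistics

-- Check how much of the table is unsorted or has stale stats.
SELECT "table", unsorted, stats_off
FROM svv_table_info
WHERE "table" = 'sales';
```

With automatic VACUUM and ANALYZE enabled, these commands are mainly useful for ad hoc cleanup after large deletes or bulk loads.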
Related AWS Services
- Amazon S3: Primary data lake source for Redshift Spectrum and COPY ingestion
- AWS Glue: ETL service for loading and transforming data into Redshift
- Amazon QuickSight: BI and visualization layer on top of Redshift
- Amazon Kinesis Data Streams: Real-time data streaming into Redshift
Related FactualMinds Content
- AWS Data Analytics Services
- AWS Cloud Migration — Netezza, Teradata, and on-prem warehouse migrations to Redshift with parallel-run validation
- FinOps Consulting
- AWS Well-Architected Review Checklist
Need Help with This Topic?
Our AWS experts can help you implement and optimize these concepts for your organization.
