AWS Glossary

Amazon Redshift

Fully managed cloud data warehouse for running fast SQL analytics on petabyte-scale datasets.

Definition

Amazon Redshift is a fully managed cloud data warehouse optimized for running complex SQL analytics on datasets ranging from gigabytes to petabytes. Redshift combines columnar storage, massively parallel processing (MPP), and automatic compression, which lets it execute analytical queries far faster than row-based databases typically can on the same workloads. It is used for business intelligence, reporting, and large-scale data analysis on AWS.
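
To illustrate why a columnar layout suits analytical queries, here is a minimal Python sketch. It is a toy model, not Redshift's actual storage engine: the sample rows, the column names, and the "values read" counters are all hypothetical and exist only to show that an aggregate over one column touches fewer values when data is stored by column.

```python
# Toy model: compare how many values an aggregate must read under
# row-oriented vs column-oriented layouts. Not Redshift internals.

rows = [
    {"order_id": 1, "customer": "a", "region": "eu", "amount": 10.0},
    {"order_id": 2, "customer": "b", "region": "us", "amount": 25.0},
    {"order_id": 3, "customer": "a", "region": "eu", "amount": 5.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def sum_row_store(rows, column):
    """Row layout: every field of every row is read to reach one column."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row is fetched
        total += row[column]
    return total, values_read


def sum_column_store(columns, column):
    """Column layout: only the requested column is read."""
    data = columns[column]
    return sum(data), len(data)


columns = to_columns(rows)
row_total, row_reads = sum_row_store(rows, "amount")
col_total, col_reads = sum_column_store(columns, "amount")

print("row layout:   ", row_total, "values read:", row_reads)
print("column layout:", col_total, "values read:", col_reads)
```

Both layouts return the same total, but the row layout reads every field of every row, while the column layout reads only the `amount` values. The gap widens as tables gain more columns.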

How Redshift Works

Columnar Storage:

Massively Parallel Processing (MPP):

Sort Keys:

Redshift Serverless

Redshift Serverless automatically provisions and scales capacity based on query demand:

RA3 Instances (provisioned alternative):

Redshift Spectrum

Query data directly in S3 without loading it into Redshift:

Integrations

Amazon S3 / Data Lake:

Amazon QuickSight:

AWS Glue / dbt:

Streaming Ingestion:

Common Mistakes

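Mistake 1: Not choosing the right distribution style. If large tables are not distributed so that rows with matching join keys sit on the same node, Redshift must redistribute data across nodes at query time, which adds significant performance overhead. A table can have only one distribution key, so set DISTKEY to the column most frequently used to join it to other large tables.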
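
The effect of co-locating join keys can be sketched with a small simulation. This is an illustrative model only: the node count, the CRC32-based placement function, and the `orders`/`customers` tables are assumptions made for the example and do not reflect Redshift's real hashing or slice layout.

```python
# Toy model of row placement across nodes. Not Redshift's real hash function.
import zlib

NUM_NODES = 4  # hypothetical cluster size


def node_for_key(key, num_nodes=NUM_NODES):
    """KEY-style placement: the same key value always maps to the same node."""
    return zlib.crc32(str(key).encode()) % num_nodes


def place_by_key(rows, key_column):
    """Assign each row to a node based on the hash of its key column."""
    return [(node_for_key(r[key_column]), r) for r in rows]


def place_round_robin(rows, num_nodes=NUM_NODES):
    """EVEN-style placement: rows are spread in turn, ignoring their values."""
    return [(i % num_nodes, r) for i, r in enumerate(rows)]


def rows_needing_redistribution(placed_left, placed_right, key_column):
    """Count left rows whose matching right row lives on a different node."""
    right_node = {r[key_column]: node for node, r in placed_right}
    return sum(
        1 for node, r in placed_left if right_node[r[key_column]] != node
    )


customers = [{"customer_id": i} for i in range(100)]
orders = [{"order_id": i, "customer_id": (i * 7) % 100} for i in range(1000)]

keyed = rows_needing_redistribution(
    place_by_key(orders, "customer_id"),
    place_by_key(customers, "customer_id"),
    "customer_id",
)
even = rows_needing_redistribution(
    place_round_robin(orders),
    place_round_robin(customers),
    "customer_id",
)

print("KEY placement, rows to move: ", keyed)
print("EVEN placement, rows to move:", even)
```

When both tables are placed by the join column, every order already sits on the same node as its customer, so nothing has to move for the join. In Redshift itself, the corresponding choice is declaring the same join column as DISTKEY on both tables.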

Mistake 2: Using Redshift for OLTP workloads. Redshift is optimized for read-heavy analytics, not transactional operations. Use RDS, Aurora, or DynamoDB for application transactions; use Redshift for analytical queries over that data.

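Mistake 3: Skipping VACUUM and ANALYZE. Redshift uses soft deletes: deleted rows are marked rather than physically removed, so they keep occupying disk space until a vacuum runs. Run VACUUM to reclaim that space and re-sort data; run ANALYZE to update the table statistics the query planner relies on. Keep automatic VACUUM and ANALYZE enabled to avoid most manual maintenance.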
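
The maintenance cycle described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not how Redshift implements these commands: `ToyTable`, its methods, and the `event_id` column are hypothetical names chosen for illustration.

```python
# Toy model of soft deletes, VACUUM, and ANALYZE. Not Redshift internals.


class ToyTable:
    def __init__(self, sort_column):
        self.sort_column = sort_column
        self.entries = []         # each entry: {"row": dict, "deleted": bool}
        self.stats_row_count = 0  # row count the "planner" currently believes

    def insert(self, row):
        """Append a row; new rows land at the end, out of sort order."""
        self.entries.append({"row": row, "deleted": False})

    def delete_where(self, predicate):
        """Soft delete: mark matching rows but keep them in storage."""
        for entry in self.entries:
            if predicate(entry["row"]):
                entry["deleted"] = True

    def stored_rows(self):
        """Rows physically held, including those marked deleted."""
        return len(self.entries)

    def live_rows(self):
        """Rows still visible to queries."""
        return sum(1 for e in self.entries if not e["deleted"])

    def vacuum(self):
        """Drop marked rows and restore sort order."""
        live = [e for e in self.entries if not e["deleted"]]
        self.entries = sorted(live, key=lambda e: e["row"][self.sort_column])

    def analyze(self):
        """Refresh the statistics used for planning."""
        self.stats_row_count = self.live_rows()


table = ToyTable(sort_column="event_id")
for event_id in (5, 3, 9, 1, 7):
    table.insert({"event_id": event_id})
table.analyze()
table.delete_where(lambda row: row["event_id"] > 6)

# Before maintenance: space is still held and statistics are stale.
before = (table.stored_rows(), table.live_rows(), table.stats_row_count)

table.vacuum()
table.analyze()

# After maintenance: space reclaimed, rows sorted, statistics current.
after = (table.stored_rows(), table.live_rows(), table.stats_row_count)
sorted_ids = [e["row"]["event_id"] for e in table.entries]

print("before:", before)
print("after: ", after)
print("order: ", sorted_ids)
```

Before maintenance the table still stores the deleted rows and the planner's row count is out of date; after the vacuum and analyze steps, storage, sort order, and statistics all match the live data.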
