
Amazon S3 Vectors

Amazon S3 Vectors is AWS's native vector store: purpose-built vector storage in S3 that AWS positions as up to 90% cheaper than dedicated vector databases for RAG workloads.

Last reviewed: July 2026


AWS lifecycle notice (June 30, 2026): Amazon Kendra is in maintenance mode for new customers after July 30, 2026. Evaluate Quick Index or Bedrock Knowledge Bases instead.

Definition

Amazon S3 Vectors is a native vector storage tier on S3 for embeddings and similarity search. Vector buckets hold vector indexes of high-dimensional vectors. Each vector carries metadata that queries can filter on, and each index uses either cosine or Euclidean distance. S3 Vectors reached GA in 2025 as a Bedrock Knowledge Bases vector store option alongside OpenSearch Serverless, Aurora pgvector, and partner engines. It targets RAG and semantic search workloads where OpenSearch OCU-hours or dedicated vector DB pod hours would otherwise dominate the bill.
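Cosine and Euclidean distance can rank the same neighbors differently. Cosine compares direction only, while Euclidean also reflects magnitude. The sketch below uses the textbook definitions, taking cosine distance as 1 minus cosine similarity. The exact distance values S3 Vectors returns are not specified here, so treat this as an illustration of the metrics rather than of the service.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 means same direction, magnitude ignored."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc = [3.0, 4.0]
scaled = [30.0, 40.0]  # same direction as doc, 10x the magnitude

print(round(cosine_distance(query, doc), 3), round(cosine_distance(query, scaled), 3))  # 0.4 0.4
print(round(euclidean_distance(query, doc), 3), round(euclidean_distance(query, scaled), 3))  # 4.472 49.406
```

If your embedding model emits unit-normalized vectors, the two metrics produce the same nearest-neighbor ordering. Otherwise, pick the metric the model was trained for.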

As of June 16, 2026, QueryVectors returns up to 10,000 similarity search results per query (100× the prior 100-result limit), with paginated responses via nextToken. Query data-processed charges on indexes with more than 10 million vectors dropped up to 80% automatically. Large result sets may incur data-returned fees beyond the first 512 KB per query — see the S3 pricing page.
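Paginated responses can be consumed as a stream instead of collected into one list. The sketch below assumes a hypothetical `fetch_page(token)` wrapper around a single QueryVectors request. It returns one page of results plus the next token, or `None` when no pages remain. The wrapper's shape and the result fields are illustrative, not the SDK's actual request or response format.

```python
from typing import Callable, Iterator, Optional

# Hypothetical wrapper around one QueryVectors call: takes the pagination
# token from the previous page (None for the first request) and returns
# (results_on_this_page, next_token). next_token is None when no pages remain.
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_query_results(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield results page by page instead of collecting every page in memory."""
    token: Optional[str] = None
    while True:
        results, token = fetch_page(token)
        # The caller consumes this page before the next request is made.
        yield from results
        if not token:
            break


# Usage with an in-memory stand-in for the service (three pages).
PAGES = {
    None: ([{"key": "a"}, {"key": "b"}], "t1"),
    "t1": ([{"key": "c"}], "t2"),
    "t2": ([{"key": "d"}], None),
}

keys = [r["key"] for r in iter_query_results(lambda tok: PAGES[tok])]
print(keys)  # ['a', 'b', 'c', 'd']
```

The generator holds only one page at a time, but it fetches sequentially. Overlapping the next fetch with processing of the current page would need a background thread or an async client layered on top.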

The trade-off is latency. Expect query times from under 100 ms to the low hundreds of milliseconds. That suits batch retrieval, wide-recall-plus-rerank pipelines, and many chat RAG flows, but not sub-10 ms agent loops at thousands of QPS.
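Whether a workload fits these latency ranges is best judged from tail percentiles in your own load test, not from averages. The sketch below is a minimal percentile check using the nearest-rank method. The sample latencies are invented for illustration and are not S3 Vectors measurements.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples_ms or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_p99_budget(samples_ms: list[float], budget_ms: float) -> bool:
    """True if the 99th-percentile latency is within the budget."""
    return percentile(samples_ms, 99) <= budget_ms


# Invented latencies (ms) for illustration only: mostly fast, with a slow tail.
samples = [40.0] * 90 + [120.0] * 9 + [450.0]
print(percentile(samples, 50))           # 40.0
print(percentile(samples, 99))           # 120.0
print(meets_p99_budget(samples, 50.0))   # False
print(meets_p99_budget(samples, 200.0))  # True
```

A median that looks fine can still hide a p99 that breaks a tight tool-call budget. Collect the samples under the concurrency you expect in production.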

Store (illustrative)	Cost driver	Latency profile	Max topK (June 2026)
OpenSearch Serverless	OCU-hours + storage	Lower p99 on small indexes	Index-dependent
Dedicated vector SaaS	Pod/replica hours	Tunable, vendor-specific	Vendor-specific
S3 Vectors	Storage + per-query	Higher tail, lowest storage $	10,000 (paginated)

When to use it

Bedrock Knowledge Bases RAG with large corpora (10M+ chunks) where OpenSearch baseline OCUs inflate monthly cost — especially after the June 2026 large-index query discount.
Multi-stage retrieval — wide topK recall, client-side rerank, dedup by document_id — now practical without sharding workarounds for the old 100-result cap.
Multi-tenant SaaS needing S3-native isolation (an index or vector bucket per tenant) with metadata filters at retrieval.
Archival or long-tail knowledge sets queried occasionally but stored durably for compliance.
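The multi-stage retrieval pattern above can be done client-side in plain Python. This sketch assumes each hit is a dict carrying a `distance` (lower is closer) and a `document_id` metadata field. It takes a hypothetical `rerank_score` callable standing in for whatever reranker you use. None of these names come from the S3 Vectors API.

```python
from typing import Callable


def dedup_and_rerank(
    hits: list[dict],
    rerank_score: Callable[[dict], float],
    top_n: int,
) -> list[dict]:
    """Keep the closest hit per document_id, rerank survivors, return top_n.

    Hits are assumed to look like
    {"key": ..., "distance": float, "metadata": {"document_id": ...}}.
    """
    best_per_doc: dict[str, dict] = {}
    for hit in hits:
        doc_id = hit["metadata"]["document_id"]
        current = best_per_doc.get(doc_id)
        if current is None or hit["distance"] < current["distance"]:
            best_per_doc[doc_id] = hit
    # Higher rerank score = more relevant (e.g. a cross-encoder output).
    reranked = sorted(best_per_doc.values(), key=rerank_score, reverse=True)
    return reranked[:top_n]


hits = [
    {"key": "c1", "distance": 0.10, "metadata": {"document_id": "docA"}},
    {"key": "c2", "distance": 0.05, "metadata": {"document_id": "docA"}},
    {"key": "c3", "distance": 0.20, "metadata": {"document_id": "docB"}},
    {"key": "c4", "distance": 0.30, "metadata": {"document_id": "docC"}},
]
scores = {"c2": 0.4, "c3": 0.9, "c4": 0.7}  # hypothetical reranker output
top = dedup_and_rerank(hits, lambda h: scores[h["key"]], top_n=2)
print([h["key"] for h in top])  # ['c3', 'c4']
```

Deduplicating before reranking cuts reranker calls, but it keeps only the vector-closest chunk per document. If the reranker should choose among a document's chunks, rerank first and deduplicate afterward.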

When not to use it

Agentic workflows requiring sub-50ms retrieval inside tight tool-call loops at high QPS — OpenSearch Serverless or in-memory caches win.
Defaulting to topK=10,000 for simple chat RAG. If only five chunks reach the LLM, wide recall adds latency and data-returned fees with no quality gain.
Hybrid lexical + vector search as a single managed engine. OpenSearch hybrid search may fit better. Kendra also combines both, but check its lifecycle status first.
Graph-heavy relationship traversal — Neptune Analytics combines graph and vector where edges matter.

Tips

Design metadata fields for mandatory filters (tenant, ACL, doc version) before first ingest — re-indexing billion-vector buckets is painful.
On wide recall passes, set returnMetadata=True and returnData=False; fetch chunk text only for post-rerank top-N.
Paginate QueryVectors with nextToken — process the first page while fetching the next; do not buffer thousands of payloads in Lambda memory.
Upgrade to AWS SDK versions released after June 16, 2026 to get pagination support on QueryVectors.
Run recall@k benchmarks before raising topK; cheapest store is worthless if reranked quality does not improve.
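A recall@k benchmark can be a few lines over a labeled query set. The sketch below computes standard recall@k, the share of known-relevant keys found in the top k results, and averages it across queries. The labeled runs are hypothetical placeholders for your own ground truth.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant keys that appear among the first k retrieved keys."""
    if not relevant:
        raise ValueError("relevant must be non-empty")
    found = set(retrieved[:k]) & relevant
    return len(found) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs from a labeled query set."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical labeled queries: retrieved keys in rank order, plus ground truth.
runs = [
    (["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}),
    (["d4", "d5", "d6", "d8", "d0"], {"d4"}),
]
for k in (1, 2, 5):
    print(k, mean_recall_at_k(runs, k))  # 1 0.5, then 2 0.75, then 5 1.0
```

Once recall plateaus at some k, raising topK further mostly adds latency and returned bytes. Re-run the benchmark after reranking to confirm the final top-N actually improves.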

Gotchas

Serious: Raising topK to thousands with returnData=True without pagination — OOM in Lambda and unexpected data-returned charges past the 512 KB free tier.
Serious: Using S3 Vectors for real-time agent tool retrieval without load testing — tail latency spikes under concurrent sessions frustrate users.
Serious: Stale embeddings when source documents change but sync jobs fail silently — pair with document version metadata and health alarms on sync lag.
Regular: Assuming hybrid keyword search exists natively — you may still need OpenSearch or Athena on structured fields for keyword-heavy queries.
Regular: Cross-region inference in Bedrock reading vectors in another region adds data transfer — colocate vector buckets with Knowledge Base and model region.
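A quick size estimate before raising topK catches both the memory risk and the data-returned risk. This is a back-of-envelope sketch. The per-result byte sizes are hypothetical inputs you would measure on your own index. Reading the 512 KB threshold as 512 × 1,024 bytes per query is an assumption, and the S3 pricing page is authoritative on how returned data is counted and billed.

```python
# Assumption: "512 KB" read as 512 * 1,024 bytes per query; check the S3 pricing page.
FREE_RETURNED_BYTES_PER_QUERY = 512 * 1024


def estimate_returned_bytes(top_k: int, avg_metadata_bytes: int,
                            avg_data_bytes: int, return_data: bool) -> int:
    """Rough response size: top_k results times the per-result payload."""
    per_result = avg_metadata_bytes + (avg_data_bytes if return_data else 0)
    return top_k * per_result


def bytes_over_free_threshold(returned_bytes: int) -> int:
    """Bytes beyond the per-query threshold that may incur data-returned fees."""
    return max(0, returned_bytes - FREE_RETURNED_BYTES_PER_QUERY)


# Hypothetical per-result sizes; measure these on your own index.
meta_only = estimate_returned_bytes(10_000, 50, 4_000, return_data=False)
with_data = estimate_returned_bytes(10_000, 50, 4_000, return_data=True)
print(meta_only, bytes_over_free_threshold(meta_only))  # 500000 0
print(with_data, bytes_over_free_threshold(with_data))  # 40500000 39975712
```

With these made-up sizes, a 10,000-result response with data attached is roughly 40 MB. That is the Lambda buffering and billing risk described in the first gotcha.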

Official references

Querying vectors — QueryVectors, filters, recall testing.
Create a vector index — index types and limits.
Knowledge Bases data source sync — supported ingestion paths.



