AWS Glossary
RAG Pipeline
Retrieval-Augmented Generation: combining document retrieval with AI models to answer questions based on specific data.
Definition
A RAG (Retrieval-Augmented Generation) pipeline combines document retrieval with large language models to ground AI responses in specific data. Instead of relying solely on what the model learned during training, RAG retrieves relevant documents from a knowledge base and passes them to the model as context for its answer. This reduces hallucinations (the model stating false facts) and keeps responses grounded in your proprietary data.
How RAG Works on AWS
Step 1: Document Ingestion
- Upload documents (PDFs, text, HTML, Word, CSV) to S3
- Use Amazon Textract to extract text from documents and scanned images
- Split documents into chunks (300–500 words each, with 10–20% overlap between neighbouring chunks)
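The chunking step can be sketched in a few lines of Python. This is a minimal illustration, not a production splitter: the function name `chunk_words` and its parameters are hypothetical, it splits on whitespace only, and the defaults (400 words, 15% overlap) are simply picked from inside the 300–500 word and 10–20% ranges given on this page.

```python
def chunk_words(text, chunk_size=400, overlap_ratio=0.15):
    """Split text into word-based chunks that overlap with their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    words = text.split()
    overlap = int(chunk_size * overlap_ratio)  # words shared by adjacent chunks
    step = chunk_size - overlap                # how far the window advances

    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Real pipelines often split on sentence or heading boundaries instead of a fixed word count, so that a chunk does not end mid-sentence.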
Step 2: Embedding & Storage
- Convert document chunks to embeddings (numerical vectors) using Amazon Bedrock embedding models (Titan Embeddings v2, Cohere Embed)
- Store embeddings in a vector store. Options in 2025/2026:
  - Amazon S3 Vectors (new in 2025): serverless vector store, up to 2 billion vectors, no infrastructure to manage; AWS cites up to 90% lower cost than specialized vector databases
  - Amazon OpenSearch with Vector Search: scales to millions of chunks, supports hybrid keyword + vector search
  - Amazon Aurora PostgreSQL with pgvector: for teams already using Aurora
  - Third-party: Pinecone, Weaviate
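As a rough sketch of the embedding call, the function below sends one chunk to Titan Text Embeddings v2 through the Bedrock runtime `invoke_model` operation. The helper name `embed_text` is hypothetical, the client is passed in by the caller (typically `boto3.client("bedrock-runtime")`), and the sketch assumes the model is enabled in your account and region. Writing the resulting vector to a vector store is left out.

```python
import json

# Model ID for Titan Text Embeddings v2 on Amazon Bedrock.
TITAN_V2_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(client, text, dimensions=1024):
    """Return the embedding vector for one chunk.

    `client` is assumed to be a bedrock-runtime client, for example
    boto3.client("bedrock-runtime", region_name="us-east-1").
    """
    body = json.dumps({
        "inputText": text,
        "dimensions": dimensions,  # Titan v2 accepts 256, 512, or 1024
        "normalize": True,         # unit-length vectors for cosine similarity
    })
    response = client.invoke_model(
        modelId=TITAN_V2_MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

Whatever dimension you pick here has to match the dimension configured on the vector index, and the same model must be used again at query time.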
Step 3: Query & Retrieval
- User asks a question
- Convert question to embedding using the same model as Step 2
- Retrieve the top-k most similar chunks from the vector store
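To make the retrieval step concrete, here is a brute-force top-k search using cosine similarity. It is an illustration only: the names `cosine_similarity` and `top_k` and the in-memory list of `(chunk_id, vector)` pairs are assumptions for this sketch. A real vector store runs the same kind of similarity search server-side, usually with an approximate nearest-neighbour index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=3):
    """Return the k (chunk_id, score) pairs most similar to the query.

    `indexed_chunks` is a list of (chunk_id, vector) pairs.
    """
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```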
Step 4: Generation
- Pass retrieved documents + user question to a Bedrock model (Claude, Nova, Llama, etc.)
- Model generates an answer grounded in retrieved documents
- Response includes citations/references to source documents
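One way to get grounded answers with citations is to number the retrieved chunks in the prompt and ask the model to cite by number. The sketch below only assembles that prompt; the function name, the `source`/`text` dictionary keys, and the instruction wording are all assumptions, and the actual call to a Bedrock model is omitted.

```python
def build_grounded_prompt(question, retrieved_chunks):
    """Build a prompt that asks the model to answer from numbered passages.

    `retrieved_chunks` is a list of dicts with "source" and "text" keys.
    """
    passages = []
    for number, chunk in enumerate(retrieved_chunks, start=1):
        passages.append(f"[{number}] (source: {chunk['source']})\n{chunk['text']}")
    context = "\n\n".join(passages)

    return (
        "Answer the question using only the numbered context passages below. "
        "Cite passages by number, for example [1]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because each passage carries its source, the numbers the model cites can be mapped back to the original documents when the answer is shown to the user.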
Managed RAG on AWS: Bedrock Knowledge Bases
Amazon Bedrock Knowledge Bases (generally available since late 2023) is the fully managed RAG solution on AWS. It handles Steps 1–3 automatically:
- Connect S3 data sources; Bedrock handles chunking, embedding, and sync
- Choose vector store: S3 Vectors, OpenSearch Serverless, Aurora, Pinecone, MongoDB Atlas
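- Query via API (retrieve or retrieveAndGenerate), with no pipeline code to write
- Incremental re-sync when documents are updated in S3: a sync job re-processes only added, changed, or deleted files, but it has to be started on demand or on a schedule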
- Supports metadata filtering for precise retrieval
Use Bedrock Knowledge Bases for new RAG projects. Build a custom pipeline only when you need chunking strategies or retrieval logic that Bedrock doesn’t support.
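For comparison with a hand-built pipeline, a Knowledge Bases query can look roughly like the sketch below, which wraps the `retrieve_and_generate` operation of the boto3 `bedrock-agent-runtime` client. The wrapper name `ask_knowledge_base` is hypothetical, and the knowledge base ID and model ARN are values you supply from your own account.

```python
def ask_knowledge_base(client, knowledge_base_id, model_arn, question):
    """Query a Bedrock knowledge base and return (answer, source_uris).

    `client` is assumed to be boto3.client("bedrock-agent-runtime").
    """
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]

    # Collect the S3 URIs of the chunks the answer was grounded in.
    sources = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return answer, sources
```

The `retrieve` operation on the same client returns only the matching chunks, which is useful when you want to build the prompt and call the model yourself.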
Common Mistakes
Mistake 1: Using RAG on unstructured, low-quality documents. If your knowledge base is poor, RAG outputs will be poor.
Mistake 2: Embedding entire documents at once instead of chunking. Large documents dilute the relevance signal; use 300–500 word chunks with 10–20% overlap.
Mistake 3: Not re-embedding when documents change. Stale embeddings retrieve outdated or wrong content. Bedrock Knowledge Bases re-syncs incrementally once a sync job is started; custom pipelines must implement change detection and re-embedding themselves.
Mistake 4: Defaulting to OpenSearch for all use cases. Amazon S3 Vectors is significantly cheaper and sufficient for most RAG workloads under 2 billion vectors.
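For custom pipelines, one common way to avoid the stale embeddings described in Mistake 3 is to store a content hash next to each document's embeddings and compare it on every run. The sketch below shows only that comparison; the function names and the two dictionaries are hypothetical, and the actual re-embedding and vector deletion are left to the pipeline.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembedding(current_docs, stored_hashes):
    """Decide which documents need new embeddings and which are gone.

    current_docs:  {doc_id: text} as it exists now
    stored_hashes: {doc_id: hash recorded when the doc was last embedded}
    Returns (to_embed, to_delete) as sorted lists of doc ids.
    """
    to_embed = sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)  # new or changed
    )
    to_delete = sorted(
        doc_id for doc_id in stored_hashes if doc_id not in current_docs
    )
    return to_embed, to_delete
```

Documents in `to_embed` are chunked and embedded again (replacing their old vectors), and vectors for documents in `to_delete` are removed from the store.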
Related AWS Services
- Amazon Bedrock Knowledge Bases: Fully managed RAG — recommended starting point
- Amazon S3 Vectors: Serverless vector store (new 2025), cheapest option for most use cases
- Amazon OpenSearch with Vector Search: Best for hybrid keyword + semantic search
- Amazon Textract: Extracts text from PDFs and scanned documents
- Amazon S3: Storage for document corpus