---
title: RAG Pipeline
description: Retrieval-Augmented Generation: combining document retrieval with AI models to answer questions based on specific data.
url: https://www.factualminds.com/glossary/rag-pipeline/
publishDate: 2026-06-13
updateDate: 2026-06-13
---

# RAG Pipeline

> Retrieval-Augmented Generation: combining document retrieval with AI models to answer questions based on specific data.

## Definition

A **RAG (Retrieval-Augmented Generation) pipeline** grounds large language model (LLM) responses in your own documents rather than relying on model weights alone. The flow: ingest documents → chunk text → embed chunks into vectors → store in a vector index → at query time retrieve the most relevant chunks → pass them with the user question to an LLM (Claude Sonnet 4.6, Nova Lite, Llama, etc.) → return an answer with citations. RAG reduces hallucination on proprietary facts and lets you update knowledge by changing documents instead of retraining models. On AWS, **Amazon Bedrock Knowledge Bases** is the managed implementation; custom pipelines combine S3, an embedding model, and a vector store such as OpenSearch, Aurora with pgvector, or Amazon S3 Vectors.
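
The flow above can be sketched end to end in plain Python. This is a toy illustration, not a production design: the word-count `embed` function stands in for a real embedding model, the in-memory list stands in for a vector index, paragraphs stand in for chunks, and the document names and contents are made up for the example. The final prompt is what would be sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: a real pipeline calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Ingest -> chunk (one chunk per paragraph here) -> embed -> store."""
    index = []
    for source, text in docs.items():
        for para in text.split("\n\n"):
            if para.strip():
                index.append((source, para.strip(), embed(para)))
    return index


def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k chunks most similar to the question, with their source names."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(source, text) for source, text, _ in ranked[:k]]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Number each retrieved chunk so the model can cite it in the answer."""
    context = "\n".join(f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(hits, 1))
    return (
        "Answer using only the numbered sources and cite them.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical corpus for illustration only.
docs = {
    "vpn-runbook.md": "Restart the VPN gateway from the network console.\n\n"
                      "Escalate VPN outages to the network on-call engineer.",
    "leave-policy.md": "Employees accrue 20 days of annual leave per year.",
}
question = "How many days of annual leave do employees get?"
index = build_index(docs)
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Keeping the source name next to each chunk is what makes citations possible later: the generation step only sees what retrieval hands it.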

## When to use it

- **Q&A over internal docs** — policies, runbooks, contracts, product specs, support articles.
- **Knowledge that changes frequently** — re-sync embeddings when S3 sources update instead of fine-tuning on every edit.
- **Citation requirements** — regulated or customer-facing answers that must show source passages.
- **Multi-model flexibility** — swap Claude, Nova, or Llama at generation time while keeping one retrieval index.
- **Starting GenAI without labeled fine-tuning data** — RAG works with existing document corpora.

## When not to use it

- **Teaching the model a new skill or tone** that retrieval cannot supply — consider fine-tuning or prompt engineering instead (see [Fine-Tuning vs RAG](/blog/fine-tuning-vs-rag-bedrock-when-to-use/)).
- **Garbage document corpora** — scanned PDFs with OCR errors, duplicate wikis, and outdated runbooks produce confident wrong answers.
- **Sub-100ms latency requirements** — retrieval plus generation adds hundreds of milliseconds to seconds; cache or precompute where needed.

## Tips

- Chunk at **300–500 tokens** with **10–20% overlap**; whole-document embeddings dilute relevance signals.
- Use **metadata filters** (department, product, date) on Knowledge Bases to narrow retrieval in multi-tenant apps.
- Prefer **Bedrock Knowledge Bases** for new projects unless you need custom reranking logic; it handles sync, chunking, and embedding for you.
- Evaluate **S3 Vectors vs OpenSearch** on cost and hybrid search needs — keyword + semantic hybrid often beats pure vector for acronyms and SKUs.
- Re-embed the whole corpus when you change embedding models; vectors produced by different models or model versions are not comparable, so they cannot share an index.
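
As a rough illustration of the chunking tip, the sketch below splits a token list into fixed-size windows with a fractional overlap. It assumes the text is already tokenized (here a synthetic list of strings stands in for tokens); `chunk_tokens` and its defaults of 400 tokens and 15% overlap are illustrative choices within the ranges above, not values from any particular library.

```python
def chunk_tokens(tokens: list[str], size: int = 400, overlap_ratio: float = 0.15) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap_ratio` of its length with the previous one."""
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be positive and overlap_ratio must be in [0, 1)")
    overlap = round(size * overlap_ratio)
    step = max(1, size - overlap)  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Synthetic 1,000-token document for illustration.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [400, 400, 320]
```

With these defaults the window advances 340 tokens at a time, so the last 60 tokens of each chunk reappear at the start of the next. That overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.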

## Gotchas

### Serious

- **No access control on the vector index** — if the index commingles tenants, retrieval leaks cross-customer context into prompts.
- **Trusting answers without checking citations** — wrong-chunk retrieval looks authoritative; always surface sources in the UI for high-stakes use cases.

### Regular

- **Huge PDFs embedded whole** — retrieval returns irrelevant sections; chunk and structure documents first.
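- **Stale sync**: custom pipelines often lack an S3 event notification that triggers re-indexing on upload; Knowledge Bases sync is easier because it only reprocesses changed files, but each sync still has to be triggered and monitored.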
- **Default OpenSearch for every workload** — S3 Vectors covers many RAG indexes at lower operational overhead when hybrid search is not required.
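
For the stale-sync problem, one common pattern is a small Lambda function that starts a Knowledge Base ingestion job whenever S3 reports a changed object. The sketch below assumes an S3 event notification is wired to the function and that the knowledge base and data source IDs arrive through environment variables; the variable names and placeholder values are hypothetical. `start_ingestion_job` is the boto3 `bedrock-agent` operation; the `client` parameter exists only so the handler can be exercised without AWS access.

```python
import os

# Hypothetical environment variable names; the defaults are placeholders, not real IDs.
KNOWLEDGE_BASE_ID = os.environ.get("KNOWLEDGE_BASE_ID", "KB_ID_PLACEHOLDER")
DATA_SOURCE_ID = os.environ.get("DATA_SOURCE_ID", "DS_ID_PLACEHOLDER")


def handler(event, context=None, client=None):
    """Lambda-style handler: start a Knowledge Base sync when S3 objects change."""
    records = event.get("Records", [])
    keys = [r["s3"]["object"]["key"] for r in records if "s3" in r]
    if not keys:
        return {"started": False, "keys": []}
    if client is None:
        import boto3  # imported lazily so the module loads without AWS dependencies
        client = boto3.client("bedrock-agent")
    response = client.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
        description=f"Triggered by {len(keys)} S3 object change(s)",
    )
    return {
        "started": True,
        "keys": keys,
        "jobId": response["ingestionJob"]["ingestionJobId"],
    }
```

A production version would likely debounce bursts of uploads, handle the case where a sync is already running, and alarm on failed ingestion jobs, since a sync that silently fails leaves the index just as stale as no sync at all.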

## Official references

- [Retrieve data and generate responses with Amazon Bedrock Knowledge Bases](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html)
- [How Knowledge Bases work](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-it-works.html)
- [Vector stores for Knowledge Bases](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)
- [Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html)

## Related FactualMinds content

- [Fine-Tuning vs RAG on AWS Bedrock: When to Use Each](/blog/fine-tuning-vs-rag-bedrock-when-to-use/)
- [Generative AI on AWS Bedrock for Enterprises](/services/generative-ai-on-aws/)
- [Amazon Bedrock](/glossary/amazon-bedrock/)

---

*Source: https://www.factualminds.com/glossary/rag-pipeline/*