How to Build a RAG Pipeline with Amazon Bedrock Knowledge Bases

Generative AI | Palaniappan P | 7 min read

Quick summary: Amazon Bedrock Knowledge Bases automate the RAG (Retrieval-Augmented Generation) pipeline — semantic search, chunking, embedding, and context injection into Claude or other foundation models. This guide covers setup, data ingestion, cost optimization, and production patterns.

Key Takeaways

  • AWS lifecycle notice (June 30, 2026): Amazon Bedrock Agents is now Bedrock Agents Classic, in maintenance for new customers after July 30, 2026
  • Net-new agent builds should use Bedrock AgentCore
  • Amazon Bedrock Knowledge Bases simplify building Retrieval-Augmented Generation (RAG) applications
  • Instead of orchestrating chunking, embedding, vector search, and context injection manually, Bedrock handles the pipeline end-to-end

AWS lifecycle notice (June 30, 2026): Amazon Bedrock Agents is now Bedrock Agents Classic, in maintenance for new customers after July 30, 2026. Net-new agent builds should use Bedrock AgentCore. Full matrix: lifecycle roundup.

Amazon Bedrock Knowledge Bases simplify building Retrieval-Augmented Generation (RAG) applications. Instead of orchestrating chunking, embedding, vector search, and context injection manually, Bedrock handles the pipeline end-to-end. Upload documents → Bedrock chunks and embeds them → retrieval happens automatically when you invoke Claude or other foundation models with the knowledge base.

This guide covers the full setup: creating and configuring a knowledge base, ingesting documents, querying it from your application, and optimizing for production cost and latency.

Step 1: Create an Amazon Bedrock Knowledge Base

Start in the AWS Console:

  1. Navigate to Amazon Bedrock → Knowledge Base
  2. Click Create Knowledge Base
  3. Name: my-company-kb (lowercase, no spaces)
  4. Description: Optional but recommended for tracking
  5. Select an embedding model: Choose from:
    • Titan Embeddings (default, $0.08 per 1M tokens, fully managed)
    • Custom embeddings (requires external OpenSearch or vector database)

For most use cases, Titan Embeddings is sufficient. It understands semantic relationships across business documents, code, and technical content.

  6. Click Next

Step 2: Configure Data Source and Vector Store

Bedrock needs two things: a data source (where documents live) and a vector store (where embeddings are stored).

Step 2A: Data Source

Choose where your documents are stored:

  • S3 bucket (recommended for bulk ingestion)

    • Create an S3 bucket or select an existing one: s3://my-company-docs/
    • Bedrock will scan for supported files: PDF, DOCX, TXT, MD, HTML
    • Set an S3 prefix if documents are in a subdirectory: bedrock-documents/
  • Web crawler (less common, for public websites)

    • Useful for ingesting documentation sites, but adds complexity

For this guide, use S3 bucket.
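
If the documents start out on a local disk, the S3 layout described above can be prepared with a small helper. This is a minimal sketch: the function names `plan_uploads` and `upload_all` are hypothetical, the extension list is copied from the supported-file list above, and `s3_client` is assumed to be a boto3 S3 client.

```python
from pathlib import PurePosixPath

# Extensions taken from the supported-file list above; adjust as needed.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".html"}


def plan_uploads(relative_paths, prefix="bedrock-documents/"):
    """Map local relative paths to S3 keys, skipping unsupported file types."""
    plan = {}
    for path in relative_paths:
        p = PurePosixPath(path)
        if p.suffix.lower() in SUPPORTED_EXTENSIONS:
            plan[path] = prefix + p.as_posix()
    return plan


def upload_all(s3_client, bucket, plan):
    """Upload each planned file; s3_client is assumed to be a boto3 S3 client."""
    for local_path, key in plan.items():
        s3_client.upload_file(local_path, bucket, key)
```

Separating the plan from the upload makes it easy to review which files will be skipped before anything is written to the bucket.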

Step 2B: Vector Store

Choose a vector store when creating the knowledge base:

  • Amazon S3 Vectors (recommended for new RAG workloads in 2026) — S3-native vector storage at lower cost for large corpora; see S3 Vectors guide.
  • OpenSearch Serverless — Bedrock can auto-create a collection named bedrock-kb-{timestamp}; use when you need hybrid BM25 + vector search or sub-50ms retrieval.
  • Aurora PostgreSQL (pgvector) — when you already operate Aurora and want SQL-adjacent ops.

For OpenSearch Serverless, you are charged for OCU-hours plus storage (historically ~$108/month minimum at small scale). S3 Vectors targets storage-economics workloads above ~10M vectors. Pick based on your latency and hybrid-search requirements rather than defaulting to OpenSearch.

Click Create and Ingest to proceed.
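
The same setup can be scripted instead of clicked through. The sketch below only builds the keyword arguments for the `create_knowledge_base` call on the boto3 `bedrock-agent` client, assuming an OpenSearch Serverless store. The helper name `build_kb_request`, the default index name, and the three field names in `fieldMapping` are illustrative assumptions. As far as I know, the API (unlike the console) expects the collection and vector index to exist already.

```python
def build_kb_request(name, role_arn, embedding_model_arn, collection_arn,
                     index_name="bedrock-kb-index", description=""):
    """Build kwargs for bedrock-agent create_knowledge_base (OpenSearch Serverless store)."""
    request = {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Field names are assumptions; they must match your index mapping.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }
    if description:
        request["description"] = description
    return request
```

A caller would then pass the result on, for example `boto3.client("bedrock-agent").create_knowledge_base(**request)`, after confirming the ARNs and index mapping for their account.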

Step 3: Ingest Documents into the Knowledge Base

Once the knowledge base is created, upload documents:

  1. In the Knowledge Base page, select your KB and go to Documents

  2. Upload documents:

    • Drag-and-drop files or click to browse
    • Supported formats: PDF, DOCX, TXT, MD, HTML, JSON, CSV (as semi-structured)
    • Max file size: 50MB per document
  3. Chunking strategy (default is fine for most cases):

    • Chunk size: 2,048 tokens (≈ 1,500 words)
    • Overlap: 512 tokens (improves semantic coherence at chunk boundaries)
    • These settings are NOT customizable in the console; to use custom chunking, use the API
  4. Metadata filtering (optional):

    • Add document metadata like department: engineering, year: 2026
    • Metadata can be used in retrieval queries to filter results
  5. Click Upload and ingest
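
For documents ingested from S3, metadata is usually supplied as a sidecar file stored next to the document, named `<document>.metadata.json` and containing a `metadataAttributes` object. The helper below is a small sketch of generating that file; the function name `metadata_sidecar` is hypothetical. Value types should match the types you later use in filters.

```python
import json


def metadata_sidecar(document_key, attributes):
    """Return (sidecar_key, json_body) for a document's metadata file."""
    body = json.dumps({"metadataAttributes": attributes}, indent=2)
    return document_key + ".metadata.json", body


key, body = metadata_sidecar(
    "bedrock-documents/guide.pdf",
    {"department": "engineering", "year": "2026"},
)
print(key)
print(body)
```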

Expected duration: Ingestion takes ~1 minute per 100MB of documents. A 1GB knowledge base takes ~10 minutes.
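
When documents are synced from S3 rather than uploaded in the console, ingestion can be started and monitored through the `bedrock-agent` client's `start_ingestion_job` and `get_ingestion_job` calls. The polling loop below is a sketch: `run_ingestion` is a hypothetical helper, the poll interval is arbitrary, and `client` is assumed to be `boto3.client("bedrock-agent")`.

```python
import time

TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def run_ingestion(client, kb_id, data_source_id, poll_seconds=15, sleep=time.sleep):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job
```

The returned job dictionary carries the final status, so the caller can fail a deployment pipeline when it is not `COMPLETE`.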

Step 4: Query the Knowledge Base from Your Application

Bedrock knowledge bases integrate with the Agents API and regular model invocation. Here’s how to retrieve documents and pass them to Claude:

Using the Agents API (recommended for production):

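import boto3
import json

bedrock_agent_client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
kb_id = 'your-knowledge-base-id'
# modelArn takes a foundation-model or inference-profile ARN.
# Pin after benchmark; Sonnet 5 (2026-06-30) for agentic lanes; verify the ID in console
model_arn = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6'

response = bedrock_agent_client.retrieve_and_generate(
    input={'text': 'How does our cost optimization framework work?'},
    # The knowledge base, model, and retrieval settings are nested under
    # retrieveAndGenerateConfiguration rather than passed as top-level arguments.
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': model_arn,
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'  # Combines vector + keyword search
                }
            }
        }
    },
    sessionConfiguration={
        'kmsKeyArn': 'arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012'  # Optional encryption
    }
)

print(response['output']['text'])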

This does three things:

  1. Retrieves the 5 most relevant chunks from the KB (hybrid vector + keyword search)
  2. Generates a response using Claude Sonnet 4.6 (or Sonnet 5 after benchmark) with those documents as context
  3. Returns a single text response
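
Besides the answer text, the response also carries citations that point back to the retrieved chunks. The sketch below collects the distinct S3 URIs from a response dictionary; `cited_sources` is a hypothetical helper, and it assumes an S3 data source, so other source types would need their own location lookup.

```python
def cited_sources(response):
    """Collect the distinct S3 URIs cited in a retrieve_and_generate response, in order."""
    seen = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in seen:
                seen.append(uri)
    return seen
```

Showing these URIs next to the generated answer lets users verify a claim against the source document.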

Using the Bedrock Runtime API (for custom control):

If you need more control over the prompt or want to inject context into your own LLM calls:

bedrock_runtime_client = boto3.client('bedrock-runtime', region_name='us-east-1')
bedrock_agent_client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Step 1: Retrieve documents from KB (the query text goes in retrievalQuery)
retrieval_results = bedrock_agent_client.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={'text': 'How does our cost optimization framework work?'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID'
        }
    }
)

# Step 2: Format documents as context, labelling each chunk with its S3 URI
context = '\n\n'.join([
    f"Source: {item.get('location', {}).get('s3Location', {}).get('uri', 'Unknown')}\n{item['content']['text']}"
    for item in retrieval_results['retrievalResults']
])

# Step 3: Invoke Claude with context
response = bedrock_runtime_client.invoke_model(
    modelId='anthropic.claude-sonnet-4-6',  # Pin after benchmark — Sonnet 5 (2026-06-30) for agentic lanes; verify in console
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 2048,
        'messages': [
            {
                'role': 'user',
                'content': f"""You are a helpful assistant with access to company documentation.

Here is the relevant documentation:

<company_docs>
{context}
</company_docs>

User question: How does our cost optimization framework work?"""
            }
        ]
    })
)

print(json.loads(response['body'].read())['content'][0]['text'])

The second approach gives you full control over the prompt. Chunking still happens at ingestion time, but you are responsible for prompt assembly, deduplicating retrieved chunks, and formatting citations yourself.
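
Overlapping chunks and re-ingested files can return near-identical passages. One simple approach, sketched below with a hypothetical `dedupe_results` helper, is to drop results whose whitespace-normalized text has already been seen before building the context string. It assumes each item has the `content.text` shape used in the retrieval code above.

```python
def dedupe_results(results):
    """Drop retrieved chunks whose normalized text was already seen, keeping the first."""
    seen = set()
    unique = []
    for item in results:
        # Collapse whitespace and case so trivially different copies match.
        key = " ".join(item["content"]["text"].split()).lower()
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Because results arrive ordered by relevance, keeping the first occurrence keeps the highest-ranked copy.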

Step 5: Optimize Cost and Performance

Chunk size tuning:

  • Small chunks (500 tokens): Fast retrieval, better precision on specific Q&A (e.g., “What’s the pricing for service X?”), but one answer may span several chunks, so you may need a higher numberOfResults
  • Large chunks (4,000+ tokens): Slower retrieval, better for narrative documents (e.g., whitepapers, architectural guides), fewer chunks needed
  • Default (2,048 tokens) is a good balance. Use the API to customize:
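
# Data sources are managed through the build-time 'bedrock-agent' client, not the runtime client
bedrock_agent_admin = boto3.client('bedrock-agent', region_name='us-east-1')

bedrock_agent_admin.create_data_source(
    knowledgeBaseId=kb_id,
    name='my-company-docs-s3',  # Example name
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-company-docs',
            'inclusionPrefixes': ['bedrock/']
        }
    },
    # Chunking is set per data source at creation time
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 500,         # Example value; tune per corpus
                'overlapPercentage': 20   # Overlap is given as a percentage of the chunk size
            }
        }
    }
)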
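
To make the size and overlap trade-off concrete, the sketch below applies a fixed-size sliding window to a list of tokens. It is an illustration only: `chunk_tokens` is a hypothetical helper, and Bedrock's own tokenizer and chunker may behave differently.

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # The last window already reaches the end of the document.
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, size=4, overlap=1):
    print(chunk)
```

Smaller windows produce more chunks to store and search, while larger overlap repeats more text across neighbouring chunks.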

Filtering by metadata:

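Use metadata filters to restrict retrieval to matching documents (faster, cheaper, and more precise):

response = bedrock_agent_client.retrieve_and_generate(
    input={'text': 'How does our cost optimization framework work?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            # Pin after benchmark; Sonnet 5 (2026-06-30) for agentic lanes; verify in console
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    # Each condition is a single operator; combine several with andAll
                    'filter': {
                        'andAll': [
                            {'equals': {'key': 'department', 'value': 'engineering'}},
                            {'equals': {'key': 'year', 'value': '2026'}}
                        ]
                    }
                }
            }
        }
    }
)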

Monitoring and cost:

  • Knowledge Base provisioning: $0.15/hour = $108/month (non-negotiable minimum)
  • Retrieval (GET): $0.15 per 1,000 calls
  • Ingestion (PUT): $0.15 per 1,000 uploads
  • Embeddings: $0.08 per 1M tokens (only if using Bedrock embeddings)

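At 1,000 queries/day (about 30,000 retrievals/month): $108 + ~$4.50/month in retrieval costs at the rates above, before model inference charges. At 10,000 queries/day the retrieval line grows to ~$45/month. At 10,000+ queries/day, consider OpenSearch Serverless with Titan Embeddings for potentially lower costs.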
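
The arithmetic behind these estimates can be reproduced with a few lines. The sketch below uses the rates quoted in the list above as defaults and excludes model inference and embedding charges. The function name `monthly_kb_cost` and the 30-day month are assumptions, so check current AWS pricing before relying on the output.

```python
def monthly_kb_cost(queries_per_day, days=30, base_per_hour=0.15,
                    hours_per_day=24, retrieval_per_1k=0.15):
    """Estimate (base, retrieval) monthly cost in USD from the rates quoted above."""
    base = base_per_hour * hours_per_day * days
    retrieval = queries_per_day * days / 1000 * retrieval_per_1k
    return round(base, 2), round(retrieval, 2)


for qpd in (1_000, 10_000):
    base, retrieval = monthly_kb_cost(qpd)
    print(f"{qpd} queries/day: base ${base}, retrieval ${retrieval}")
```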

Step 6: Production-Ready Patterns

Hybrid Search (Vector + Keyword):

The HYBRID search type combines semantic similarity (vectors) with keyword matching, improving precision:

'overrideSearchType': 'HYBRID'  # Recommended for production

Reranking with Claude:

For high-stakes applications (e.g., customer support), retrieve more documents (e.g., 20) and have Claude rerank them:

# Retrieve 20 candidates and let Claude weigh them while generating the answer
response = bedrock_agent_client.retrieve_and_generate(
    input={'text': 'How does our cost optimization framework work?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            # Pin after benchmark; Sonnet 5 (2026-06-30) for agentic lanes; verify in console
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {'numberOfResults': 20}
            }
        }
    }
)
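
If you want an explicit rerank step rather than passing all 20 passages to the model, one option is to retrieve the passages yourself, ask Claude to pick the best few, and send only those on to the final prompt. The sketch below covers the two pure pieces of that flow, building the rerank prompt and parsing the reply. Both function names and the prompt wording are illustrative assumptions, not a Bedrock feature.

```python
import re


def build_rerank_prompt(query, passages, keep=3):
    """Number the passages and ask the model for the indices of the best `keep`."""
    numbered = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        f"Question: {query}\n\nPassages:\n{numbered}\n\n"
        f"Reply with the numbers of the {keep} most relevant passages, "
        "most relevant first, separated by commas."
    )


def parse_ranking(reply, passage_count, keep=3):
    """Extract valid, distinct 1-based passage numbers from the model's reply."""
    picked = []
    for match in re.findall(r"\d+", reply):
        index = int(match)
        if 1 <= index <= passage_count and index not in picked:
            picked.append(index)
    return picked[:keep]
```

Validating the indices guards against replies that mention passage numbers that do not exist or repeat the same passage.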

Caching for repeated queries:

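If the same retrieved context is reused across questions, use Bedrock’s prompt caching so the model does not reprocess that context on every call (caching applies to the prompt prefix; it does not skip retrieval itself):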

response = bedrock_runtime_client.invoke_model(
    modelId='anthropic.claude-sonnet-4-6',  # Pin after benchmark — Sonnet 5 (2026-06-30) for agentic lanes; verify in console
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',  # Required for Anthropic models via invoke_model
        'system': [
            {
                'type': 'text',
                'text': 'You are a helpful assistant with access to company documentation.'
            },
            {
                'type': 'text',
                'text': context,
                'cache_control': {'type': 'ephemeral'}
            }
        ],
        'max_tokens': 2048,
        'messages': [{'role': 'user', 'content': 'How does our cost optimization framework work?'}]
    })
)

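This caches the context for 5 minutes. On a cache hit, the cached input tokens are billed at a steep discount (up to about 90% off the normal input price); uncached input and output tokens are billed as usual.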
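
Prompt caching does not remove the retrieval call itself. A complementary, application-level option is a small time-limited cache of retrieval results keyed on the normalized question. The class below is a sketch of that separate technique: `TTLCache` is a hypothetical helper, the 300-second default simply mirrors the 5-minute window above, and `compute` stands in for your own retrieval function.

```python
import time


class TTLCache:
    """Tiny in-process cache keyed on the normalized query text."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get_or_compute(self, query, compute):
        """Return a fresh cached value for query, or call compute(query) and store it."""
        key = " ".join(query.lower().split())
        hit = self._store.get(key)
        now = self.clock()
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = compute(query)
        self._store[key] = (now, value)
        return value
```

This only helps within a single process; a shared cache would be needed if several workers serve the same questions.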

Common Mistakes to Avoid

  1. Uploading unstructured data without preprocessing

    • PDFs with images: Bedrock’s native parser may miss image content. Pre-OCR large documents.
    • Tables: Convert to markdown or JSON for better chunk coherence.
  2. Using too many documents

    • Knowledge bases with 100,000+ documents can hit retrieval latency. Consider partitioning into multiple KBs by domain.
  3. Not setting metadata

    • Without metadata, every query retrieves from the entire KB. Add department, product, or date metadata to narrow results.
  4. Ignoring the $108/month minimum

    • The knowledge base collection runs 24/7. If you don’t query it frequently, disable it or delete it.

Next Steps

  1. Ingest your first batch of documents
  2. Test retrieval with sample queries
  3. Monitor costs in the AWS Bedrock console
  4. Deploy to production with error handling and monitoring (e.g., CloudWatch alarms for failed retrievals)
  5. Talk to FactualMinds if you need help scaling to production volume or custom embedding models
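
For the error-handling step, a common starting point is to wrap retrieval calls in a retry with exponential backoff. The sketch below is generic: `call_with_retries` is a hypothetical helper and the delays are example values. In practice you would likely pass `retry_on=(botocore.exceptions.ClientError,)` and inspect the error code before retrying.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=0.5, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception, wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** attempt)
```

Failures that survive the final attempt are re-raised, so they can be counted by whatever metric feeds your CloudWatch alarm.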

For multi-tenant Bedrock isolation patterns, read multi-tenant GenAI on Bedrock.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture, Cloud Migration, GenAI on AWS, Cost Optimization, DevOps
