---
title: How to Build Multi-Tenant GenAI on AWS Bedrock
description: Build SaaS with AI: multi-tenant architecture on Bedrock, cost isolation, and tenant data security.
url: https://www.factualminds.com/blog/multi-tenant-genai-bedrock/
datePublished: 2026-04-07T00:00:00.000Z
dateModified: 2026-04-07T00:00:00.000Z
author: palaniappan-p
category: genai
tags: bedrock, saas, multi-tenancy, generative-ai
---

# How to Build Multi-Tenant GenAI on AWS Bedrock

> Build SaaS with AI: multi-tenant architecture on Bedrock, cost isolation, and tenant data security.

## Building Multi-Tenant GenAI SaaS on Bedrock

Most AI SaaS platforms use shared Bedrock account (pool model) with tenant isolation at the application layer. This guide covers architecture, cost tracking, and scaling considerations.

## Multi-Tenancy Models for AI

**Pool Model (Shared Bedrock)**

- One Bedrock account, many customers
- Cheapest (shared infrastructure)
- Requires app-level tenant isolation
- Best for: startups, SMB SaaS

**Silo Model (Dedicated Bedrock)**

- Separate Bedrock account per customer
- Highest isolation (compliance-sensitive)
- Most expensive (~$73/month per customer for control plane)
- Best for: enterprise SaaS ($10K+/month customers)

**Bridge Model (Hybrid)**

- Free/standard customers: pool
- Enterprise customers: silo
- Supports multiple tiers
- Best for: scaling SaaS with mixed customers

## Architecture: Multi-Tenant AI on Bedrock

```
Customer A Request
    ↓ (tenant_id=cust_a)
API Gateway
    ↓
Lambda (include tenant_id in prompt)
    ↓
Bedrock (same account, multiple tenants)
    ↓
Vector DB (RAG, filter by tenant_id)
    ↓
Response (tagged with tenant_id, returned)
    ↓
Billing (track cost per tenant)
```

**Key Points:**

- Single Bedrock account
- Tenant isolation at app layer (every request includes tenant_id)
- Vector DB queries filtered by tenant_id
- Cost tracking via tags/metrics

## Tenant Isolation Implementation

**1. Include Tenant Context in Prompt**

```python
def bedrock_prompt(customer_id, user_question, documents):
    # Include tenant context to prevent crosstalk
    system_prompt = f"""
    You are an AI assistant for customer {customer_id}.
    You have access only to this customer's documents.

    Documents for {customer_id}:
    {documents}

    Rules:
    - Do not reference other customers' data
    - Do not share this customer's data with other customers
    - Always cite which document you're referencing
    """

    return {
        'system': system_prompt,
        'messages': [{'role': 'user', 'content': user_question}]
    }
```

**2. Filter Vector DB by Tenant**

```python
# RAG embedding retrieval
vector_db.search(
    query=user_question,
    filters={'tenant_id': customer_id},  # Only their docs
    top_k=5
)
```

**3. Encrypt Data per Tenant**

```python
# Store embeddings with tenant isolation
vector_db.store(
    embedding=embedding_vector,
    document=document_text,
    tenant_id=customer_id,  # Queryable filter
    encrypted=True  # KMS encryption key per tenant
)
```

## Cost Tracking Per Customer

**1. Tag Bedrock Calls**

```python
bedrock = boto3.client('bedrock-runtime')

response = bedrock.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    body=json.dumps({...}),
    # Tags for cost tracking
    'x-amzn-tagresource': [{
        'Key': 'Customer',
        'Value': customer_id
    }]
)
```

**2. Track Tokens (Billing Calculation)**

```python
# Bedrock returns token counts
def track_usage(customer_id, response):
    tokens_used = response['usage']['input_tokens'] + response['usage']['output_tokens']

    # Bedrock pricing ~$0.003 per 1K tokens (example)
    bedrock_cost = (tokens_used / 1000) * 0.003

    # Store in DynamoDB for billing
    usage_table.put_item(Item={
        'customer_id': customer_id,
        'timestamp': datetime.now().isoformat(),
        'tokens': tokens_used,
        'cost': bedrock_cost
    })

    return bedrock_cost
```

**3. Generate Customer Invoice**

```python
def calculate_customer_bill(customer_id, period):
    usage = usage_table.query(
        KeyConditionExpression='customer_id = :cid',
        ExpressionAttributeValues={':cid': customer_id}
    )

    total_tokens = sum(item['tokens'] for item in usage['Items'])
    bedrock_cost = (total_tokens / 1000) * 0.003

    # Add markup for profit/ops (3-5x typical)
    customer_price = bedrock_cost * 4  # 4x markup

    return {
        'bedrock_cost': bedrock_cost,
        'customer_charge': customer_price,
        'margin': customer_price - bedrock_cost
    }
```

## Rate Limiting Per Customer

```python
def check_rate_limit(customer_id):
    # Get customer's tier
    tier = get_customer_tier(customer_id)  # free, pro, enterprise

    limits = {
        'free': {'requests_per_day': 100},
        'pro': {'requests_per_day': 10000},
        'enterprise': {'requests_per_day': None}  # unlimited
    }

    daily_limit = limits[tier]['requests_per_day']

    # Check usage today
    today = datetime.now().date()
    usage_today = usage_table.query(
        KeyConditionExpression='customer_id = :cid AND starts_with(#ts, :date)',
        ExpressionAttributeNames={'#ts': 'timestamp'},
        ExpressionAttributeValues={
            ':cid': customer_id,
            ':date': str(today)
        }
    )

    if len(usage_today) >= daily_limit:
        raise Exception(f'Rate limit exceeded for {customer_id}')
```

## Scaling Considerations

**Per Customer Concurrency**

- Each customer can call Bedrock concurrently
- Bedrock has regional rate limits (can burst)
- For 1,000+ concurrent customers: use SQS queue (async processing)

**Vector DB Scaling**

- For 1,000 customers × 10,000 docs each: 10M embeddings
- Use Pinecone, Weaviate, or OpenSearch with partition by tenant_id
- Ensure retrieval latency stays < 1 second

**Cost Growth**

- As customers use more: Bedrock costs scale linearly
- Typical SaaS margin: 2-5x markup (customer pays 4x what you pay for Bedrock)
- For profitable SaaS: ensure customer LTV > acquisition cost

## Example Economics: AI SaaS with Bedrock

**10 Customers, 100 queries/month each**

```
Total queries: 1,000
Avg tokens per query: 500 (input) + 500 (output) = 1,000 tokens
Total tokens: 1M
Bedrock cost: 1M / 1000 × $0.003 = $3

Other costs:
- Vector DB: $20
- Lambda: $5
- API Gateway: $3

Monthly cost: $31
Revenue (assuming $50/customer): $500
Margin: $469 (94% margin!)
```

**100 Customers**

```
Bedrock cost: $30
Other infrastructure: $50
Total cost: $80
Revenue: $5,000
Margin: $4,920 (98% margin!)
```

## When to Move to Silo Model

As customer grows:

- Single customer > $5K/month: consider dedicated Bedrock
- Compliance requirements (HIPAA): maybe silo needed
- Negotiate separate account, negotiate AWS discount

## Best Practices

**Tenant Isolation**

- Always include tenant_id in queries/filters
- Never return another tenant's data
- Test with multiple customers; verify isolation

**Cost Control**

- Set per-customer token budgets
- Alert on unusual usage
- Implement rate limiting per tier

**Monitoring**

- CloudWatch metrics by customer
- Track latency per customer
- Monitor Bedrock availability

## Bottom Line

Pool model (shared Bedrock) is economical for most SaaS. Include tenant context in prompts, filter vector DB by tenant_id, track costs per customer. As customers grow, eventually move to silo (dedicated account) but most SaaS stays on pool model.

## FAQ

### How do I isolate customer data in multi-tenant Bedrock?
Include tenant ID in every prompt. Bedrock doesn't store data; each request is independent. But RAG vector DB needs tenant isolation: store embeddings with tenant_id, retrieve only matching tenant. Encryption at rest per tenant.

### How do I track costs per customer with Bedrock?
Tag Bedrock API calls with cost center / customer ID. CloudWatch captures tagged metrics. Use Cost Explorer to group by tag. Lambda wraps Bedrock calls: increment usage counter, charge customer. Real-time cost tracking per customer.

### Can I rate-limit per customer on Bedrock?
Bedrock doesn't have per-customer rate limits. Build in Lambda: maintain counter per customer, reject if over limit. Or use API Gateway throttling per API key (one key per customer).

### How much does multi-tenant AI SaaS cost per customer?
Bedrock: ~$0.005-0.015 per customer query (depends on model, token count). For 1,000 customers × 10 queries/day: ~$1,500/month Bedrock cost. Plus infrastructure. Typical SaaS adds 3-5x markup for profit/ops.

### What's the best multi-tenancy model for AI SaaS?
For GenAI: use pool model (shared Bedrock account, tenant isolation at app level). Silo model (separate accounts per customer) too expensive. Only silo if customer is enterprise paying $10K+/month for dedicated resources.

---

*Source: https://www.factualminds.com/blog/multi-tenant-genai-bedrock/*
