SaaS Architecture · Intermediate · 2026-06-15 · 17 min read

Building AI-Powered SaaS Products: From Architecture to First Revenue

Practical guide to building AI SaaS products — multi-tenancy patterns, BYOK (bring your own key), billing integration, feature flags, and deployment strategies with lessons from building Hureka AI.

The AI SaaS Opportunity in 2026

The AI SaaS market is growing fast, but many AI startups still build on architectures that will not scale past their first 100 customers. I learned this the hard way while building Hureka AI, a multi-agent AI platform serving healthcare clinics. The first architecture lasted three months before we had to rewrite the entire tenant isolation layer.

This guide is the architecture playbook I wish I had when starting. It covers the decisions that matter most: multi-tenancy, API key management, billing, and deployment — with real code patterns you can adapt.

Multi-Tenancy Patterns for AI SaaS

Multi-tenancy in AI SaaS is harder than traditional SaaS because you are managing not just data isolation, but also model state, vector databases, conversation history, and API key budgets.

The Three Isolation Models

| Model | Data Isolation | Cost | Complexity | Best For |
| --- | --- | --- | --- | --- |
| Shared everything | Tenant ID filters | Low | Low | MVP, <100 tenants |
| Shared infra, isolated data | Separate DB schemas/collections | Medium | Medium | Growth stage, <1000 tenants |
| Fully isolated | Dedicated instances | High | High | Enterprise, compliance-heavy |
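
The "shared everything" row depends entirely on every query carrying a tenant filter. Here is a minimal sketch of that idea using an in-memory SQLite database. The `documents` table and the `fetch_documents` helper are hypothetical names chosen for illustration, not part of the Hureka codebase:

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shared by all tenants (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO documents (tenant_id, title) VALUES (?, ?)",
        [
            ("clinic_a", "Intake form"),
            ("clinic_a", "Lab results"),
            ("clinic_b", "Billing policy"),
        ],
    )
    return conn


def fetch_documents(conn: sqlite3.Connection, tenant_id: str) -> list[str]:
    """Return titles for one tenant; the tenant filter is never optional."""
    rows = conn.execute(
        "SELECT title FROM documents WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [title for (title,) in rows]
```

Routing every read through a helper like this keeps the filter in one place, which is the main defense this model offers against cross-tenant leaks.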

For most AI SaaS products, shared infrastructure with isolated data is the sweet spot:

```python
from fastapi import Depends, HTTPException, Request
from sqlalchemy.orm import Session

# get_db, verify_jwt, Tenant, and decrypt_api_keys are application-specific
# helpers assumed to be defined elsewhere in the codebase.


class TenantContext:
    def __init__(self, tenant_id: str, plan: str, api_keys: dict):
        self.tenant_id = tenant_id
        self.plan = plan
        self.api_keys = api_keys


async def get_tenant(
    request: Request,
    db: Session = Depends(get_db),
) -> TenantContext:
    """Extract and validate tenant from JWT token."""
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    payload = verify_jwt(token)
    tenant = db.query(Tenant).filter(Tenant.id == payload["tenant_id"]).first()

    if not tenant or not tenant.is_active:
        raise HTTPException(status_code=403, detail="Tenant not found or inactive")

    return TenantContext(
        tenant_id=tenant.id,
        plan=tenant.plan,
        api_keys=decrypt_api_keys(tenant.encrypted_keys),
    )
```

Vector Database Isolation

For RAG-powered SaaS, each tenant needs isolated vector storage. With Qdrant, you have two options:

Option A: Separate collections per tenant (simpler, better isolation)

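```python
from qdrant_client import models

# qdrant_client is assumed to be an AsyncQdrantClient instance created at startup.


async def get_tenant_collection(tenant_id: str) -> str:
    """Return the tenant's collection name, creating the collection if needed."""
    collection_name = f"tenant_{tenant_id}"
    collections = await qdrant_client.get_collections()
    existing = [c.name for c in collections.collections]

    if collection_name not in existing:
        await qdrant_client.create_collection(
            collection_name=collection_name,
            # size must match the dimension of your embedding model
            vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
        )
    return collection_name
```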

Option B: Shared collection with payload filtering (more efficient at scale)

```python
async def search_tenant_docs(tenant_id: str, query: str, top_k: int = 5):
    # embed() is an assumed helper that returns the query's embedding vector.
    # Newer qdrant-client releases favor query_points over search.
    return await qdrant_client.search(
        collection_name="shared_documents",
        query_vector=embed(query),
        query_filter=models.Filter(
            must=[models.FieldCondition(
                key="tenant_id",
                match=models.MatchValue(value=tenant_id)
            )]
        ),
        limit=top_k,
    )
```

We use Option A for enterprise tenants and Option B for self-serve tenants — a hybrid approach that balances isolation guarantees with operational efficiency.
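
The hybrid routing can be sketched as a small pure function that picks a collection and an optional tenant filter. This assumes the plan name alone decides the tier; `VectorTarget` and `resolve_vector_target` are hypothetical names, and the real rules may be more involved:

```python
from dataclasses import dataclass
from typing import Optional

SHARED_COLLECTION = "shared_documents"  # same collection name as in Option B


@dataclass(frozen=True)
class VectorTarget:
    collection_name: str
    # tenant_id to filter on, or None when the collection is dedicated
    tenant_filter: Optional[str]


def resolve_vector_target(tenant_id: str, plan: str) -> VectorTarget:
    """Pick a dedicated collection for enterprise, the shared one otherwise."""
    if plan == "enterprise":
        return VectorTarget(collection_name=f"tenant_{tenant_id}", tenant_filter=None)
    return VectorTarget(collection_name=SHARED_COLLECTION, tenant_filter=tenant_id)
```

Keeping this decision in one function means search and ingestion code cannot disagree about where a tenant's vectors live.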

BYOK: Bring Your Own Key

BYOK is increasingly expected by enterprise customers who want to use their own OpenAI/Anthropic API keys for cost transparency and data control.

Secure Key Storage

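Never store API keys in plaintext. Encrypt them before they reach the database, using a master key held outside it. The example below uses Fernet symmetric encryption with a single master key; full envelope encryption would add a per-tenant data key that the master key wraps: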

```python
import json
import os

from cryptography.fernet import Fernet

# Must be a urlsafe base64-encoded 32-byte key, e.g. from Fernet.generate_key().
MASTER_KEY = os.environ["ENCRYPTION_MASTER_KEY"]
fernet = Fernet(MASTER_KEY)


def encrypt_api_keys(keys: dict) -> str:
    """Encrypt tenant API keys for storage."""
    serialized = json.dumps(keys)
    return fernet.encrypt(serialized.encode()).decode()


def decrypt_api_keys(encrypted: str) -> dict:
    """Decrypt tenant API keys for use."""
    decrypted = fernet.decrypt(encrypted.encode())
    return json.loads(decrypted.decode())
```

Dynamic LLM Client Resolution

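```python
import os

from anthropic import AsyncAnthropic
from openai import AsyncOpenAI

# detect_provider, get_tenant_usage, get_plan_limit, track_usage, and
# UsageLimitExceeded are application helpers assumed to exist elsewhere.


def get_llm_client(tenant: TenantContext, provider: str = "openai"):
    """Get an LLM client using the tenant's own API key or our default."""
    if provider == "openai":
        api_key = tenant.api_keys.get("openai") or os.environ["DEFAULT_OPENAI_KEY"]
        return AsyncOpenAI(api_key=api_key)
    elif provider == "anthropic":
        api_key = tenant.api_keys.get("anthropic") or os.environ["DEFAULT_ANTHROPIC_KEY"]
        return AsyncAnthropic(api_key=api_key)
    else:
        raise ValueError(f"Unsupported provider: {provider}")


async def chat_completion(tenant: TenantContext, messages: list[dict], model: str = "gpt-4o"):
    # Enforce the plan limit before spending any tokens.
    usage_before = await get_tenant_usage(tenant.tenant_id)
    if usage_before >= get_plan_limit(tenant.plan):
        raise UsageLimitExceeded(f"Tenant {tenant.tenant_id} exceeded plan limits")

    provider = detect_provider(model)
    client = get_llm_client(tenant, provider=provider)

    if provider == "anthropic":
        # The Anthropic SDK exposes messages.create (not chat.completions)
        # and requires max_tokens; 1024 is an illustrative value.
        response = await client.messages.create(model=model, max_tokens=1024, messages=messages)
    else:
        response = await client.chat.completions.create(model=model, messages=messages)

    # Usage field names differ by provider, so track_usage must handle both.
    await track_usage(tenant.tenant_id, response.usage)
    return response
```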
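
`chat_completion` calls `detect_provider(model)` without showing it. One possible shape is sketched here, assuming the provider can be inferred from the model-name prefix. The prefix table is hypothetical and not taken from the Hureka codebase:

```python
# Hypothetical prefix table; extend it for the models you actually offer.
MODEL_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
}


def detect_provider(model: str) -> str:
    """Map a model name to its provider by prefix."""
    for prefix, provider in MODEL_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    # Fail loudly rather than silently defaulting to one vendor.
    raise ValueError(f"Cannot infer provider for model: {model}")
```

Raising on unknown models keeps a typo in a model name from silently routing a request to the wrong vendor.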

Billing Integration

AI SaaS billing is uniquely challenging because costs scale with usage (tokens, API calls, storage) rather than just seat count.

Hybrid Billing Model

The most successful AI SaaS products use a hybrid model: base subscription + usage-based overages.

```python
PLANS = {
    "starter": {
        "base_price": 49,
        "included_tokens": 1_000_000,
        "included_storage_mb": 500,
        "overage_per_1k_tokens": 0.003,
        "max_agents": 3,
    },
    "professional": {
        "base_price": 199,
        "included_tokens": 10_000_000,
        "included_storage_mb": 5000,
        "overage_per_1k_tokens": 0.002,
        "max_agents": 15,
    },
    # Enterprise values are string sentinels; handle them before doing arithmetic.
    "enterprise": {
        "base_price": "custom",
        "included_tokens": "unlimited",
        "included_storage_mb": "unlimited",
        "overage_per_1k_tokens": 0,
        "max_agents": "unlimited",
    },
}
```
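
To make the hybrid model concrete, here is a worked sketch of a monthly charge: base price plus overage per full block of 1,000 tokens. It uses an illustrative copy of the starter plan with `Decimal` prices to avoid floating-point drift; `monthly_charge` is a hypothetical helper, not the production billing code:

```python
from decimal import Decimal

# Illustrative copy of the starter plan; prices are Decimal to avoid float drift.
STARTER = {
    "base_price": Decimal("49"),
    "included_tokens": 1_000_000,
    "overage_per_1k_tokens": Decimal("0.003"),
}


def monthly_charge(plan: dict, tokens_used: int) -> Decimal:
    """Base subscription plus overage billed per full 1,000 tokens over quota."""
    overage_tokens = max(0, tokens_used - plan["included_tokens"])
    overage_units = overage_tokens // 1000  # partial blocks are not billed in this sketch
    return plan["base_price"] + overage_units * plan["overage_per_1k_tokens"]
```

At 1.5M tokens on the starter plan, the overage is 500 blocks at 0.003 each, so the charge is 49 + 1.50 = 50.50 in the plan's currency.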

Usage Tracking Pipeline

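```python
from datetime import datetime, timezone

import stripe

# redis is assumed to be a redis.asyncio client; current_month,
# get_tenant_plan, and get_stripe_item are application helpers.


async def track_and_bill_usage(tenant_id: str, usage: dict):
    """Track token usage and report to Stripe for metered billing."""
    key = f"usage:{tenant_id}:{current_month()}"
    # HINCRBY returns the updated counter, so no separate read is needed.
    total_tokens = await redis.hincrby(key, "tokens", usage["total_tokens"])
    await redis.hincrby(key, "requests", 1)

    plan = await get_tenant_plan(tenant_id)
    included = PLANS[plan]["included_tokens"]

    # Enterprise plans store "unlimited" here, so only compare numeric quotas.
    if isinstance(included, int) and total_tokens > included:
        overage = total_tokens - included
        # Legacy usage-records call; the subscription item ID is the first
        # positional argument. Recent Stripe API versions use billing meters
        # instead, so check which your account supports.
        stripe.SubscriptionItem.create_usage_record(
            await get_stripe_item(tenant_id),
            quantity=overage // 1000,
            # Timezone-aware now(); naive utcnow().timestamp() is off by the local UTC offset.
            timestamp=int(datetime.now(timezone.utc).timestamp()),
            action="set",
        )
```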
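
The tracking code relies on a `current_month()` helper that is not shown. One plausible definition is sketched here as an assumption, not the author's implementation. It pins the billing period to UTC so counters roll over at the same instant for every tenant; `usage_key` is a hypothetical convenience wrapper:

```python
from datetime import datetime, timezone
from typing import Optional


def current_month(now: Optional[datetime] = None) -> str:
    """Return the billing period as YYYY-MM in UTC."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m")


def usage_key(tenant_id: str, now: Optional[datetime] = None) -> str:
    """Build the Redis hash key used for a tenant's monthly counters."""
    return f"usage:{tenant_id}:{current_month(now)}"
```

Accepting an optional `now` makes month-boundary behavior easy to unit test without patching the clock.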

Feature Flags for AI Features

Feature flags are critical for AI SaaS because you are constantly experimenting with models, prompts, and features:

```python
import os

from posthog import Posthog

# Environment variable names are illustrative.
POSTHOG_KEY = os.environ["POSTHOG_KEY"]
POSTHOG_HOST = os.environ["POSTHOG_HOST"]
posthog = Posthog(project_api_key=POSTHOG_KEY, host=POSTHOG_HOST)


async def get_ai_config(tenant_id: str) -> dict:
    """Resolve feature flags and AI configuration for a tenant."""
    # The tenant ID doubles as the PostHog distinct_id.
    flags = posthog.get_all_flags(tenant_id) or {}
    return {
        "model": "claude-sonnet-4" if flags.get("use-claude") else "gpt-4o",
        "rag_enabled": flags.get("rag-v2", False),
        "streaming_enabled": flags.get("streaming-responses", True),
        "max_context_tokens": 8000 if flags.get("extended-context") else 4000,
        "reranking_enabled": flags.get("reranking", False),
    }
```
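
Separating the flag lookup from the flag-to-config mapping makes the mapping testable without a PostHog connection. A sketch under that assumption follows; `resolve_ai_config` is a hypothetical name, and the flag keys and defaults mirror the example above:

```python
from typing import Any, Mapping


def resolve_ai_config(flags: Mapping[str, Any]) -> dict:
    """Map raw feature flags to an AI configuration, with safe defaults."""
    return {
        "model": "claude-sonnet-4" if flags.get("use-claude") else "gpt-4o",
        "rag_enabled": bool(flags.get("rag-v2", False)),
        # Streaming defaults to on when the flag is absent.
        "streaming_enabled": bool(flags.get("streaming-responses", True)),
        "max_context_tokens": 8000 if flags.get("extended-context") else 4000,
        "reranking_enabled": bool(flags.get("reranking", False)),
    }
```

Passing an empty mapping yields the default configuration, which is also a reasonable fallback if the flag service is unreachable.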

Deployment Strategies

Docker Compose for Early Stage

```yaml
services:
  api:
    build: ./api
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/saas
      - QDRANT_URL=http://qdrant:6333
      - REDIS_URL=redis://redis:6379
    ports:
      - "8000:8000"
    deploy:
      resources:
        limits:
          memory: 2G

  worker:
    build: ./worker
    environment:
      - CELERY_BROKER_URL=redis://redis:6379
    deploy:
      replicas: 3

  qdrant:
    image: qdrant/qdrant:latest
    volumes:
      - qdrant_data:/qdrant/storage
    ports:
      - "6333:6333"

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

  db:
    image: postgres:16-alpine
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=saas
      # Placeholder credentials matching DATABASE_URL above; the postgres
      # image will not start without POSTGRES_PASSWORD.
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass

# Named volumes must be declared at the top level.
volumes:
  qdrant_data:
  redis_data:
  pg_data:
```

Kubernetes for Growth Stage

When you outgrow Docker Compose (typically at 50 or more tenants with production SLAs), move to Kubernetes with separate deployments for API, workers, and AI inference:

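```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-inference
  template:
    metadata:
      labels:
        app: ai-inference  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          resources:
            limits:
              nvidia.com/gpu: 1
              memory: "16Gi"
          volumeMounts:
            - name: model-cache
              mountPath: /root/.ollama
      volumes:
        # Assumption: an emptyDir cache; swap in a PersistentVolumeClaim
        # to keep models across pod restarts.
        - name: model-cache
          emptyDir: {}
      nodeSelector:
        gpu: "true"
```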

Lessons from Building Hureka AI

  1. Ship multi-tenancy on day one. Retrofitting tenant isolation is 10x harder than building it from the start.
  2. BYOK wins enterprise deals. It removed the biggest objection ("we cannot send data to your API keys") in three enterprise deals.
  3. Usage-based billing requires real-time tracking. Batch billing at month-end causes surprise charges and churn.
  4. Feature flags are not optional. We rolled back an AI model change in production via a flag in under 60 seconds.
  5. Start with Docker Compose. Kubernetes adds operational complexity that does not pay off until you have real scale.

Conclusion

Building an AI-powered SaaS product is as much about the SaaS architecture as it is about the AI. Multi-tenancy, billing, key management, and deployment are the foundation that determines whether your product can grow from 10 customers to 10,000.

The patterns in this guide are battle-tested from building Hureka AI. They are not the only way, but they represent a proven path from MVP to revenue.

If you are building an AI SaaS product and need help with architecture decisions, [schedule a consultation](/contact). We offer [hands-on architecture services](/services) specifically for AI-powered products — from initial design through production launch.

Dilip Singh
Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.