Building AI-Powered SaaS Products: From Architecture to First Revenue
Practical guide to building AI SaaS products — multi-tenancy patterns, BYOK (bring your own key), billing integration, feature flags, and deployment strategies with lessons from building Hureka AI.
The AI SaaS Opportunity in 2026
The AI SaaS market is exploding, but most AI startups are still building on architectures that will not scale past their first 100 customers. I learned this the hard way while building Hureka AI — a multi-agent AI platform serving healthcare clinics. The first architecture lasted three months before we had to rewrite the entire tenant isolation layer.
This guide is the architecture playbook I wish I had when starting. It covers the decisions that matter most: multi-tenancy, API key management, billing, and deployment — with real code patterns you can adapt.
Multi-Tenancy Patterns for AI SaaS
Multi-tenancy in AI SaaS is harder than traditional SaaS because you are managing not just data isolation, but also model state, vector databases, conversation history, and API key budgets.
The Three Isolation Models
| Model | Data Isolation | Cost | Complexity | Best For |
|---|---|---|---|---|
| Shared everything | Tenant ID filters | Low | Low | MVP, <100 tenants |
| Shared infra, isolated data | Separate DB schemas/collections | Medium | Medium | Growth stage, <1000 tenants |
| Fully isolated | Dedicated instances | High | High | Enterprise, compliance-heavy |
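To make the "shared everything" row concrete, here is a minimal sketch of tenant ID filtering using only the standard library's `sqlite3`. The `documents` table, the `TenantScopedStore` wrapper, and `make_demo_connection` are hypothetical names chosen for illustration, not Hureka AI's actual schema.

```python
import sqlite3


class TenantScopedStore:
    """Wraps a shared table so every query goes through a tenant_id filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self.conn = conn
        self.tenant_id = tenant_id

    def add_document(self, title: str) -> None:
        # The tenant_id comes from the store itself, never from caller input.
        self.conn.execute(
            "INSERT INTO documents (tenant_id, title) VALUES (?, ?)",
            (self.tenant_id, title),
        )

    def list_documents(self) -> list[str]:
        rows = self.conn.execute(
            "SELECT title FROM documents WHERE tenant_id = ? ORDER BY id",
            (self.tenant_id,),
        )
        return [title for (title,) in rows]


def make_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical shared table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, title TEXT NOT NULL)"
    )
    return conn
```

The weakness of this model is that one query that forgets the filter leaks rows across tenants, which is why the wrapper owns the tenant ID rather than accepting it per call.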
For most AI SaaS products, shared infrastructure with isolated data is the sweet spot:
```python
from fastapi import Depends, HTTPException, Request
from sqlalchemy.orm import Session

# get_db, verify_jwt, and Tenant are assumed to be defined elsewhere in the
# application; decrypt_api_keys appears in the BYOK section.


class TenantContext:
    """Per-request tenant identity, plan, and decrypted provider keys."""

    def __init__(self, tenant_id: str, plan: str, api_keys: dict):
        self.tenant_id = tenant_id
        self.plan = plan
        self.api_keys = api_keys


async def get_tenant(
    request: Request,
    db: Session = Depends(get_db),
) -> TenantContext:
    """Extract and validate tenant from JWT token."""
    token = request.headers.get("Authorization", "").replace("Bearer ", "")
    payload = verify_jwt(token)
    tenant = db.query(Tenant).filter(Tenant.id == payload["tenant_id"]).first()

    if not tenant or not tenant.is_active:
        raise HTTPException(status_code=403, detail="Tenant not found or inactive")

    return TenantContext(
        tenant_id=tenant.id,
        plan=tenant.plan,
        api_keys=decrypt_api_keys(tenant.encrypted_keys),
    )
```
Vector Database Isolation
For RAG-powered SaaS, each tenant needs isolated vector storage. With Qdrant, you have two options:
Option A: Separate collections per tenant (simpler, better isolation)
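```python
from qdrant_client import models

# qdrant_client below is assumed to be an AsyncQdrantClient instance
# created elsewhere in the application.


async def get_tenant_collection(tenant_id: str) -> str:
    """Return the tenant's collection name, creating the collection on first use."""
    collection_name = f"tenant_{tenant_id}"
    collections = await qdrant_client.get_collections()
    existing = [c.name for c in collections.collections]

    if collection_name not in existing:
        await qdrant_client.create_collection(
            collection_name=collection_name,
            # size must match the output dimension of your embedding model
            vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
        )
    return collection_name
```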
Option B: Shared collection with payload filtering (more efficient at scale)
```python
async def search_tenant_docs(tenant_id: str, query: str, top_k: int = 5):
    return await qdrant_client.search(
        collection_name="shared_documents",
        query_vector=embed(query),
        query_filter=models.Filter(
            must=[models.FieldCondition(
                key="tenant_id",
                match=models.MatchValue(value=tenant_id)
            )]
        ),
        limit=top_k,
    )
```
We use Option A for enterprise tenants and Option B for self-serve tenants — a hybrid approach that balances isolation guarantees with operational efficiency.
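One way to express that hybrid routing is a small pure function that decides, per tenant, which collection to query and whether a tenant filter is needed. This is a sketch under assumptions: `VectorTarget` and `resolve_vector_target` are hypothetical names, and it assumes the plan name `"enterprise"` is what marks a tenant for a dedicated collection.

```python
from dataclasses import dataclass
from typing import Optional

SHARED_COLLECTION = "shared_documents"


@dataclass(frozen=True)
class VectorTarget:
    collection_name: str
    # tenant_id to filter on in a shared collection, or None when the
    # collection is dedicated to one tenant
    tenant_filter: Optional[str]


def resolve_vector_target(tenant_id: str, plan: str) -> VectorTarget:
    """Pick Option A (dedicated collection) or Option B (shared + filter)."""
    if plan == "enterprise":
        return VectorTarget(collection_name=f"tenant_{tenant_id}", tenant_filter=None)
    return VectorTarget(collection_name=SHARED_COLLECTION, tenant_filter=tenant_id)
```

Keeping this decision in one place means search and ingestion code cannot disagree about where a tenant's vectors live.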
BYOK: Bring Your Own Key
BYOK is increasingly expected by enterprise customers who want to use their own OpenAI/Anthropic API keys for cost transparency and data control.
Secure Key Storage
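Never store API keys in plaintext. Encrypt them before they reach the database. The example below encrypts directly with a single master key via Fernet; full envelope encryption goes one step further and encrypts each tenant's keys with its own data key, which is in turn encrypted by the master key: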
```python
import json
import os

from cryptography.fernet import Fernet

# Must be a valid Fernet key, for example one produced by Fernet.generate_key()
MASTER_KEY = os.environ["ENCRYPTION_MASTER_KEY"]
fernet = Fernet(MASTER_KEY)


def encrypt_api_keys(keys: dict) -> str:
    """Encrypt tenant API keys for storage."""
    serialized = json.dumps(keys)
    return fernet.encrypt(serialized.encode()).decode()


def decrypt_api_keys(encrypted: str) -> dict:
    """Decrypt tenant API keys for use."""
    decrypted = fernet.decrypt(encrypted.encode())
    return json.loads(decrypted.decode())
```
Dynamic LLM Client Resolution
```python
import os

from anthropic import AsyncAnthropic
from openai import AsyncOpenAI

# detect_provider, get_tenant_usage, get_plan_limit, track_usage, and
# UsageLimitExceeded are assumed application-level helpers.


def get_llm_client(tenant: TenantContext, provider: str = "openai"):
    """Get an LLM client using the tenant's own API key or our default."""
    if provider == "openai":
        api_key = tenant.api_keys.get("openai") or os.environ["DEFAULT_OPENAI_KEY"]
        return AsyncOpenAI(api_key=api_key)
    elif provider == "anthropic":
        api_key = tenant.api_keys.get("anthropic") or os.environ["DEFAULT_ANTHROPIC_KEY"]
        return AsyncAnthropic(api_key=api_key)
    else:
        raise ValueError(f"Unsupported provider: {provider}")


async def chat_completion(tenant: TenantContext, messages: list[dict], model: str = "gpt-4o"):
    provider = detect_provider(model)
    client = get_llm_client(tenant, provider=provider)

    # Enforce plan limits before spending any tokens
    usage_before = await get_tenant_usage(tenant.tenant_id)
    if usage_before >= get_plan_limit(tenant.plan):
        raise UsageLimitExceeded(f"Tenant {tenant.tenant_id} exceeded plan limits")

    if provider == "anthropic":
        # The Anthropic SDK uses messages.create and requires max_tokens;
        # 1024 is a placeholder value, not a recommendation.
        response = await client.messages.create(model=model, messages=messages, max_tokens=1024)
    else:
        response = await client.chat.completions.create(model=model, messages=messages)
    await track_usage(tenant.tenant_id, response.usage)
    return response
```
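The `detect_provider` helper is called above but not shown. A minimal sketch, assuming providers can be inferred from model name prefixes, might look like the following; the prefix table is an assumption for illustration and would need to grow with the models you support.

```python
# Hypothetical prefix-to-provider table; extend it as you add models.
MODEL_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
}


def detect_provider(model: str) -> str:
    """Infer the provider from a model name, failing loudly on unknown models."""
    for prefix, provider in MODEL_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Cannot infer provider for model: {model}")
```

Raising on unknown models, rather than defaulting to one provider, avoids silently sending a tenant's request to the wrong API with the wrong key.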
Billing Integration
AI SaaS billing is uniquely challenging because costs scale with usage (tokens, API calls, storage) rather than just seat count.
Hybrid Billing Model
The most successful AI SaaS products use a hybrid model: base subscription + usage-based overages.
```python
PLANS = {
    "starter": {
        "base_price": 49,
        "included_tokens": 1_000_000,
        "included_storage_mb": 500,
        "overage_per_1k_tokens": 0.003,
        "max_agents": 3,
    },
    "professional": {
        "base_price": 199,
        "included_tokens": 10_000_000,
        "included_storage_mb": 5000,
        "overage_per_1k_tokens": 0.002,
        "max_agents": 15,
    },
    "enterprise": {
        "base_price": "custom",
        "included_tokens": "unlimited",
        "included_storage_mb": "unlimited",
        "overage_per_1k_tokens": 0,
        "max_agents": "unlimited",
    },
}
```
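As a worked example of the hybrid model, the sketch below computes a monthly charge from the base price plus overage. It copies only the pricing fields of the two fixed-price plans, stores the overage rate as a string so `Decimal` can represent it exactly, and leaves out the enterprise plan because its pricing is custom. `monthly_charge` is a hypothetical helper, and rounding overage down to whole blocks of 1,000 tokens is an assumption.

```python
from decimal import Decimal

# Pricing fields copied from the plan table; rates are strings so that
# Decimal represents them exactly.
PLANS = {
    "starter": {
        "base_price": 49,
        "included_tokens": 1_000_000,
        "overage_per_1k_tokens": "0.003",
    },
    "professional": {
        "base_price": 199,
        "included_tokens": 10_000_000,
        "overage_per_1k_tokens": "0.002",
    },
}


def monthly_charge(plan_name: str, tokens_used: int) -> Decimal:
    """Base subscription plus overage for tokens beyond the included allowance."""
    plan = PLANS[plan_name]
    overage_tokens = max(0, tokens_used - plan["included_tokens"])
    overage_units = overage_tokens // 1000  # whole blocks of 1,000 tokens, rounded down
    return Decimal(plan["base_price"]) + overage_units * Decimal(plan["overage_per_1k_tokens"])
```

For instance, a starter tenant that uses 1.5 million tokens is 500,000 tokens over its allowance, which is 500 blocks at 0.003 each, so the charge is 49 + 1.50 = 50.50.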
Usage Tracking Pipeline
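```python
from datetime import datetime, timezone

import stripe

# redis, current_month, get_tenant_plan, and get_stripe_item are assumed
# application-level helpers.


async def track_and_bill_usage(tenant_id: str, usage: dict):
    """Track token usage and report to Stripe for metered billing."""
    usage_key = f"usage:{tenant_id}:{current_month()}"
    await redis.hincrby(usage_key, "tokens", usage["total_tokens"])
    await redis.hincrby(usage_key, "requests", 1)

    total_tokens = int(await redis.hget(usage_key, "tokens") or 0)
    plan = await get_tenant_plan(tenant_id)
    included = PLANS[plan]["included_tokens"]

    # Enterprise plans store "unlimited" here, so only compare numeric allowances
    if isinstance(included, int) and total_tokens > included:
        overage = total_tokens - included
        # Usage records are Stripe's legacy metered-billing API; newer API
        # versions move to billing meters, so check which one your account uses.
        # The subscription item ID is passed as the first positional argument.
        stripe.SubscriptionItem.create_usage_record(
            await get_stripe_item(tenant_id),
            quantity=overage // 1000,  # billed per 1,000 tokens
            # Timezone-aware now(); a naive utcnow().timestamp() is read as local time
            timestamp=int(datetime.now(timezone.utc).timestamp()),
            action="set",  # overwrite with the cumulative overage, do not increment
        )
```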
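The counting logic in this pipeline can be exercised without Redis or Stripe. The sketch below is an in-memory stand-in with the same key layout and the same "cumulative overage in blocks of 1,000 tokens" calculation. `InMemoryUsageTracker` is a hypothetical class for local testing, and this version of `current_month` is an assumed implementation of the helper used above.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional


def current_month(now: Optional[datetime] = None) -> str:
    """Return the billing period as YYYY-MM in UTC (assumed key format)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m")


class InMemoryUsageTracker:
    """Mimics the per-tenant, per-month counters kept in Redis."""

    def __init__(self, included_tokens: int):
        self.included_tokens = included_tokens
        self._counters = defaultdict(lambda: {"tokens": 0, "requests": 0})

    def record(self, tenant_id: str, total_tokens: int, now: Optional[datetime] = None) -> int:
        """Add usage and return the cumulative overage in blocks of 1,000 tokens."""
        key = f"usage:{tenant_id}:{current_month(now)}"
        counters = self._counters[key]
        counters["tokens"] += total_tokens
        counters["requests"] += 1
        overage = max(0, counters["tokens"] - self.included_tokens)
        return overage // 1000
```

Because `record` returns the cumulative overage for the month, reporting it with a "set" action stays correct even if the same report is sent twice.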
Feature Flags for AI Features
Feature flags are critical for AI SaaS because you are constantly experimenting with models, prompts, and features:
```python
from posthog import Posthog

posthog = Posthog(project_api_key=POSTHOG_KEY, host=POSTHOG_HOST)


async def get_ai_config(tenant_id: str) -> dict:
    """Resolve feature flags and AI configuration for a tenant."""
    flags = posthog.get_all_flags(tenant_id)
    return {
        "model": "claude-sonnet-4" if flags.get("use-claude") else "gpt-4o",
        "rag_enabled": flags.get("rag-v2", False),
        "streaming_enabled": flags.get("streaming-responses", True),
        "max_context_tokens": 8000 if flags.get("extended-context") else 4000,
        "reranking_enabled": flags.get("reranking", False),
    }
```
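It can help to separate fetching flags from interpreting them, so the mapping from flags to configuration is testable without a PostHog client. The sketch below is one way to do that; `resolve_ai_config` is a hypothetical helper that mirrors the flag names and defaults used above, with boolean values coerced via `bool()`.

```python
from typing import Any, Mapping


def resolve_ai_config(flags: Mapping[str, Any]) -> dict:
    """Map a plain dict of feature flags to the AI configuration for a tenant."""
    return {
        "model": "claude-sonnet-4" if flags.get("use-claude") else "gpt-4o",
        "rag_enabled": bool(flags.get("rag-v2", False)),
        "streaming_enabled": bool(flags.get("streaming-responses", True)),
        "max_context_tokens": 8000 if flags.get("extended-context") else 4000,
        "reranking_enabled": bool(flags.get("reranking", False)),
    }
```

With this split, rolling a model change back is just the `use-claude` flag flipping to false, and the resulting configuration can be checked in a unit test before the flag ever changes in production.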
Deployment Strategies
Docker Compose for Early Stage
```yaml
services:
  api:
    build: ./api
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/saas
      - QDRANT_URL=http://qdrant:6333
      - REDIS_URL=redis://redis:6379
    ports:
      - "8000:8000"
    deploy:
      resources:
        limits:
          memory: 2G

  worker:
    build: ./worker
    environment:
      - CELERY_BROKER_URL=redis://redis:6379
    deploy:
      replicas: 3

  qdrant:
    image: qdrant/qdrant:latest
    volumes:
      - qdrant_data:/qdrant/storage
    ports:
      - "6333:6333"

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

  db:
    image: postgres:16-alpine
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=saas
      # Credentials must match DATABASE_URL above; the postgres image
      # will not start without a password
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass

# Named volumes used by services must be declared at the top level
volumes:
  qdrant_data:
  redis_data:
  pg_data:
```
Kubernetes for Growth Stage
When you outgrow Docker Compose (typically around 50+ tenants with production SLAs), move to Kubernetes with separate deployments for API, workers, and AI inference:
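```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-inference
  template:
    metadata:
      labels:
        app: ai-inference  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          resources:
            limits:
              nvidia.com/gpu: 1
              memory: "16Gi"
          volumeMounts:
            - name: model-cache
              mountPath: /root/.ollama
      volumes:
        # Assumption: emptyDir keeps the example self-contained; use a
        # PersistentVolumeClaim if models should survive pod restarts.
        - name: model-cache
          emptyDir: {}
      nodeSelector:
        gpu: "true"
```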
Lessons from Building Hureka AI
1. Ship multi-tenancy on day one. Retrofitting tenant isolation is 10x harder than building it from the start.
2. BYOK wins enterprise deals. It removed the biggest objection ("we cannot send data to your API keys") in three enterprise deals.
3. Usage-based billing requires real-time tracking. Batch billing at month-end causes surprise charges and churn.
4. Feature flags are not optional. We rolled back an AI model change in production via a flag in under 60 seconds.
5. Start with Docker Compose. Kubernetes adds operational complexity that does not pay off until you have real scale.
Conclusion
Building an AI-powered SaaS product is as much about the SaaS architecture as it is about the AI. Multi-tenancy, billing, key management, and deployment are the foundation that determines whether your product can grow from 10 customers to 10,000.
The patterns in this guide are battle-tested from building Hureka AI. They are not the only way, but they represent a proven path from MVP to revenue.
If you are building an AI SaaS product and need help with architecture decisions, [schedule a consultation](/contact). We offer [hands-on architecture services](/services) specifically for AI-powered products — from initial design through production launch.