Dilip Singh logo
All posts
RAG SystemsIntermediate2026-02-08·10 min read

PostgreSQL pgvector for Production RAG: Indexing, Hybrid Search & Scale

When pgvector beats a dedicated vector database. Index choices (HNSW vs IVFFlat), tuning for 10M+ rows, hybrid search with full-text, and the moment you should reach for Qdrant instead.

The "Just Use Postgres" Argument

You probably already run Postgres. Adding pgvector is one extension, zero new services, full SQL power, and existing operational know-how. For 80% of teams starting with RAG, pgvector is the right call.

Setup

sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks ( id BIGSERIAL PRIMARY KEY, tenant_id UUID NOT NULL, doc_id UUID NOT NULL, chunk_text TEXT NOT NULL, embedding VECTOR(768) NOT NULL, metadata JSONB NOT NULL DEFAULT '{}', created_at TIMESTAMPTZ DEFAULT NOW() ); ```

HNSW vs IVFFlat

IndexBuild timeQuery speedRecallMemory
HNSWSlowVery fastHighHigher
IVFFlatFastFastTunableLower

For < 10M vectors, HNSW is almost always the right choice:

sql
CREATE INDEX idx_chunks_embedding ON chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Tenant-first composite (so RLS doesn't blow up performance) CREATE INDEX idx_chunks_tenant ON chunks (tenant_id, created_at DESC); ```

Query with Tenant Isolation

sql
SELECT id, chunk_text, 1 - (embedding <=> $1) AS similarity
FROM chunks
WHERE tenant_id = $2
  AND (embedding <=> $1) < 0.4  -- distance < 0.4 = similarity > 0.6
ORDER BY embedding <=> $1
LIMIT 10;

Tune `ef_search` Per Query

Higher ef_search = more recall, slower query. Tune it per-request:

sql
SET LOCAL hnsw.ef_search = 100;  -- default is 40

Hybrid Search: Dense + Full-Text

sql
ALTER TABLE chunks ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', chunk_text)) STORED;
CREATE INDEX idx_chunks_tsv ON chunks USING gin(tsv);

-- Combined ranking with Reciprocal Rank Fusion WITH dense AS ( SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank FROM chunks WHERE tenant_id = $2 ORDER BY embedding <=> $1 LIMIT 50 ), sparse AS ( SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, query) DESC) AS rank FROM chunks, plainto_tsquery('english', $3) AS query WHERE tenant_id = $2 AND tsv @@ query ORDER BY ts_rank(tsv, query) DESC LIMIT 50 ) SELECT c.id, c.chunk_text, COALESCE(1.0/(60 + d.rank), 0) + COALESCE(1.0/(60 + s.rank), 0) AS rrf FROM chunks c LEFT JOIN dense d USING (id) LEFT JOIN sparse s USING (id) WHERE d.id IS NOT NULL OR s.id IS NOT NULL ORDER BY rrf DESC LIMIT 10; ```

When to Outgrow pgvector

  • > 50M vectors and you need sub-50ms p95
  • HNSW build times start exceeding maintenance windows
  • You need quantization (binary or scalar) for memory efficiency
  • Multiple vector spaces per query (rare, but happens)

Until then, the operational simplicity of one database wins.

DS
Dilip Singh
Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.