
RAG Systems · Advanced · 2026-01-30 · 11 min read

Fine-Tuning Embeddings for Domain-Specific RAG: A 20% Recall Jump

Generic embeddings (BGE, OpenAI) leave 20% recall on the table for domain text. Learn how to mine training pairs from your own documents and fine-tune sentence-transformers for medical, legal, or financial RAG.

Tags: Embeddings, Fine-Tuning, Sentence Transformers, RAG, Domain Adaptation

Why Generic Embeddings Fall Short

OpenAI's text-embedding-3 and BGE are trained on internet text. They work well for general queries, but they often miss domain conventions: that "MI" means myocardial infarction, that "AKI" means acute kidney injury, or that "Section 230" is a US law about platform liability.

For domain-specific RAG, fine-tuning an embedding model on your own corpus typically improves retrieval recall by 15 to 25 percentage points; the Recall@5 table under Typical Improvements shows representative numbers.

Strategy: Mine Pairs from Existing Documents

You don't need labeled data. Generate (query, positive_chunk) pairs from your existing corpus:

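```python
import asyncio
import json

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Assumed to exist elsewhere: `llm`, an async client exposing
# `complete(prompt) -> str`, and `domain_chunks`, a list of chunk strings.

# 1. Use a teacher LLM to generate synthetic queries for each chunk
async def generate_queries(chunk: str, n: int = 3) -> list[str]:
    prompt = f"""Generate {n} different questions that this passage answers.
Return as a JSON array of strings.

Passage:
{chunk}
"""
    response = await llm.complete(prompt)
    return json.loads(response)

# 2. Build training pairs (`await` is only valid inside a coroutine)
async def build_pairs(chunks: list[str]) -> list[InputExample]:
    pairs = []
    for chunk in chunks:
        queries = await generate_queries(chunk, n=3)
        for q in queries:
            pairs.append(InputExample(texts=[q, chunk]))
    return pairs

pairs = asyncio.run(build_pairs(domain_chunks))
```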
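LLM responses are not always clean JSON; models sometimes add a preamble or wrap the array in extra text. The sketch below is one way to parse defensively before building pairs. The helper name `parse_queries` and its return-empty-on-failure behavior are assumptions for illustration, not part of the pipeline above.

```python
import json
import re


def parse_queries(response: str, max_queries: int = 3) -> list[str]:
    """Extract a list of question strings from a raw LLM response.

    Hypothetical helper: returns [] instead of raising when no valid
    JSON array of strings is found, so one bad generation does not
    abort the whole mining run.
    """
    # Grab the outermost [...] span, ignoring any chatter around it.
    match = re.search(r"\[.*\]", response, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    seen: set[str] = set()
    queries: list[str] = []
    for item in items:
        if not isinstance(item, str):
            continue
        text = item.strip()
        key = text.lower()
        if text and key not in seen:  # drop empties and case-insensitive duplicates
            seen.add(key)
            queries.append(text)
    return queries[:max_queries]
```

With a helper like this, `generate_queries` could return `parse_queries(response, n)` in place of the bare `json.loads(response)`, and chunks that yield no usable queries would simply contribute no pairs.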

Mining Hard Negatives

Random negatives are too easy. Mine hard negatives — chunks that look similar but aren't the right answer:

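```python
import torch
from sentence_transformers import util

base_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
chunks = [p.texts[1] for p in pairs]
chunk_embeddings = base_model.encode(chunks, convert_to_tensor=True)
# Encode all queries in one batch instead of one call per pair
query_embeddings = base_model.encode([p.texts[0] for p in pairs], convert_to_tensor=True)

# Each chunk appears once per generated query, so identical chunks share an id
ids: dict[str, int] = {}
chunk_ids = torch.tensor([ids.setdefault(c, len(ids)) for c in chunks], device=chunk_embeddings.device)

for i, pair in enumerate(pairs):
    similarities = util.cos_sim(query_embeddings[i], chunk_embeddings)[0]
    similarities[chunk_ids == chunk_ids[i]] = -1  # exclude the true positive and its duplicates
    hard_neg_idx = similarities.argmax().item()
    pair.texts.append(chunks[hard_neg_idx])
```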
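One caveat with always taking the single most similar chunk: in a dense corpus that chunk may genuinely answer the query, which makes it a false negative. A common mitigation is to skip candidates that score nearly as high as the true positive. The sketch below shows that selection rule on plain Python lists. The function name `pick_hard_negative` and the 0.95 margin are illustrative assumptions, not values from this article.

```python
from typing import Optional


def pick_hard_negative(
    scores: list[float],
    positive_idx: int,
    excluded: Optional[set[int]] = None,
    margin: float = 0.95,
) -> Optional[int]:
    """Return the index of the highest-scoring candidate that is still
    clearly below the positive, or None if no candidate qualifies.

    scores       -- similarity of the query to every chunk
    positive_idx -- index of the chunk the query was generated from
    excluded     -- other indices to skip (e.g. duplicates of the positive)
    margin       -- assumed threshold: candidates scoring above
                    margin * positive_score are treated as likely false
                    negatives (assumes the positive score is > 0)
    """
    skip = set(excluded or ()) | {positive_idx}
    ceiling = margin * scores[positive_idx]
    best_idx, best_score = None, float("-inf")
    for idx, score in enumerate(scores):
        if idx in skip or score > ceiling:
            continue
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


# Chunk 1 scores almost as high as the positive (chunk 0), so it is skipped.
print(pick_hard_negative([0.82, 0.80, 0.61, 0.35], positive_idx=0))  # prints 2
```

Pairs for which the function returns `None` can be left without an explicit hard negative only if every training example ends up with the same number of texts, so in practice you would either drop those pairs or fall back to a lower-ranked candidate.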

Fine-Tuning Loop

```python
model = SentenceTransformer("BAAI/bge-base-en-v1.5")
train_dataloader = DataLoader(pairs, shuffle=True, batch_size=32)

# Multiple Negatives Ranking Loss: uses in-batch negatives + the hard neg
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=int(len(train_dataloader) * 0.1),
    output_path="./bge-medical-v1",
    show_progress_bar=True,
)
```
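For intuition, Multiple Negatives Ranking Loss treats each query's own chunk as the correct class and every other candidate in the batch (including appended hard negatives) as wrong classes, then applies softmax cross-entropy over scaled cosine similarities. The sketch below computes that quantity for one batch in plain Python. It is a simplified illustration rather than the library's implementation; the scale of 20 mirrors the library's documented default, and the function name `mnr_loss` is hypothetical.

```python
import math


def mnr_loss(sim: list[list[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy over the rows of a similarity matrix.

    sim[i][j] is the cosine similarity between query i and candidate j.
    Candidate i is the positive for query i; columns beyond the number
    of queries hold extra hard negatives.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [scale * s for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive
    return total / len(sim)


# Two queries, two in-batch candidates, plus one hard-negative column.
easy = [[0.9, 0.1, 0.20], [0.1, 0.8, 0.20]]
hard = [[0.9, 0.1, 0.85], [0.1, 0.8, 0.75]]
print(mnr_loss(easy) < mnr_loss(hard))  # prints True
```

The comparison shows why hard negatives matter: when the extra column scores close to the positive, the loss is larger, so those examples contribute more gradient than random negatives the model already separates easily.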

Evaluation

Always evaluate on a held-out set with concrete retrieval metrics such as Recall@k:

```python
import os

from sentence_transformers.evaluation import InformationRetrievalEvaluator

evaluator = InformationRetrievalEvaluator(
    queries=eval_queries,         # {query_id: query_text}
    corpus=eval_corpus,           # {doc_id: doc_text}
    relevant_docs=eval_relevant,  # {query_id: set(relevant_doc_ids)}
    name="medical-eval",
    show_progress_bar=True,
)

os.makedirs("./eval-results", exist_ok=True)  # make sure the output directory exists
result = evaluator(model, output_path="./eval-results")
```
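To make the headline metric concrete: Recall@k for one query is the fraction of its relevant documents that appear in the top k results, and the reported number is the mean over queries. The sketch below computes it from ranked ID lists, using the same `{query_id: set(doc_ids)}` shape as `eval_relevant`. The function name and the toy data are illustrative assumptions.

```python
def recall_at_k(
    ranked: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int = 5,
) -> float:
    """Mean Recall@k over all queries that have at least one relevant doc."""
    per_query = []
    for query_id, relevant_ids in relevant.items():
        if not relevant_ids:
            continue  # recall is undefined without relevant docs
        top_k = set(ranked.get(query_id, [])[:k])
        per_query.append(len(top_k & relevant_ids) / len(relevant_ids))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Toy data: ranked results per query, best match first.
ranked = {"q1": ["d3", "d1", "d9"], "q2": ["d4", "d5", "d2"]}
relevant = {"q1": {"d1"}, "q2": {"d2", "d7"}}
print(recall_at_k(ranked, relevant, k=2))  # prints 0.5 (q1 found, q2 missed)
```

Running the same function on rankings from the base model and the fine-tuned model, over the same held-out queries, gives a like-for-like comparison.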

Typical Improvements

Domain	Generic BGE Recall@5	Fine-tuned Recall@5
Medical	0.62	0.81
Legal	0.58	0.79
Financial filings	0.66	0.84
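When reading these numbers it helps to separate absolute gain (percentage points of Recall@5) from relative gain (percent of the baseline). The short calculation below derives both from the table rows above; it introduces no new data, and the helper name `gains` is just for illustration.

```python
# Recall@5 values copied from the table above: (generic BGE, fine-tuned)
results = {
    "Medical": (0.62, 0.81),
    "Legal": (0.58, 0.79),
    "Financial filings": (0.66, 0.84),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = (after - before) * 100
    relative = (after - before) / before * 100
    return round(absolute, 1), round(relative, 1)


for domain, (before, after) in results.items():
    absolute, relative = gains(before, after)
    print(f"{domain}: +{absolute} points, +{relative}% relative")
```

On these rows the absolute gains are 18 to 21 points, while the relative gains are roughly 27 to 36 percent.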

Practical Tips

1. Quality > quantity: 5,000 well-mined pairs beat 50,000 noisy ones.
2. Filter by length: drop ultra-short or ultra-long chunks before training.
3. Use a domain LLM for query generation: a general-purpose LLM tends to produce generic-sounding questions, so use a domain-tuned LLM where possible.
4. Validate by humans: sample 100 generated pairs and read them. If they look unnatural, regenerate.
5. Version your embeddings: re-indexing 50M vectors is expensive, so always know which model produced which collection.
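Tips 2 and 5 are straightforward to automate. The sketch below filters chunks by word count and derives a collection name that encodes the embedding model and version. The thresholds (20 and 400 words), the function names, and the naming scheme are all assumptions for illustration; tune them for your corpus and vector store.

```python
import re


def filter_by_length(
    chunks: list[str], min_words: int = 20, max_words: int = 400
) -> list[str]:
    """Keep chunks whose whitespace-delimited word count is within bounds."""
    return [c for c in chunks if min_words <= len(c.split()) <= max_words]


def versioned_collection(base: str, model_name: str, version: str) -> str:
    """Build a collection name such as 'docs__bge-medical__v1'.

    Characters outside [a-z0-9-] in the model name are replaced with '-'
    to keep the name simple and predictable.
    """
    slug = re.sub(r"[^a-z0-9-]+", "-", model_name.lower()).strip("-")
    return f"{base}__{slug}__{version}"


print(versioned_collection("docs", "bge-medical", "v1"))  # prints docs__bge-medical__v1
```

Keeping the model identifier in the collection name means a query encoded with one model is never accidentally searched against vectors produced by another.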

Dilip Singh

Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.
