Dilip Singh is a Lead AI Architect and AI developer based in Delhi, India. He has 14+ years of experience building enterprise AI chatbots, AI assistants, multi-agent platforms, RAG pipelines, and ontology-driven knowledge systems. He is Lead Software Architect at Hureka Technologies and has delivered 118+ production projects globally.

RAG Systems · Intermediate · 2026-04-04 · 9 min read

Evaluating RAG Systems: Beyond "Looks Good" with Ragas

How to evaluate RAG quality rigorously. Faithfulness, answer relevance, context precision, context recall — using Ragas to catch regressions before users do.

RAG Evaluation Ragas Metrics LLM Testing Quality

"Looks Good" Is Not a Test Strategy

Most teams ship RAG and pray. Then a user finds a hallucination. Then the team panics and tweaks something. Then quality regresses elsewhere. Sound familiar?

You can't improve what you don't measure. Ragas gives you four metrics that catch 90% of RAG quality issues.

The Four Core Metrics

| Metric | What it measures |
|--------|------------------|
| Faithfulness | Does the answer only use facts from retrieved context? |
| Answer Relevance | Does the answer actually address the question? |
| Context Precision | Are the retrieved chunks relevant (not noise)? |
| Context Recall | Did we retrieve all needed information? |
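To make the definitions concrete, here is a minimal sketch of the ratios behind three of these metrics, computed from hand-labeled verdicts. Ragas derives the verdicts with an LLM judge and its exact formulas may differ by version, so the function names and toy labels below are illustrative assumptions, not Ragas API. Answer relevance is left out because it depends on embeddings rather than simple counts.

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Share of claims in the answer that the retrieved context supports."""
    return sum(claim_supported) / len(claim_supported)


def context_recall_score(truth_claim_found: list[bool]) -> float:
    """Share of ground-truth claims that appear in the retrieved context."""
    return sum(truth_claim_found) / len(truth_claim_found)


def context_precision_score(chunk_relevant: list[bool]) -> float:
    """Mean precision@k over the ranks that hold a relevant chunk.

    Rewards rankings that put relevant chunks ahead of noise.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy labels (assumed): 3 of 4 answer claims are grounded, both ground-truth
# claims were retrieved, and the retriever ranked [relevant, noise, relevant].
print(faithfulness_score([True, True, True, False]))
print(context_recall_score([True, True]))
print(context_precision_score([True, False, True]))
```

The precision function shows why ranking matters: the same two relevant chunks score higher when they sit at ranks 1 and 2 than at ranks 1 and 3.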

Setting Up Ragas

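```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# rag_response comes from your own RAG pipeline: .answer is the generated
# text and .retrieved_chunks is the list of context strings it used.
# Column names can differ between Ragas releases; check your installed version.
eval_set = Dataset.from_list([
    {
        "question": "What is our return policy?",
        "answer": rag_response.answer,
        "contexts": rag_response.retrieved_chunks,
        "ground_truth": "30 day return window with receipt",
    },
    # ... 100 more
])

# The metrics are scored by an LLM judge, so model credentials must be configured.
result = evaluate(
    eval_set,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)
```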

Building Your Eval Set

Two sources:

1. Synthetic from your docs:

   ```python
   from ragas.testset import TestsetGenerator

   generator = TestsetGenerator.from_langchain(...)
   testset = generator.generate_with_langchain_docs(docs, test_size=100)
   ```

2. Real user queries with manually-annotated ground truth (this is the gold standard).
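For the second source, one way to turn annotated queries into evaluation rows is sketched below. `run_rag` stands in for your own pipeline and is a hypothetical callable returning an answer plus its retrieved chunks; `RagResponse`, `build_eval_rows`, and `fake_rag` are likewise made-up names. The row keys mirror the ones used in the setup example above.

```python
from typing import Callable, NamedTuple


class RagResponse(NamedTuple):
    answer: str
    retrieved_chunks: list[str]


def build_eval_rows(
    annotated: list[dict],
    run_rag: Callable[[str], RagResponse],
) -> list[dict]:
    """Run each annotated question through the pipeline and record the result."""
    rows = []
    for item in annotated:
        response = run_rag(item["question"])
        rows.append({
            "question": item["question"],
            "answer": response.answer,
            "contexts": list(response.retrieved_chunks),
            "ground_truth": item["ground_truth"],
        })
    return rows


# Stand-in pipeline (assumption) so the sketch runs without a retriever or LLM.
def fake_rag(question: str) -> RagResponse:
    return RagResponse(
        answer="Returns are accepted within 30 days with a receipt.",
        retrieved_chunks=["Items may be returned within 30 days with receipt."],
    )


annotated_queries = [
    {"question": "What is our return policy?",
     "ground_truth": "30 day return window with receipt"},
]
rows = build_eval_rows(annotated_queries, fake_rag)
print(rows[0]["contexts"])
```

The resulting `rows` list can be passed to `Dataset.from_list(rows)` in place of the hand-written list.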

CI Integration

Every PR runs the eval set. Fail the build if any metric drops by more than 3% versus main:

```yaml
# .github/workflows/rag-eval.yml
- name: Run RAG evaluation
  run: |
    python scripts/eval_rag.py --threshold-file thresholds.json
# The eval step must write FAITHFULNESS, FAITH_DELTA, etc. to $GITHUB_ENV
# so the env.* expressions below resolve.
- name: Comment PR with results
  uses: marocchino/sticky-pull-request-comment@v2
  with:
    message: |
      ## RAG Eval Results
      | Metric | Score | Δ vs main |
      |--------|-------|-----------|
      | Faithfulness | ${{ env.FAITHFULNESS }} | ${{ env.FAITH_DELTA }} |
      | Answer Relevance | ${{ env.ANSWER_REL }} | ${{ env.ANSWER_DELTA }} |
```
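The workflow calls `scripts/eval_rag.py`, which the post does not show. A minimal sketch of its regression check could look like the following. It assumes the threshold file holds baseline scores from main keyed by metric name, and that the 3% limit is a relative drop. Both are assumptions, as is every function name here.

```python
import json

MAX_RELATIVE_DROP = 0.03  # assumed reading of "drops > 3%"


def find_regressions(baseline: dict, current: dict,
                     max_drop: float = MAX_RELATIVE_DROP) -> dict:
    """Return {metric: relative_drop} for metrics that fell by more than max_drop."""
    regressions = {}
    for metric, base in baseline.items():
        if base <= 0:
            continue  # no meaningful relative drop from a zero baseline
        drop = (base - current[metric]) / base
        if drop > max_drop:
            regressions[metric] = round(drop, 4)
    return regressions


def exit_code(regressions: dict) -> int:
    """A non-zero exit code makes the CI step fail."""
    return 1 if regressions else 0


# Example scores (made up). In CI the baseline would be read from the
# threshold file and the current scores from the Ragas result.
baseline = json.loads('{"faithfulness": 0.92, "context_recall": 0.84}')
current = {"faithfulness": 0.93, "context_recall": 0.80}

regressions = find_regressions(baseline, current)
print(regressions, exit_code(regressions))
```

In a real script the final step would be `sys.exit(exit_code(regressions))`, so that any regression blocks the PR.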

What Each Score Tells You

Faithfulness < 0.85 → Hallucination problem. The LLM is making things up. Lower temperature, add "if you don't know, say so" instructions.
Answer Relevance < 0.80 → Answer drifts off-topic. Tighten your generation prompt.
Context Precision < 0.70 → Retrieval brings noise. Add reranking, raise score threshold.
Context Recall < 0.80 → Missing relevant chunks. Increase k, improve chunking, fine-tune embeddings.
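These rules are mechanical enough to encode directly. The sketch below maps a score dictionary to the matching advice. The dictionary keys and the `diagnose` name are illustrative assumptions, while the cutoffs and remedies are the ones listed above.

```python
# (cutoff, diagnosis, suggested fixes) per metric, taken from the list above.
DIAGNOSTICS = {
    "faithfulness": (0.85, "Hallucination problem",
                     "lower temperature; add 'if you don't know, say so' instructions"),
    "answer_relevancy": (0.80, "Answer drifts off-topic",
                         "tighten the generation prompt"),
    "context_precision": (0.70, "Retrieval brings noise",
                          "add reranking; raise the score threshold"),
    "context_recall": (0.80, "Missing relevant chunks",
                       "increase k; improve chunking; fine-tune embeddings"),
}


def diagnose(scores: dict) -> list[str]:
    """Return one advice line for each metric that falls below its cutoff."""
    advice = []
    for metric, (cutoff, problem, fix) in DIAGNOSTICS.items():
        score = scores[metric]
        if score < cutoff:
            advice.append(f"{metric} {score:.2f} < {cutoff:.2f}: {problem}. Try: {fix}.")
    return advice


# Made-up scores: retrieval is noisy and slightly incomplete.
for line in diagnose({"faithfulness": 0.91, "answer_relevancy": 0.88,
                      "context_precision": 0.62, "context_recall": 0.79}):
    print(line)
```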

My Production Targets

| Metric | Threshold |
|--------|-----------|
| Faithfulness | ≥ 0.90 |
| Answer Relevance | ≥ 0.85 |
| Context Precision | ≥ 0.75 |
| Context Recall | ≥ 0.80 |

Below those, the system is unsafe to ship.
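A release gate over these targets can be a few lines. This is a sketch: the `TARGETS` keys and the `failing_metrics` name are assumptions, and the scores would come from your own evaluation run.

```python
TARGETS = {
    "faithfulness": 0.90,
    "answer_relevancy": 0.85,
    "context_precision": 0.75,
    "context_recall": 0.80,
}


def failing_metrics(scores: dict) -> dict:
    """Return {metric: (score, target)} for every metric below its target."""
    return {
        metric: (scores[metric], target)
        for metric, target in TARGETS.items()
        if scores[metric] < target
    }


# Made-up scores: only answer relevance misses its target.
scores = {"faithfulness": 0.93, "answer_relevancy": 0.84,
          "context_precision": 0.78, "context_recall": 0.80}
blocked = failing_metrics(scores)
print("safe to ship" if not blocked else f"blocked: {blocked}")
```

Unlike a regression check against main, this gate uses absolute floors, so a system that has always scored low cannot pass just by not getting worse.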

Dilip Singh

Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.
