Dilip Singh is a Lead AI Architect and AI developer based in Delhi, India. He has 14+ years of experience building enterprise AI chatbots, AI assistants, multi-agent platforms, RAG pipelines, and ontology-driven knowledge systems. He is Lead Software Architect at Hureka Technologies and has delivered 118+ production projects globally.

Series: Self-Hosted AI · Part 1 of 4

1. Self-Hosted Voice AI
2. FastAPI Production Patterns
3. Ollama in Production
4. Voice Activity Detection

Voice AI · Advanced · 2025-04-22 · 15 min read

Self-Hosted Voice AI: The Complete Pipecat + LiveKit + Ollama Stack

How to build a fully self-hosted voice AI agent with zero cloud dependencies. Pipecat orchestrates Faster-Whisper STT, Ollama LLM, and pyttsx3 TTS over LiveKit WebRTC.

Pipecat · LiveKit · Whisper · Ollama · WebRTC · Voice AI · Self-Hosted

Why Self-Host Your Voice AI?

Cloud voice AI services (Google Dialogflow, AWS Lex, Azure Bot Service) charge per minute, send call audio to third-party servers, and add latency from round-trips to remote data centers. For enterprise clients, especially in healthcare, legal, or finance, a fully self-hosted stack is often non-negotiable.

The Full Stack

Our self-hosted voice AI pipeline:

1. LiveKit: WebRTC server for real-time audio/video transport (self-hosted)
2. Pipecat: AI voice pipeline orchestration framework
3. Faster-Whisper: optimized Whisper STT (CTranslate2 backend, up to 4× faster than the original implementation)
4. Ollama: local LLM inference server (Llama 3, Mistral, Phi-3)
5. pyttsx3: Python TTS engine (offline, no API calls)
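To make the Ollama piece of the stack concrete, here is a minimal sketch of reading a streamed reply from Ollama's HTTP API using only the standard library. It assumes a server on the default `http://localhost:11434` and the `/api/generate` endpoint, which streams one JSON object per line. The helper names `iter_ollama_tokens` and `stream_generate` are made up for this illustration and are not part of Pipecat or Ollama.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_ollama_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an Ollama /api/generate streaming response.

    Each non-empty line is one JSON object carrying a "response" fragment
    and a "done" flag; the final object has done set to true.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break


def stream_generate(prompt: str, model: str = "llama3:8b",
                    base_url: str = "http://localhost:11434") -> Iterator[str]:
    """POST a prompt to a local Ollama server and stream the reply.

    Requires a running Ollama server with the model already pulled.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # An HTTP response object iterates line by line, as bytes.
        yield from iter_ollama_tokens(response)
```

With a server running, `"".join(stream_generate("Say hello"))` would collect the full reply. In the voice pipeline, Pipecat's Ollama service handles this streaming for you, so the sketch is only meant to show what travels over the wire.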

Architecture

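```python
import asyncio

# Import paths and constructor arguments vary between Pipecat releases;
# check them against the version you have installed.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.ollama import OLLamaLLMService
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.services.livekit import LiveKitParams, LiveKitTransport


async def run_voice_agent(room_url: str, token: str, room_name: str):
    # LiveKit carries audio in both directions over WebRTC.
    transport = LiveKitTransport(
        url=room_url,
        token=token,
        room_name=room_name,
        params=LiveKitParams(audio_in_enabled=True, audio_out_enabled=True),
    )

    # Faster-Whisper STT, plus a local model served through Ollama's
    # OpenAI-compatible endpoint (note the /v1 suffix).
    stt = WhisperSTTService(model="large-v3")
    llm = OLLamaLLMService(model="llama3:8b", base_url="http://localhost:11434/v1")

    # Frames flow top to bottom. The TTS stage (pyttsx3 in our stack) and the
    # LLM context aggregators sit around the LLM; both are omitted for brevity.
    pipeline = Pipeline([
        transport.input(),
        stt,
        llm,
        transport.output(),
    ])

    task = PipelineTask(pipeline)
    await PipelineRunner().run(task)
```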

Latency Optimization

The biggest challenge in voice AI is latency. Pauses much longer than about 300 ms start to feel unnatural in conversation. Our production stack achieves 250–400 ms end-to-end by:

1. Faster-Whisper with int8 quantization: STT in ~80 ms for 1–2 second utterances
2. Streaming LLM responses: start TTS as the first tokens arrive instead of waiting for the full response
3. Turn detection: a voice activity detector (Faster-Whisper bundles a Silero-based VAD filter) flags end-of-speech instead of relying on a long fixed silence timeout
4. GPU acceleration: run Whisper on a GPU for a roughly 10× speedup over CPU
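The streaming point is easiest to see in code. The sketch below groups streamed LLM tokens into sentences so that each sentence can be handed to TTS as soon as it completes. It is an illustration under simple assumptions, not the pipeline's actual implementation: `sentences_from_tokens` is a hypothetical helper, and the sentence boundary rule (`.`, `!` or `?` followed by whitespace) is deliberately naive.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM tokens into sentences as soon as each one completes."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part is still being written; keep it for the next token.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

A consumer would loop over `sentences_from_tokens(token_stream)` and pass each sentence to the TTS engine, so speech can begin while the model is still generating the rest of the answer. Abbreviations such as "e.g." would split early under this rule, which a production splitter would need to handle.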

Twilio Integration

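For telephone access, Twilio bridges phone calls into our LiveKit room via SIP. The helper here places an outbound call and points Twilio at a TwiML endpoint in our app that connects the call to the room: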

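```python
import os

from twilio.rest import Client

# Credentials and endpoints are read from the environment
# (the variable names are assumptions; use your own configuration).
TWILIO_SID = os.environ["TWILIO_SID"]
TWILIO_TOKEN = os.environ["TWILIO_TOKEN"]
TWILIO_NUMBER = os.environ["TWILIO_NUMBER"]
APP_URL = os.environ["APP_URL"]


def create_call_to_livekit(to_number: str, livekit_room: str) -> str:
    """Place an outbound call and return its Twilio call SID."""
    client = Client(TWILIO_SID, TWILIO_TOKEN)
    call = client.calls.create(
        to=to_number,
        from_=TWILIO_NUMBER,
        # Twilio fetches TwiML from this URL to learn how to route the call.
        url=f"{APP_URL}/twiml/connect/{livekit_room}",
    )
    return call.sid
```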
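The `/twiml/connect/{livekit_room}` endpoint is not shown in this post. As a rough sketch of what it might return, the function below builds a TwiML document that dials a SIP address using the `<Dial>` and `<Sip>` verbs. The function name `build_connect_twiml`, the `sip_host` parameter, and the `sip:<room>@<host>` address shape are all assumptions for illustration; the real address depends on how the LiveKit SIP trunk is configured.

```python
import xml.etree.ElementTree as ET


def build_connect_twiml(livekit_room: str, sip_host: str) -> str:
    """Return TwiML that dials a SIP URI for the given room.

    The URI shape sip:<room>@<host> is an assumption for illustration; the
    real address depends on the LiveKit SIP trunk configuration.
    """
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    sip = ET.SubElement(dial, "Sip")
    sip.text = f"sip:{livekit_room}@{sip_host}"
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(response, encoding="unicode")
```

A web handler for the TwiML route would return this string with an XML content type. Building the XML with the standard library keeps the sketch dependency-free; the Twilio helper library also ships TwiML builders that could be used instead.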

Dilip Singh

Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.
