Series: Self-Hosted AI · Part 4 of 4

1. Self-Hosted Voice AI 2. FastAPI Production Patterns 3. Ollama in Production 4. Voice Activity Detection

Voice AI · Advanced · 2026-04-12 · 10 min read

Voice Activity Detection: The Hidden Make-or-Break of Voice AI

VAD decides when the user is done speaking. Get it wrong and the agent interrupts or hangs. A deep dive into Silero VAD, energy thresholds, end-of-turn detection, and barge-in handling.

Tags: VAD, Voice AI, Silero, Pipecat, WebRTC, Audio

Why VAD is Where Voice AI Lives or Dies

Latency is what users feel, but VAD is what makes the conversation feel natural. Get end-of-turn detection wrong and the bot interrupts. Wait too long and it feels sluggish.

Out of 50 voice AI improvements we shipped at Hureka, 12 were VAD tuning.

Silero VAD: The Production Standard

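```python
import numpy as np
import torch

# Load the pretrained Silero VAD model from torch.hub (cached after the first download).
model, utils = torch.hub.load(
    repo_or_dir='snakers4/silero-vad',
    model='silero_vad',
    force_reload=False,
)
(get_speech_timestamps, _, _, _, _) = utils  # the other helpers are unused here


def is_speech(audio_chunk: np.ndarray, sample_rate: int = 16000) -> float:
    """Return the probability (0-1) that this ~30 ms chunk contains speech."""
    # Silero expects float samples. Recent releases also require a fixed chunk
    # length (512 samples at 16 kHz), so check the version you load.
    tensor = torch.from_numpy(audio_chunk).float()
    with torch.no_grad():  # inference only, no gradients needed
        return model(tensor, sample_rate).item()
```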
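
The chunk handed to `is_speech` has to be a fixed-size frame of normalized float audio. The following is a minimal sketch of that framing step, assuming little-endian 16-bit mono PCM input. The helper name `pcm16_frames` and the 512-sample default are assumptions, not part of the original pipeline, and the sketch uses only the standard library so it runs without the model installed.

```python
import sys
from array import array
from typing import Iterator, List

# Assumed frame size at 16 kHz (about 32 ms); verify against your Silero version.
FRAME_SAMPLES = 512


def pcm16_frames(pcm: bytes, frame_samples: int = FRAME_SAMPLES) -> Iterator[List[float]]:
    """Yield fixed-size frames of floats in [-1.0, 1.0) from 16-bit mono PCM bytes."""
    usable = len(pcm) - (len(pcm) % 2)  # drop a trailing odd byte, if any
    samples = array("h")                # signed 16-bit integers, native byte order
    samples.frombytes(pcm[:usable])
    if sys.byteorder == "big":
        samples.byteswap()              # the input is assumed to be little-endian
    # A trailing partial frame is dropped here; a streaming caller would
    # carry it over and prepend it to the next buffer instead.
    for start in range(0, len(samples) - frame_samples + 1, frame_samples):
        yield [s / 32768.0 for s in samples[start:start + frame_samples]]
```

Each frame can be converted with `np.asarray(frame, dtype=np.float32)` before calling `is_speech`. With NumPy available, `np.frombuffer(pcm, dtype='<i2').astype(np.float32) / 32768.0` performs the same normalization in one step.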

End-of-Turn Detection State Machine

```python
class TurnDetector:
    def __init__(self, sample_rate=16000, frame_ms=30):
        self.sr = sample_rate
        self.frame_ms = frame_ms
        self.frame_size = int(sample_rate * frame_ms / 1000)
        self.speech_buffer = []
        self.silence_ms = 0
        self.speech_ms = 0
        self.state = "idle"

    def feed(self, audio_frame: np.ndarray) -> dict:
        """Consume one frame; relies on is_speech() from the previous snippet."""
        p = is_speech(audio_frame, self.sr)
        is_voiced = p > 0.5

        if self.state == "idle":
            if is_voiced:
                self.state = "speaking"
                self.speech_ms = self.frame_ms
                self.silence_ms = 0  # clear any silence left over from the last turn
                return {"event": "speech_started"}

        elif self.state == "speaking":
            if is_voiced:
                self.speech_ms += self.frame_ms
                self.silence_ms = 0
            else:
                self.silence_ms += self.frame_ms
                if self.silence_ms >= 500:  # 500 ms of silence = turn end
                    self.state = "idle"
                    duration = self.speech_ms
                    self.speech_ms = 0
                    self.silence_ms = 0
                    return {"event": "turn_end", "duration_ms": duration}

        return {"event": None}
```

The Three Magic Numbers

Parameter	Recommended	Why
speech_threshold	0.5	Higher = miss soft speakers; lower = false triggers
min_speech_ms	250	Filters single sharp noises (door, cough)
end_silence_ms	500	< 400 cuts people off; > 700 feels sluggish

These vary by language and accent. Tune on real users in your target market.
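
To make the three parameters concrete, here is a minimal sketch of a turn detector that exposes all of them as tunable settings. The names `VadParams` and `TunableTurnDetector` are hypothetical, and the detector takes per-frame speech probabilities instead of raw audio so it can be exercised without the model. Discarding an unconfirmed burst on its first silent frame is one possible reading of `min_speech_ms`, not necessarily how a production system should behave.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VadParams:
    speech_threshold: float = 0.5  # probability above which a frame counts as speech
    min_speech_ms: int = 250       # speech shorter than this is treated as noise
    end_silence_ms: int = 500      # trailing silence that closes a turn


class TunableTurnDetector:
    """Turn detector driven by per-frame speech probabilities."""

    def __init__(self, params: Optional[VadParams] = None, frame_ms: int = 30):
        self.params = params or VadParams()
        self.frame_ms = frame_ms
        self._reset()

    def _reset(self) -> None:
        self.speech_ms = 0
        self.silence_ms = 0
        self.confirmed = False  # True once speech has lasted min_speech_ms

    def feed(self, speech_prob: float) -> Optional[str]:
        """Consume one frame's probability; return an event name or None."""
        if speech_prob > self.params.speech_threshold:
            self.speech_ms += self.frame_ms
            self.silence_ms = 0
            if not self.confirmed and self.speech_ms >= self.params.min_speech_ms:
                self.confirmed = True
                return "speech_started"
            return None
        if self.speech_ms == 0:
            return None  # idle and still silent
        self.silence_ms += self.frame_ms
        if not self.confirmed:
            self._reset()  # short burst (door, cough): discard it
            return None
        if self.silence_ms >= self.params.end_silence_ms:
            self._reset()
            return "turn_end"
        return None
```

Passing `VadParams(end_silence_ms=800)` gives a more patient detector without touching the state logic, which keeps tuning experiments cheap.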

Barge-In: Letting the User Interrupt

When TTS is playing, the user might interrupt. You MUST detect it and stop TTS instantly:

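```python
# tts_is_speaking, tts_engine, llm_pipeline, energy() and BARGE_ENERGY_THRESHOLD
# are supplied by the surrounding application; is_speech() is defined above.
async def on_user_audio(audio_chunk):
    if tts_is_speaking:
        # Require both a confident VAD score and enough signal energy.
        if is_speech(audio_chunk) > 0.7 and energy(audio_chunk) > BARGE_ENERGY_THRESHOLD:
            await tts_engine.interrupt()
            await llm_pipeline.cancel_current_generation()
```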

The energy gate prevents echo from your own TTS triggering false barge-ins.
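
The snippet above leaves `energy` and `BARGE_ENERGY_THRESHOLD` undefined. One plausible reading, sketched here, is a root-mean-square level over normalized samples. Both the RMS choice and the threshold value of 0.05 are assumptions for illustration, and `should_barge_in` is a hypothetical helper name. The threshold needs calibrating against the echo level of your own playback path.

```python
import math
from typing import Sequence

BARGE_SPEECH_PROB = 0.7        # VAD score required during TTS playback (from the text)
BARGE_ENERGY_THRESHOLD = 0.05  # hypothetical RMS level; calibrate per device


def energy(samples: Sequence[float]) -> float:
    """Root-mean-square level of a frame of floats in [-1.0, 1.0]."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_barge_in(speech_prob: float, samples: Sequence[float]) -> bool:
    """True only when the frame is both confidently speech and loud enough."""
    return speech_prob > BARGE_SPEECH_PROB and energy(samples) > BARGE_ENERGY_THRESHOLD
```

Quiet echo from the speaker can score high on the VAD alone, so requiring both conditions keeps it from cancelling the agent's own reply.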

Common VAD Failures

1. Background TV/radio: Continuous human voice; VAD never sees silence. Mitigate with active noise suppression upstream (RNNoise, WebRTC NS).
2. Slow speakers / elderly users: They pause mid-sentence. Increase end_silence_ms to 800ms for that segment.
3. Code-switching speakers: Silero performs worse on non-English speech. Test specifically.
4. High-latency mic: Anything > 100ms of mic latency feels broken. Measure end-to-end.
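
The per-segment adjustment in item 2 can be as small as a lookup table. This is a sketch only: the segment label `slow_speaker` and the function name `end_silence_for` are hypothetical, and how a caller gets assigned to a segment is left to the application.

```python
from typing import Dict

DEFAULT_END_SILENCE_MS = 500  # the general recommendation from the table above

# Hypothetical segment labels mapped to their end-of-turn silence window.
END_SILENCE_MS_BY_SEGMENT: Dict[str, int] = {
    "slow_speaker": 800,
}


def end_silence_for(segment: str) -> int:
    """Return the silence window for a segment, falling back to the default."""
    return END_SILENCE_MS_BY_SEGMENT.get(segment, DEFAULT_END_SILENCE_MS)
```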

Dilip Singh

Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.
