AI Voice Agent
100% Self-Hosted Voice AI
Overview
A fully self-hosted conversational voice AI with no cloud AI dependency. Pipecat orchestrates Faster-Whisper STT, an Ollama-served LLM, and pyttsx3 TTS over LiveKit WebRTC, with Twilio telephony integration.
The Challenge
A Canadian healthcare client required a voice AI agent for patient appointment scheduling and FAQs, but HIPAA compliance and data sovereignty rules prohibited sending audio or PHI to cloud AI services. The solution needed sub-400 ms latency, telephone access over the PSTN, and a completely on-premises deployment.
The Solution
Built a fully self-hosted voice pipeline: LiveKit for WebRTC transport, Pipecat for pipeline orchestration, Faster-Whisper (int8-quantized) for STT, Ollama running Llama 3 8B for inference, and pyttsx3 for offline TTS. Twilio SIP provides telephone access. Streaming LLM responses and GPU-accelerated Whisper brought end-to-end latency to 250–400 ms.
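The streaming step is where much of the latency saving tends to come from: rather than waiting for the full LLM reply, text can be handed to TTS one sentence at a time. The sketch below illustrates that idea outside of Pipecat, which has its own aggregation; it is not the project's actual code. It assumes Ollama's default local endpoint and a `llama3:8b` model tag, and the helper names `ollama_tokens` and `sentences` are hypothetical.

```python
import json
import re
import urllib.request
from typing import Iterable, Iterator

# Assumption: Ollama's default local endpoint; adjust for your deployment.
OLLAMA_URL = "http://localhost:11434/api/generate"

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def ollama_tokens(prompt: str, model: str = "llama3:8b") -> Iterator[str]:
    """Yield text fragments from Ollama's streaming (NDJSON) generate API."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line
            if not line.strip():
                continue
            chunk = json.loads(line)
            yield chunk.get("response", "")
            if chunk.get("done"):
                break


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed fragments into complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream
```

Each sentence yielded by `sentences(ollama_tokens(prompt))` could then be spoken as soon as it arrives, for example with pyttsx3's `engine.say(sentence)` followed by `engine.runAndWait()`. The punctuation-based split is deliberately simple and will break on abbreviations such as "Dr.", so a production pipeline would likely need a more careful segmenter.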
Results
- ✓ Zero cloud AI dependencies: fully on-premises deployment
- ✓ 250–400 ms end-to-end voice latency achieved
- ✓ Twilio PSTN integration for telephone access
- ✓ HIPAA-compliant architecture with no PHI sent to external APIs
- ✓ GPU-accelerated Whisper STT (~80 ms for 1–2 second utterances)
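Latency figures like these are usually checked by timing each stage of a conversational turn. The following is a minimal sketch of one way to do that, not the measurement harness used in this project: `LatencyTracker` and the stage names are hypothetical, and the 400 ms budget is taken from the requirement stated above. In a streaming pipeline the stages overlap, so it makes sense to time only the work done up to the first synthesized audio.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

BUDGET_MS = 400.0  # upper bound of the latency target stated above


class LatencyTracker:
    """Accumulate wall-clock time per pipeline stage for one turn."""

    def __init__(self) -> None:
        self.stages_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the named stage."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed_ms

    def total_ms(self) -> float:
        """Sum of all recorded stage times in milliseconds."""
        return sum(self.stages_ms.values())

    def within_budget(self, budget_ms: float = BUDGET_MS) -> bool:
        """True if the recorded stages fit inside the latency budget."""
        return self.total_ms() <= budget_ms
```

A turn would be wrapped as `with tracker.stage("stt"): ...`, `with tracker.stage("llm_first_sentence"): ...`, and so on, after which `tracker.stages_ms` shows where the time went and `tracker.within_budget()` flags turns that exceed the target.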