Apr 29, 2026
Building a Real-Time Voice AI System
How I built a live calling pipeline that keeps latency under control.
Tags: voice-ai, realtime, aws, llm
Positioning
I built a production-grade voice system for live calls, not a demo loop.
Architecture
- Exotel for call orchestration
- Deepgram for streaming STT
- OpenAI for reasoning and tool use
- ElevenLabs for low-latency TTS
- AWS ECS for containerized execution
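To make the shape of that chain concrete, here is a minimal sketch of the STT, reasoning, and TTS stages wired together as concurrent asyncio tasks joined by queues. The `fake_stt`, `fake_llm`, and `fake_tts` functions are hypothetical stand-ins, not real Deepgram, OpenAI, or ElevenLabs SDK calls, and the call orchestration and container layers are left out entirely.

```python
import asyncio

# Hypothetical stand-ins for the vendor stages; none of these are real SDK calls.
async def fake_stt(audio_chunk: bytes) -> str:
    await asyncio.sleep(0.01)  # simulated network latency
    return f"text({len(audio_chunk)}B)"

async def fake_llm(text: str) -> str:
    await asyncio.sleep(0.01)
    return f"reply to {text}"

async def fake_tts(reply: str) -> bytes:
    await asyncio.sleep(0.01)
    return reply.encode()

_DONE = object()  # sentinel that marks the end of the call

async def stage(fn, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, apply fn, and push the results to outbox."""
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)  # pass the end-of-call marker downstream
            return
        await outbox.put(await fn(item))

async def run_call(audio_chunks: list[bytes]) -> list[bytes]:
    """Feed audio chunks through STT -> LLM -> TTS, one task per stage."""
    q_audio, q_text, q_reply, q_speech = (asyncio.Queue() for _ in range(4))
    tasks = [
        asyncio.create_task(stage(fake_stt, q_audio, q_text)),
        asyncio.create_task(stage(fake_llm, q_text, q_reply)),
        asyncio.create_task(stage(fake_tts, q_reply, q_speech)),
    ]
    for chunk in audio_chunks:
        await q_audio.put(chunk)
    await q_audio.put(_DONE)

    speech = []
    while (item := await q_speech.get()) is not _DONE:
        speech.append(item)
    await asyncio.gather(*tasks)
    return speech

if __name__ == "__main__":
    print(asyncio.run(run_call([b"ab", b"cde"])))
```

Because each stage runs as its own task, a later chunk can be in STT while an earlier one is already in TTS. Output order is preserved because each queue is first-in, first-out and each stage handles one item at a time.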
System tradeoffs
- Latency compounded quickly when each stage waited for the previous one to finish
- Vendor outages could cascade into dropped calls
- Audio sync drift showed up as soon as one stage slowed down
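The first point can be made concrete with a little arithmetic. The sketch below compares a strictly sequential chain with one where stages overlap across chunks. It assumes fixed per-stage times, and the millisecond figures are made up for illustration, not measurements from this system.

```python
def sequential_latency_ms(stage_ms: list[float], chunks: int) -> float:
    """Every chunk passes through all stages before the next chunk starts."""
    return chunks * sum(stage_ms)

def pipelined_latency_ms(stage_ms: list[float], chunks: int) -> float:
    """Stages overlap across chunks: the first chunk pays the full chain,
    then the slowest stage sets the pace for each remaining chunk."""
    return sum(stage_ms) + (chunks - 1) * max(stage_ms)

# Illustrative numbers only (STT, LLM, TTS), not measured values.
stages = [100.0, 300.0, 150.0]

print(sequential_latency_ms(stages, 4))  # 2200.0
print(pipelined_latency_ms(stages, 4))   # 1450.0
```

Under this simple model the slowest stage dominates once the stages overlap, which is also why a slowdown in a single stage shows up immediately downstream.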
What I changed
- Parallel chunk processing for the hot path
- Explicit timeout budgets per stage
- Fallback handling when one vendor degraded
- Tracing on each call leg for debugging
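As a rough illustration of the timeout-budget and fallback items, here is a minimal asyncio sketch. The budget values, the `with_budget` helper, and the `slow_tts` and `canned_tts` coroutines are all hypothetical names chosen for this example and do not come from the production code.

```python
import asyncio
import logging
import time
from typing import Awaitable, Callable

log = logging.getLogger("call-leg")

# Hypothetical per-stage budgets in seconds; not taken from the real system.
BUDGETS = {"stt": 0.3, "llm": 0.8, "tts": 0.4}

async def with_budget(
    stage: str,
    primary: Callable[[], Awaitable[str]],
    fallback: Callable[[], Awaitable[str]],
) -> tuple[str, str]:
    """Run primary within the stage budget; on timeout or connection
    failure, run fallback instead. Returns (result, source)."""
    started = time.monotonic()
    try:
        result = await asyncio.wait_for(primary(), timeout=BUDGETS[stage])
        source = "primary"
    except (asyncio.TimeoutError, ConnectionError):
        result = await fallback()
        source = "fallback"
    elapsed_ms = (time.monotonic() - started) * 1000
    # One log line per stage is a stand-in for real per-leg tracing.
    log.info("stage=%s source=%s elapsed_ms=%.0f", stage, source, elapsed_ms)
    return result, source

async def slow_tts() -> str:
    await asyncio.sleep(5)  # simulates a degraded vendor
    return "primary audio"

async def canned_tts() -> str:
    return "canned audio"  # e.g. a pre-rendered holding phrase

if __name__ == "__main__":
    print(asyncio.run(with_budget("tts", slow_tts, canned_tts)))
```

Returning the source alongside the result lets the caller record which path served each stage, so a degraded vendor shows up in the logs rather than only as a slower call.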
Outcome
- Kept the pipeline responsive under real traffic
- Reduced sensitivity to vendor slowdowns
- Made failures visible through per-leg tracing instead of silent