Nimit Sharma
AI systems, voice infra, AWS

Apr 29, 2026 • 1 min read

Building a Real-Time Voice AI System

How I built a live calling pipeline that keeps latency under control.

voice-ai, realtime, aws, llm

Positioning

I built a production-grade voice system for live calls, not a demo loop.

Architecture

Exotel for call orchestration
Deepgram for streaming STT
OpenAI for reasoning and tool use
ElevenLabs for low-latency TTS
AWS ECS for containerized execution
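The stage order above can be pictured as a chain of queues where each chunk moves on as soon as it is ready. The following is a minimal sketch, not the production code: the three lambdas are placeholder stand-ins for the STT, reasoning, and TTS stages, and none of the vendor SDKs are called. The names `stage` and `run_pipeline` are hypothetical.

```python
import asyncio

async def stage(transform, inbox, outbox):
    """Pull chunks from inbox, transform each one, and push it downstream.
    A None chunk marks end of stream and is forwarded to the next stage."""
    while True:
        chunk = await inbox.get()
        if chunk is None:
            await outbox.put(None)
            return
        await outbox.put(transform(chunk))

async def run_pipeline(audio_chunks):
    # One queue between each pair of stages: audio -> text -> reply -> speech.
    q_audio, q_text, q_reply, q_speech = (asyncio.Queue() for _ in range(4))
    tasks = [
        asyncio.create_task(stage(lambda c: f"text({c})", q_audio, q_text)),      # STT stand-in
        asyncio.create_task(stage(lambda c: f"reply({c})", q_text, q_reply)),     # LLM stand-in
        asyncio.create_task(stage(lambda c: f"speech({c})", q_reply, q_speech)),  # TTS stand-in
    ]
    for chunk in audio_chunks:
        await q_audio.put(chunk)
    await q_audio.put(None)  # signal end of the call audio

    spoken = []
    while (item := await q_speech.get()) is not None:
        spoken.append(item)
    await asyncio.gather(*tasks)
    return spoken
```

Calling `asyncio.run(run_pipeline(["a0", "a1"]))` returns the chunks in order, each wrapped by all three stand-in stages. In a real deployment each transform would be an awaited network call, which is where the queues let a slow stage overlap with the others.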

System tradeoffs

Latency stacked quickly when each step waited on the previous one
Vendor outages could cascade into dropped calls
Audio sync drift showed up as soon as one stage slowed down
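To make the stacking effect concrete, here is a small back-of-the-envelope sketch. It compares time to first audio when every stage waits for the whole previous stage against the case where the first chunk streams straight through. The per-chunk latencies are made-up illustrative numbers, not measurements from this system, and both function names are hypothetical.

```python
def stacked_first_audio_ms(stage_ms, n_chunks):
    """Each upstream stage finishes all n_chunks before the next stage starts;
    the last stage then needs one chunk's worth of time to emit first audio."""
    *upstream, last = stage_ms
    return n_chunks * sum(upstream) + last

def streamed_first_audio_ms(stage_ms):
    """The first chunk passes through every stage once, without waiting
    for later chunks."""
    return sum(stage_ms)

# Hypothetical per-chunk latencies in milliseconds for STT, LLM, TTS.
example_stage_ms = [100, 200, 150]

stacked = stacked_first_audio_ms(example_stage_ms, n_chunks=4)  # 4 * 300 + 150
streamed = streamed_first_audio_ms(example_stage_ms)            # 100 + 200 + 150
print(stacked, streamed)  # prints: 1350 450
```

Under these assumed numbers the stacked path takes three times as long to produce the first audio, and the gap widens with every additional chunk in the utterance.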

What I changed

Parallel chunk processing for the hot path
Explicit timeout budgets per stage
Fallback handling when one vendor degraded
Tracing on each call leg for debugging
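The timeout, fallback, and tracing items above can be sketched together with `asyncio.wait_for`. This is an illustration under stated assumptions rather than the actual implementation: the budget values are hypothetical, `run_stage` is a made-up helper name, and the three vendor coroutines are stand-ins that do not touch any real SDK.

```python
import asyncio
import time

# Hypothetical per-stage budgets in seconds (illustrative, not tuned values).
STAGE_BUDGET_S = {"stt": 0.05, "llm": 0.05, "tts": 0.05}

async def run_stage(stage, primary, fallback, trace):
    """Await primary() within the stage budget. On timeout or connection
    error, await fallback() instead. Appends one record to trace either way."""
    start = time.perf_counter()
    try:
        result = await asyncio.wait_for(primary(), timeout=STAGE_BUDGET_S[stage])
        outcome = "primary"
    except (asyncio.TimeoutError, ConnectionError):
        # wait_for cancels the slow primary call before we move on.
        result = await fallback()
        outcome = "fallback"
    elapsed_ms = round((time.perf_counter() - start) * 1000, 1)
    trace.append({"stage": stage, "outcome": outcome, "elapsed_ms": elapsed_ms})
    return result

# Stand-in vendors for demonstration only.
async def slow_vendor():
    await asyncio.sleep(1.0)  # simulates a degraded provider
    return "primary-audio"

async def healthy_vendor():
    return "primary-audio"

async def backup_vendor():
    return "fallback-audio"
```

With an empty `trace = []`, `asyncio.run(run_stage("tts", slow_vendor, backup_vendor, trace))` returns `"fallback-audio"` after roughly the 50 ms budget instead of the full second, and the trace record shows which path was taken. The fallback here has no budget of its own; a stricter version would wrap it in a second `wait_for` so a degraded backup cannot stall the call either.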

Outcome

Kept the pipeline responsive under real traffic
Reduced sensitivity to vendor slowdowns
Made failure modes visible instead of silent