Apr 29, 2026 · 1 min read

Building a Real-Time Voice AI System

How I built a live calling pipeline that keeps latency under control.

voice-ai, realtime, aws, llm

Positioning

I built a production-grade voice system for live calls, not a demo loop.

Architecture

  • Exotel for call orchestration
  • Deepgram for streaming STT
  • OpenAI for reasoning and tool use
  • ElevenLabs for low-latency TTS
  • AWS ECS for containerized execution

System tradeoffs

  • Latency stacked quickly when each step waited on the previous one
  • Vendor outages could cascade into dropped calls
  • Audio sync drift showed up as soon as one stage slowed down
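The first point can be made concrete with a back-of-the-envelope model. This is only a sketch: the per-chunk costs are hypothetical numbers, not measurements from this system, and the function names are mine. It assumes each stage handles one chunk at a time at a fixed cost.

```python
def sequential_latency_ms(stage_ms, n_chunks):
    """Total time when each stage waits for the previous stage to finish every chunk."""
    return n_chunks * sum(stage_ms)


def overlapped_latency_ms(stage_ms, n_chunks):
    """Total time when stages overlap: the first chunk pays every stage once,
    and each later chunk is gated only by the slowest stage."""
    return sum(stage_ms) + (n_chunks - 1) * max(stage_ms)


# Hypothetical per-chunk costs in ms for STT, LLM, and TTS.
stage_ms = [40, 120, 60]

print(sequential_latency_ms(stage_ms, 5))  # 1100
print(overlapped_latency_ms(stage_ms, 5))  # 700
```

Under this model the waiting version grows with the sum of all stage costs per chunk, while the overlapped version grows only with the slowest stage, which is why a single slow vendor dominates everything downstream.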

What I changed

  • Parallel chunk processing for the hot path
  • Explicit timeout budgets per stage
  • Fallback handling when one vendor degraded
  • Tracing on each call leg for debugging
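A minimal sketch of how a per-stage timeout budget and a vendor fallback might fit together, using only Python's standard asyncio. The budget values, the `run_stage` helper, and the stand-in TTS coroutines are all hypothetical and stand in for real vendor clients; this is not the production code.

```python
import asyncio

# Hypothetical per-stage budgets in seconds, not the real values.
BUDGETS = {"stt": 0.05, "llm": 0.05, "tts": 0.05}


async def run_stage(name, primary, fallback, payload):
    """Run the primary vendor call within the stage budget.

    On a timeout or connection error, switch to the fallback call.
    """
    try:
        return await asyncio.wait_for(primary(payload), timeout=BUDGETS[name])
    except (asyncio.TimeoutError, ConnectionError):
        return await fallback(payload)


async def slow_tts(text):
    await asyncio.sleep(1.0)  # simulates a degraded vendor
    return f"primary-audio({text})"


async def fast_tts(text):
    return f"primary-audio({text})"


async def backup_tts(text):
    return f"backup-audio({text})"


result_degraded = asyncio.run(run_stage("tts", slow_tts, backup_tts, "hello"))
result_healthy = asyncio.run(run_stage("tts", fast_tts, backup_tts, "hello"))
print(result_degraded)  # backup-audio(hello)
print(result_healthy)   # primary-audio(hello)
```

In a real pipeline the fallback call would likely need its own budget as well, and the point where the timeout fires is a natural place to emit a trace event so the degraded call leg shows up during debugging.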

Outcome

  • Kept the pipeline responsive under real traffic
  • Reduced sensitivity to vendor slowdowns
  • Made failures visible in per-call traces instead of silent