deepgram/deepgram-python-sdk websockets-reference.md

**authorization:** typing.Optional[str] — Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter. **Example:** token %DEEPGRAM_API_KEY% or bearer %DEEPGR

read this and then tell me what changes i have to do
You’re asking the right question 👍 — for an AI avatar (real-time voice + lip sync) the architecture matters a lot.
Short answer first:
✅ Use AsyncDeepgramClient
❌ Do NOT use the normal DeepgramClient for the avatar voice pipeline.
Because your stack already uses:
- FastAPI
- Uvicorn
- WebSockets
- Real-time audio streaming
Those frameworks are async-based, so the async Deepgram client fits naturally.
Why AsyncDeepgramClient is required for avatars
An avatar system runs every stage of this pipeline concurrently:

```
User speech
  ↓
STT (speech-to-text)
  ↓
LLM response
  ↓
TTS audio stream
  ↓
Avatar lip-sync animation
```
All these must run at the same time.
If you use the sync client, every Deepgram call blocks the event loop.

Example problem:
- User speaks → a blocking STT request starts
- While it runs, no other coroutine can be scheduled
- The avatar cannot animate and the UI freezes until the call returns

The async client awaits instead of blocking, so the other tasks keep running.
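You can see the effect without Deepgram at all. In this minimal sketch, `time.sleep` stands in for a blocking sync-SDK call and `asyncio.sleep` for an awaited async-SDK call; the "avatar" task just appends animation frames:

```python
import asyncio
import time

async def animate_avatar(frames: list) -> None:
    # Pretend lip-sync loop: one frame every 50 ms.
    for i in range(4):
        frames.append(i)
        await asyncio.sleep(0.05)

async def blocking_stt() -> None:
    # Sync SDK call: time.sleep freezes the whole event loop,
    # so animate_avatar is starved until it returns.
    time.sleep(0.3)

async def async_stt() -> None:
    # Async SDK call: awaiting yields control back to the loop.
    await asyncio.sleep(0.3)

async def demo(stt) -> float:
    frames: list = []
    t0 = time.monotonic()
    await asyncio.gather(animate_avatar(frames), stt())
    return time.monotonic() - t0

# With the blocking call, animation stalls and total time grows;
# with the async call, both tasks overlap and total time stays ~0.3 s.
blocked = asyncio.run(demo(blocking_stt))
overlapped = asyncio.run(demo(async_stt))
```

Swapping one blocking call for an awaited one is exactly the difference between `DeepgramClient` and `AsyncDeepgramClient` inside a FastAPI handler.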
What you need to change in your project
Right now your client file probably looks like this:

```python
from deepgram import DeepgramClient
```

Change it to:

```python
from deepgram import AsyncDeepgramClient
```
Correct deepgram_client.py for avatar
```python
import os

from dotenv import load_dotenv
from deepgram import AsyncDeepgramClient

load_dotenv()

DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY", "").strip()
if not DEEPGRAM_API_KEY:
    raise ValueError("DEEPGRAM_API_KEY environment variable not set")

deepgram = AsyncDeepgramClient(DEEPGRAM_API_KEY)
```
Your TTS websocket code for avatar speech then becomes:
```python
from deepgram.extensions.types.sockets import SpeakV1TextMessage

async with deepgram.speak.v1.connect(
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
) as connection:
    await connection.send_text(
        SpeakV1TextMessage(text="Hello avatar user")
    )
```
This streams audio in real time.
Perfect for:
- lip sync
- facial animation
- real-time talking avatars
Your final architecture should look like this
```
Frontend Avatar
      │
      │ WebSocket
      ▼
FastAPI backend
      │
      ├── Deepgram STT (async)
      │
      ├── LLM (Gemini / OpenAI)
      │
      └── Deepgram TTS (async websocket)
              │
              ▼
    Stream audio to avatar
```
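Concretely, the backend ends up as a set of concurrent tasks joined by queues. The sketch below is framework-free and uses stand-in stubs for the STT, LLM, and TTS stages (no real Deepgram or Gemini calls), just to show the shape of the wiring:

```python
import asyncio

async def stt_task(audio_in: asyncio.Queue, text_out: asyncio.Queue) -> None:
    # Stand-in for Deepgram streaming STT: audio chunks -> transcripts.
    while (chunk := await audio_in.get()) is not None:
        await text_out.put(f"transcript:{chunk}")
    await text_out.put(None)  # forward the end-of-stream sentinel

async def llm_task(text_in: asyncio.Queue, reply_out: asyncio.Queue) -> None:
    # Stand-in for the LLM call (Gemini / OpenAI).
    while (text := await text_in.get()) is not None:
        await reply_out.put(f"reply-to({text})")
    await reply_out.put(None)

async def tts_task(reply_in: asyncio.Queue, audio_out: list) -> None:
    # Stand-in for the Deepgram TTS websocket: replies -> audio frames.
    while (reply := await reply_in.get()) is not None:
        audio_out.append(f"audio({reply})")

async def pipeline(user_audio: list) -> list:
    audio_q: asyncio.Queue = asyncio.Queue()
    text_q: asyncio.Queue = asyncio.Queue()
    reply_q: asyncio.Queue = asyncio.Queue()
    out: list = []
    tasks = asyncio.gather(
        stt_task(audio_q, text_q),
        llm_task(text_q, reply_q),
        tts_task(reply_q, out),
    )
    for chunk in user_audio:        # frames arriving over the WebSocket
        await audio_q.put(chunk)
    await audio_q.put(None)         # end-of-stream sentinel
    await tasks
    return out

result = asyncio.run(pipeline(["hello", "avatar"]))
```

In the real app each stub becomes an `await` on the corresponding async client, and the queues let every stage run as soon as its input arrives instead of waiting for the previous stage to fully finish.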
Everything async.
Important rule
Use:
AsyncDeepgramClient
for:
- avatars
- realtime assistants
- voice bots
- streaming audio
- websocket apps
Use:
DeepgramClient
only for:
- simple scripts
- batch transcription
- offline processing
💡 Since you’re building an avatar system, the next big improvement is one most developers miss:

Use Deepgram streaming STT instead of prerecorded STT.

Your current code uses `listen.prerecorded`, which waits for the entire recording before returning a transcript, and that adds latency. Streaming STT can make the avatar respond roughly 800 ms faster.
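The latency difference is easy to see with stubs (plain asyncio, no Deepgram calls): a prerecorded-style call cannot return anything until the whole utterance is buffered, while a streaming-style call yields its first partial after a single chunk:

```python
import asyncio
import time

CHUNK_S = 0.05   # 50 ms of audio per chunk
N_CHUNKS = 6     # a ~300 ms utterance

async def mic():
    # Simulated microphone: one chunk every CHUNK_S seconds.
    for i in range(N_CHUNKS):
        await asyncio.sleep(CHUNK_S)
        yield f"chunk{i}"

async def first_result_prerecorded() -> float:
    # Prerecorded-style: buffer the full recording, then transcribe once.
    t0 = time.monotonic()
    audio = [c async for c in mic()]   # must wait for the whole utterance
    _ = " ".join(audio)                # stand-in for one batch STT request
    return time.monotonic() - t0

async def first_result_streaming() -> float:
    # Streaming-style: partial transcripts arrive chunk by chunk.
    t0 = time.monotonic()
    async for _partial in mic():       # stand-in for websocket partials
        return time.monotonic() - t0   # first partial -> avatar can react

streamed = asyncio.run(first_result_streaming())
batched = asyncio.run(first_result_prerecorded())
# streamed ≈ one chunk (~50 ms); batched ≈ the whole utterance (~300 ms)
```

The same shape carries over to the real SDK: the avatar starts reacting on the first partial transcript instead of after the user stops talking.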
If you want, I can show you the exact architecture used in production avatar systems (Deepgram + FastAPI + WebSocket + Gemini). It will simplify your project a lot.