deepgram/deepgram-python-sdk websockets-reference.md

**authorization:** typing.Optional[str] — Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter. **Example:** token %DEEPGRAM_API_KEY% or bearer %DEEPGR

read this and then tell me what changes i have to do
You’re asking the right question 👍 — for an AI avatar (real-time voice + lip sync) the architecture matters a lot.
Short answer first:
✅ Use AsyncDeepgramClient
❌ Do NOT use the normal DeepgramClient for the avatar voice pipeline.
Because your stack already uses:
- FastAPI
- Uvicorn
- WebSockets
- Real-time audio streaming
Those frameworks are async-based, so the async Deepgram client fits naturally.
Why AsyncDeepgramClient is required for avatars
An avatar system runs every stage of this pipeline concurrently:

```
User speech
  ↓
STT (speech-to-text)
  ↓
LLM response
  ↓
TTS audio stream
  ↓
Avatar lip-sync animation
```
All these must run at the same time.
If you use the sync client, every Deepgram call blocks the event loop.

Example problem:
- User speaks → a blocking STT request starts
- While it runs, no other coroutine can be scheduled
- The avatar cannot animate and the UI freezes until the call returns

The async client awaits instead of blocking, so the other tasks keep running.
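You can see the effect without Deepgram at all. In this minimal sketch, `time.sleep` stands in for a blocking sync-SDK call and `asyncio.sleep` for an awaited async-SDK call; the "avatar" task just appends animation frames:

```python
import asyncio
import time

async def animate_avatar(frames: list) -> None:
    # Pretend lip-sync loop: one frame every 50 ms.
    for i in range(4):
        frames.append(i)
        await asyncio.sleep(0.05)

async def blocking_stt() -> None:
    # Sync SDK call: time.sleep freezes the whole event loop,
    # so animate_avatar is starved until it returns.
    time.sleep(0.3)

async def async_stt() -> None:
    # Async SDK call: awaiting yields control back to the loop.
    await asyncio.sleep(0.3)

async def demo(stt) -> float:
    frames: list = []
    t0 = time.monotonic()
    await asyncio.gather(animate_avatar(frames), stt())
    return time.monotonic() - t0

# With the blocking call, animation stalls and total time grows;
# with the async call, both tasks overlap and total time stays ~0.3 s.
blocked = asyncio.run(demo(blocking_stt))
overlapped = asyncio.run(demo(async_stt))
```

Swapping one blocking call for an awaited one is exactly the difference between `DeepgramClient` and `AsyncDeepgramClient` inside a FastAPI handler.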
What you need to change in your project
Right now your client file probably looks like this:

```python
from deepgram import DeepgramClient
```

Change it to:

```python
from deepgram import AsyncDeepgramClient
```
Correct deepgram_client.py for avatar
```python
import os

from dotenv import load_dotenv
from deepgram import AsyncDeepgramClient

load_dotenv()

DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY", "").strip()
if not DEEPGRAM_API_KEY:
    raise ValueError("DEEPGRAM_API_KEY environment variable not set")

deepgram = AsyncDeepgramClient(DEEPGRAM_API_KEY)
```
Your TTS websocket code for avatar speech then becomes:
```python
from deepgram.extensions.types.sockets import SpeakV1TextMessage

async with deepgram.speak.v1.connect(
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
) as connection:
    await connection.send_text(
        SpeakV1TextMessage(text="Hello avatar user")
    )
```
This streams audio in real time.
Perfect for:
- lip sync
- facial animation
- real-time talking avatars
Your final architecture should look like this
```
Frontend Avatar
      │
      │ WebSocket
      ▼
FastAPI backend
      │
      ├── Deepgram STT (async)
      │
      ├── LLM (Gemini / OpenAI)
      │
      └── Deepgram TTS (async websocket)
              │
              ▼
    Stream audio to avatar
```
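Concretely, the backend ends up as a set of concurrent tasks joined by queues. The sketch below is framework-free and uses stand-in stubs for the STT, LLM, and TTS stages (no real Deepgram or Gemini calls), just to show the shape of the wiring:

```python
import asyncio

async def stt_task(audio_in: asyncio.Queue, text_out: asyncio.Queue) -> None:
    # Stand-in for Deepgram streaming STT: audio chunks -> transcripts.
    while (chunk := await audio_in.get()) is not None:
        await text_out.put(f"transcript:{chunk}")
    await text_out.put(None)  # forward the end-of-stream sentinel

async def llm_task(text_in: asyncio.Queue, reply_out: asyncio.Queue) -> None:
    # Stand-in for the LLM call (Gemini / OpenAI).
    while (text := await text_in.get()) is not None:
        await reply_out.put(f"reply-to({text})")
    await reply_out.put(None)

async def tts_task(reply_in: asyncio.Queue, audio_out: list) -> None:
    # Stand-in for the Deepgram TTS websocket: replies -> audio frames.
    while (reply := await reply_in.get()) is not None:
        audio_out.append(f"audio({reply})")

async def pipeline(user_audio: list) -> list:
    audio_q: asyncio.Queue = asyncio.Queue()
    text_q: asyncio.Queue = asyncio.Queue()
    reply_q: asyncio.Queue = asyncio.Queue()
    out: list = []
    tasks = asyncio.gather(
        stt_task(audio_q, text_q),
        llm_task(text_q, reply_q),
        tts_task(reply_q, out),
    )
    for chunk in user_audio:        # frames arriving over the WebSocket
        await audio_q.put(chunk)
    await audio_q.put(None)         # end-of-stream sentinel
    await tasks
    return out

result = asyncio.run(pipeline(["hello", "avatar"]))
```

In the real app each stub becomes an `await` on the corresponding async client, and the queues let every stage run as soon as its input arrives instead of waiting for the previous stage to fully finish.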
Everything async.
Important rule
Use:
AsyncDeepgramClient
for:
- avatars
- realtime assistants
- voice bots
- streaming audio
- websocket apps
Use:
DeepgramClient
only for:
- simple scripts
- batch transcription
- offline processing
💡 Since you’re building an avatar system, the next big improvement is one most developers miss:

Use Deepgram streaming STT instead of prerecorded STT.

Your current code uses `listen.prerecorded`, which waits for the entire recording before returning a transcript, and that adds latency. Streaming STT can make the avatar respond roughly 800 ms faster.
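The latency difference is easy to see with stubs (plain asyncio, no Deepgram calls): a prerecorded-style call cannot return anything until the whole utterance is buffered, while a streaming-style call yields its first partial after a single chunk:

```python
import asyncio
import time

CHUNK_S = 0.05   # 50 ms of audio per chunk
N_CHUNKS = 6     # a ~300 ms utterance

async def mic():
    # Simulated microphone: one chunk every CHUNK_S seconds.
    for i in range(N_CHUNKS):
        await asyncio.sleep(CHUNK_S)
        yield f"chunk{i}"

async def first_result_prerecorded() -> float:
    # Prerecorded-style: buffer the full recording, then transcribe once.
    t0 = time.monotonic()
    audio = [c async for c in mic()]   # must wait for the whole utterance
    _ = " ".join(audio)                # stand-in for one batch STT request
    return time.monotonic() - t0

async def first_result_streaming() -> float:
    # Streaming-style: partial transcripts arrive chunk by chunk.
    t0 = time.monotonic()
    async for _partial in mic():       # stand-in for websocket partials
        return time.monotonic() - t0   # first partial -> avatar can react

streamed = asyncio.run(first_result_streaming())
batched = asyncio.run(first_result_prerecorded())
# streamed ≈ one chunk (~50 ms); batched ≈ the whole utterance (~300 ms)
```

The same shape carries over to the real SDK: the avatar starts reacting on the first partial transcript instead of after the user stops talking.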
If you want, I can show you the exact architecture used in production avatar systems (Deepgram + FastAPI + WebSocket + Gemini). It will simplify your project a lot.