
From deepgram/deepgram-python-sdk websockets-reference.md:

> **authorization:** typing.Optional[str] — Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter. **Example:** token %DEEPGRAM_API_KEY% or bearer %DEEPGR

Read this and then tell me what changes I have to do.

You’re asking the right question 👍 — for an AI avatar (real-time voice + lip sync) the architecture matters a lot.

Short answer first:

Use AsyncDeepgramClient
❌ Do NOT use the normal DeepgramClient for the avatar voice pipeline.

Because your stack already uses:

  • FastAPI
  • Uvicorn
  • WebSockets
  • Real-time audio streaming

Those frameworks are async-based, so the async Deepgram client fits naturally.


Why AsyncDeepgramClient is required for avatars

An avatar system usually runs this pipeline simultaneously:

  • User speech
  • STT (speech-to-text)
  • LLM response
  • TTS audio stream
  • Avatar lip-sync animation

All these must run at the same time.

If you use the sync client, it blocks the event loop.

Example problem:

User speaks → STT running → avatar cannot animate → UI freezes

Async avoids that.
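You can see the effect with a pure-asyncio stand-in (no Deepgram calls here; `blocking_stt_call` and `animate_avatar` are hypothetical placeholders for a sync STT request and the avatar's render loop):

```python
import asyncio
import time

def blocking_stt_call() -> None:
    # Stands in for a sync Deepgram request: it blocks the whole event loop.
    time.sleep(0.2)

async def async_stt_call() -> None:
    # Stands in for an awaited Deepgram request: other tasks keep running.
    await asyncio.sleep(0.2)

async def animate_avatar(frames: list) -> None:
    # Tiny stand-in for the avatar render loop: one "frame" every 50 ms.
    for i in range(4):
        frames.append(i)
        await asyncio.sleep(0.05)

async def handle_user_speech(use_async: bool) -> int:
    frames: list = []
    animation = asyncio.create_task(animate_avatar(frames))
    if use_async:
        await async_stt_call()
    else:
        blocking_stt_call()  # freezes the animation task for 0.2 s
    frames_during_stt = len(frames)  # frames rendered while STT was running
    await animation
    return frames_during_stt
```

With the sync call, zero frames render while STT runs; with the async call, the animation keeps ticking throughout.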


What you need to change in your project

Right now your client file probably looks like this:

from deepgram import DeepgramClient

Change it to:

from deepgram import AsyncDeepgramClient

Correct deepgram_client.py for avatar

```python
import os

from dotenv import load_dotenv
from deepgram import AsyncDeepgramClient

load_dotenv()

DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY", "").strip()
if not DEEPGRAM_API_KEY:
    raise ValueError("DEEPGRAM_API_KEY environment variable not set")

deepgram = AsyncDeepgramClient(DEEPGRAM_API_KEY)
```

Then your TTS websocket becomes:

Example for avatar speech:

```python
from deepgram.extensions.types.sockets import SpeakV1TextMessage

async with deepgram.speak.v1.connect(
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
) as connection:
    await connection.send_text(
        SpeakV1TextMessage(text="Hello avatar user")
    )
```

This streams audio in real time.

Perfect for:

  • lip sync
  • facial animation
  • real-time talking avatars
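For the lip-sync part, each linear16 chunk coming off the TTS socket can be reduced to a loudness value that drives the mouth. A minimal sketch (`mouth_openness` is a hypothetical helper, not part of the Deepgram SDK):

```python
import math
import struct

def mouth_openness(pcm_chunk: bytes) -> float:
    """Map a linear16 (16-bit little-endian PCM) chunk, as streamed by
    Deepgram TTS, to a 0.0-1.0 mouth-openness value via RMS amplitude."""
    n = len(pcm_chunk) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm_chunk[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return min(1.0, rms / 32768.0)

# Silence keeps the mouth closed; a loud chunk opens it:
silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 30000, -30000, 30000, -30000)
```

Each value can be sent to the frontend alongside the audio chunk so the avatar's jaw tracks the speech in real time.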

Your final architecture should look like this

```
Frontend Avatar
      │  WebSocket
      ▼
FastAPI backend
      ├── Deepgram STT (async)
      ├── LLM (Gemini / OpenAI)
      └── Deepgram TTS (async websocket)
      │
      ▼
Stream audio to avatar
```

Everything async.
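Wiring-wise, the three backend stages can be connected with asyncio queues so STT, the LLM, and TTS overlap instead of running one after another. A toy pipeline with placeholder stages (the real ones would call Deepgram and your LLM):

```python
import asyncio

async def stt_stage(audio_in: asyncio.Queue, text_out: asyncio.Queue) -> None:
    # Placeholder for Deepgram streaming STT: audio chunks in, transcripts out.
    while (chunk := await audio_in.get()) is not None:
        await text_out.put(f"transcript:{chunk}")
    await text_out.put(None)  # propagate end-of-stream

async def llm_stage(text_in: asyncio.Queue, reply_out: asyncio.Queue) -> None:
    # Placeholder for the LLM call (Gemini / OpenAI).
    while (text := await text_in.get()) is not None:
        await reply_out.put(text.replace("transcript:", "reply:"))
    await reply_out.put(None)

async def tts_stage(reply_in: asyncio.Queue, audio_out: list) -> None:
    # Placeholder for Deepgram TTS: replies in, audio bytes out to the avatar.
    while (reply := await reply_in.get()) is not None:
        audio_out.append(reply.encode())

async def run_pipeline(chunks: list) -> list:
    audio_q, text_q, reply_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    audio_out: list = []
    stages = asyncio.gather(
        stt_stage(audio_q, text_q),
        llm_stage(text_q, reply_q),
        tts_stage(reply_q, audio_out),
    )
    for chunk in chunks:
        await audio_q.put(chunk)
    await audio_q.put(None)  # end of user speech
    await stages
    return audio_out
```

The same queue shape works when the stages are real: the STT task pushes transcripts as they finalize, and the TTS task streams audio to the frontend websocket as soon as the first LLM tokens arrive.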


Important rule

Use:

AsyncDeepgramClient

for:

  • avatars
  • realtime assistants
  • voice bots
  • streaming audio
  • websocket apps

Use:

DeepgramClient

only for:

  • simple scripts
  • batch transcription
  • offline processing

💡 Since you’re building an avatar system, the next big improvement is something most developers miss:

Use Deepgram streaming STT instead of prerecorded STT.

Your current code uses:

listen.prerecorded

That adds latency.

Streaming STT typically cuts response latency by several hundred milliseconds, because transcription happens while the user is still speaking instead of waiting for the full recording.
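With streaming, transcripts arrive as incremental JSON messages over the websocket, and the backend reacts to final results as they land. A sketch of the message handling — the message shape follows Deepgram's documented live-transcription results (`type: "Results"`, `is_final`, `channel.alternatives`), while `extract_final_transcript` itself is hypothetical glue code:

```python
import json
from typing import Optional

def extract_final_transcript(raw_message: str) -> Optional[str]:
    """Return the transcript from a Deepgram live-STT message,
    or None for interim or empty results."""
    msg = json.loads(raw_message)
    if msg.get("type") != "Results" or not msg.get("is_final"):
        return None  # interim result: useful for barge-in, not for the LLM
    alternatives = msg.get("channel", {}).get("alternatives", [])
    transcript = alternatives[0].get("transcript", "") if alternatives else ""
    return transcript or None

# A final message triggers the LLM; interim ones are ignored:
final_msg = json.dumps({
    "type": "Results",
    "is_final": True,
    "channel": {"alternatives": [{"transcript": "hello avatar"}]},
})
interim_msg = json.dumps({
    "type": "Results",
    "is_final": False,
    "channel": {"alternatives": [{"transcript": "hello av"}]},
})
```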

If you want, I can show you the exact architecture used in production avatar systems (Deepgram + FastAPI + WebSocket + Gemini). It will simplify your project a lot.
