This project is a Go backend server that provides real-time, low-latency Speech-to-Speech (S2S) streaming. It captures audio from a client over a WebSocket connection, transcribes it in real time using Deepgram, and sends the transcribed text to an OpenAI-compatible Large Language Model (LLM) for a text response. The LLM's text is then converted to audio, sentence by sentence, using the ElevenLabs API, and the resulting audio is streamed back to the client over the same WebSocket as it is generated.
The primary goal is to minimize perceived latency for the end-user by:
- Streaming audio input for live transcription.
- Quickly processing the transcribed text with an LLM.
- Starting audio playback of the AI's response as soon as the first sentence is synthesized, while subsequent sentences are still being generated and processed.
- WebSocket Communication: Uses WebSockets for real-time, bidirectional audio and data communication.
- Live Speech-to-Text: Integrates Deepgram for real-time audio transcription with Voice Activity Detection (VAD).
- OpenAI/LLM Integration: Streams transcribed text to any OpenAI API-compatible LLM (e.g., GitHub Models, OpenAI's GPT).
- Conversation Memory: Maintains conversation history during each WebSocket session, providing context for the LLM to generate more relevant responses.
- PostgreSQL Integration: Stores conversation history in a PostgreSQL database for persistence across sessions, with non-blocking database operations to maintain low latency.
- ElevenLabs Text-to-Speech Integration:
- 🆕 WebSocket Streaming (Recommended): Real-time WebSocket connection to ElevenLabs for ultra-low latency audio generation. Text is streamed incrementally as OpenAI generates it, providing faster Time-to-First-Byte (TTFB).
- HTTP Streaming (Fallback): Traditional HTTP-based streaming for compatibility, processes text sentence by sentence.
- Utilizes advanced buffering control with `chunk_length_schedule` and context-aware generation with `previous_text` and `next_text` parameters.
- System Metrics: Provides a real-time metrics endpoint for monitoring active connections, CPU/memory usage, and other system statistics.
- Status Dashboard: Includes a simple web interface for viewing system metrics.
- Pipelined Streaming Workflow:
- Client streams audio to the backend.
- Backend streams audio to Deepgram for live transcription.
- Deepgram sends back transcript segments (interim and final).
- Upon utterance end (pause in client's speech), the accumulated final transcript is:
- Added to the client's conversation history as a user message
- Sent to the LLM along with previous conversation context
- LLM receives the full conversation history and streams text response back.
- LLM response is processed into sentences for real-time streaming while the full response is also captured.
- The complete LLM response is stored in the conversation history as an assistant message.
- Sentences are sent to ElevenLabs for TTS using a sliding window approach:
  - The first sentence is sent with its `next_text` (the second sentence) once available.
  - Subsequent sentences are sent with their `previous_text` and `next_text`.
  - The final sentence is sent with its `previous_text`.

  This provides context to ElevenLabs for improved speech continuity.
- Audio for each sentence is streamed back to the client as soon as it's synthesized by ElevenLabs, allowing for low-latency playback while subsequent text is still being generated and processed.
- Low Latency Focus: Optimized HTTP client for ElevenLabs (TCP_NODELAY, HTTP/2).
- Concurrent Handling: Designed to handle multiple client connections concurrently.
- Context-Aware Cancellation: Gracefully handles client disconnects and server-side cancellations.
- Modular Design: Code is organized into packages for configuration, handlers, services, and utilities.
- Configuration: API keys and service parameters are managed via environment variables or defaults.
- Error Handling & Logging: Includes structured logging and error propagation.
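The sliding-window TTS context described above (each sentence sent with its `previous_text` and `next_text`) can be sketched in Go. This is an illustrative sketch, not the project's actual code; `sentenceContext` and `buildContexts` are hypothetical names:

```go
package main

import "fmt"

// sentenceContext pairs a sentence with the neighbouring text that would be
// sent to ElevenLabs as previous_text / next_text.
type sentenceContext struct {
	Text, PreviousText, NextText string
}

// buildContexts applies the sliding-window rule: the first sentence gets only
// next_text, middle sentences get both, and the last gets only previous_text.
func buildContexts(sentences []string) []sentenceContext {
	out := make([]sentenceContext, len(sentences))
	for i, s := range sentences {
		c := sentenceContext{Text: s}
		if i > 0 {
			c.PreviousText = sentences[i-1]
		}
		if i < len(sentences)-1 {
			c.NextText = sentences[i+1]
		}
		out[i] = c
	}
	return out
}

func main() {
	for _, c := range buildContexts([]string{"Hello.", "How are you?", "Goodbye."}) {
		fmt.Printf("%+v\n", c)
	}
}
```

In the real pipeline the window is built incrementally as sentences arrive (the first sentence is held until its `next_text` is known), but the context rule is the same.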
The project is organized into the following directory structure:
chat_audio_streamer/
├── go.mod # Go module definition
├── go.sum # Go module checksums
├── main.go # Main application entry point, HTTP server setup
├── index.html # Example HTML/JavaScript client for Speech-to-Speech
├── config/
│ └── config.go # Configuration loading (API keys, URLs, model IDs)
├── database/
│ └── db.go # PostgreSQL database connection and operations
├── handlers/
│ ├── init.go # API client initialization (OpenAI, ElevenLabs, Deepgram SDK)
│ ├── metrics_handler.go # Endpoint handler for system metrics
│ └── websocket_handler.go # WebSocket connection handling, S2T, and core orchestration
├── metrics/
│ └── metrics.go # System metrics collection and tracking
├── services/
│ ├── elevenlabs_service.go # Logic for ElevenLabs API interaction and audio streaming
│ └── openai_service.go # Logic for OpenAI API interaction and text streaming
└── utils/
├── http_client.go # Custom HTTP client setup for ElevenLabs
└── json_utils.go # Utility for pretty-printing JSON (for logging)
- Go (version 1.21 or higher recommended)
- A Deepgram account and API key for speech-to-text.
- Access to an OpenAI-compatible LLM API endpoint and an API key/token.
- An ElevenLabs account and API key for text-to-speech.
- An ElevenLabs Voice ID.
- (Optional) PostgreSQL database for persistent conversation storage.
- Clone the Repository (or create the project files): If you haven't already, create the project directory (`chat_audio_streamer`) and populate it with the Go files and `index.html`.
- Initialize Go Module: Navigate to the root directory (`chat_audio_streamer`) and run `go mod init chat_audio_streamer` (or your chosen module name) followed by `go mod tidy`. This will download the necessary dependencies, including:
  - `github.com/gorilla/websocket`
  - `github.com/openai/openai-go`
  - `github.com/deepgram/deepgram-go-sdk/v2`
- Configure API Keys and Endpoints: The application loads configuration from environment variables. Set the following environment variables before running the application:
  - `DEEPGRAM_API_KEY`: Your API key for Deepgram.
  - `OPENAI_API_KEY`: Your API key/token for the OpenAI-compatible LLM. For GitHub Models, this is likely a GitHub Personal Access Token (PAT) with appropriate scopes.
  - `OPENAI_BASE_URL`: (Optional, defaults to `https://api.cerebras.ai/v1` in code) The base URL for the LLM API.
  - `OPENAI_MODEL`: (Optional, defaults to `llama-4-scout-17b-16e-instruct` in code) The model to use.
  - `OPENAI_SYSTEM_PROMPT`: (Optional, defaults to "You are a helpful assistant. Respond clearly and concisely, and do not use markdown formatting. Also give short responses.") The system prompt for the LLM.
  - `ELEVENLABS_API_KEY`: Your API key for ElevenLabs.
  - `ELEVENLABS_VOICE_ID`: (Optional, defaults to a sample ID like `ecp3DWciuUyW7BYM7II1` in code) The ElevenLabs Voice ID you want to use.
  - `ELEVENLABS_MODEL_ID`: (Optional, defaults to `eleven_flash_v2_5` in code) The ElevenLabs model ID.
  - `ELEVENLABS_OUTPUT_FORMAT`: (Optional, defaults to `mp3_44100_128` in code) The desired audio output format.
  - `ELEVENLABS_BASE_URL`: (Optional, defaults to `https://api.elevenlabs.io/v1` in code) The base URL for the ElevenLabs API.
  - `ELEVENLABS_MIME_TYPE`: (Optional, defaults to `audio/mpeg` in code) Corresponds to the output format.
  - `ELEVENLABS_USE_WEBSOCKET`: (Optional, defaults to `true`) Set to `true` to use WebSocket streaming for lower latency, or `false` to use HTTP streaming (fallback mode).
  - `PG_DB_URL`: (Optional) PostgreSQL database connection string in the format `postgres://username:password@host:port/dbname?sslmode=disable`. If not provided, the application will function without database persistence.
  - `FIREBASE_SERVICE_ACCOUNT_KEY_PATH`: (Optional, defaults to `serviceAccountKey.json`) Path to your Firebase service account credentials JSON file. This file is required for Firebase Storage integration.
  - `FIREBASE_STORAGE_BUCKET`: (Optional, defaults to `saahara-1.appspot.com`) Your Firebase Storage bucket name where audio recordings will be stored.
Example (bash/zsh):
    export DEEPGRAM_API_KEY="YOUR_DEEPGRAM_KEY"
    export OPENAI_API_KEY="ghp_YOUR_GITHUB_PAT_OR_OPENAI_KEY"
    export ELEVENLABS_API_KEY="sk_YOUR_ELEVENLABS_KEY"
    export ELEVENLABS_VOICE_ID="YOUR_VOICE_ID"
    export ELEVENLABS_USE_WEBSOCKET="true"  # Enable WebSocket streaming for lower latency
    # ... set other variables as needed
Alternatively, you can modify the fallback default values directly in `config/config.go`, but using environment variables is highly recommended for security and flexibility.
- Port Configuration: The server listens on port `8080` by default. You can change this by setting the `PORT` environment variable: `export PORT=8888`
Navigate to the root directory of the project (chat_audio_streamer) and run:
    go run main.go

You should see log messages indicating that the API clients are initialized and the WebSocket server is starting:
    YYYY/MM/DD HH:MM:SS Initializing API clients...
    YYYY/MM/DD HH:MM:SS Deepgram client initialized.
    YYYY/MM/DD HH:MM:SS OpenAI client configured for BaseURL: <your_openai_base_url>
    YYYY/MM/DD HH:MM:SS ElevenLabs HTTP client initialized with custom transport.
    YYYY/MM/DD HH:MM:SS Successfully connected to Firebase Storage for bucket: <your_firebase_bucket>
    YYYY/MM/DD HH:MM:SS Starting WebSocket server on ws://localhost:8080/ws/chat-audio
WebSocket Endpoint: ws://localhost:<PORT>/ws/chat-audio
Default: ws://localhost:8080/ws/chat-audio
Metrics Endpoint: http://localhost:<PORT>/api/metrics
Default: http://localhost:8080/api/metrics
This endpoint provides real-time system metrics in JSON format including:
- Active and total connections
- CPU and memory usage
- Goroutine count
- Uptime and other runtime statistics
History Endpoint: http://localhost:<PORT>/api/history?sessionId=<SESSION_ID>
Default: http://localhost:8080/api/history?sessionId=<SESSION_ID>
Retrieves the conversation history for a specific session, including:
- Message ID
- Role (user or assistant)
- Content (decrypted)
- Creation timestamp
- Feedback (if provided)
Sessions Endpoint: http://localhost:<PORT>/api/sessions?userId=<USER_ID>
Default: http://localhost:8080/api/sessions?userId=<USER_ID>
Retrieves a list of all sessions for a specific user ID, including:
- Session ID
- User ID
- Creation timestamp
- Last activity timestamp
- Model ID and Voice ID
Feedback Endpoint: http://localhost:<PORT>/api/feedback
Default: http://localhost:8080/api/feedback
Accepts POST requests with a JSON payload to update feedback for a specific message:

    {
      "message_id": 123,
      "feedback": "positive"  // or "negative"
    }

Status Dashboard: http://localhost:<PORT>/status
Default: http://localhost:8080/status
A simple web interface that displays system metrics in a user-friendly format with auto-refresh.
The application integrates with PostgreSQL for persistent storage of conversation history:
- Asynchronous Operations: All database operations (reads and writes) are implemented using goroutines to ensure they don't block the main conversation flow, preserving low latency in the real-time audio streaming experience.
- Connection Pooling: The database connection pool is configured with optimized settings (25 max open connections, 25 max idle connections, 5-minute connection lifetime) to handle concurrent requests efficiently.
- Schema Design:
  - `sessions` table: Stores session metadata including:
    - `session_id` - Unique identifier for the session
    - `user_id` - Client's user identifier
    - `created_at` - When the session was created
    - `last_activity_at` - Timestamp of the most recent activity
    - `model_id` - The OpenAI model used for this session
    - `voice_id` - The ElevenLabs voice used for this session
  - `messages` table: Stores conversation messages including:
    - `id` - Unique message identifier
    - `session_id` - Reference to the parent session
    - `role` - Message role ('user' or 'assistant')
    - `content` - Encrypted message content
    - `audio_url` - Optional URL to an audio file (if stored)
    - `feedback` - Optional user feedback on the message ('positive' or 'negative')
    - `created_at` - When the message was created
- Conversation Persistence: Messages from both user and assistant are stored in the database as they occur, enabling:
- History retrieval across sessions
- Conversation continuity even after disconnections
- Potential for analytics and user experience improvements
- Feedback collection for message quality assessment
- Optional Integration: The database integration is optional; if no database connection string is provided (`PG_DB_URL`), the application will function with in-memory conversation history only.
- Selective Encryption: Only the message content is encrypted before storage to enhance data security, while keeping other fields searchable.
- Messages are stored asynchronously in separate goroutines to prevent blocking the main conversation flow
- Database indexes are created automatically for optimized query performance on `session_id` and `user_id`
- Error handling is robust, with detailed logging that doesn't interrupt the user experience
- Feedback can be provided on messages via the `/api/feedback` endpoint
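The non-blocking write pattern described above (fire a goroutine per insert, never stall the audio pipeline) can be sketched as follows. This is an illustrative sketch with an in-memory stand-in for the PostgreSQL layer; `messageStore` and `StoreAsync` are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// messageStore stands in for the database layer; the real code issues an
// INSERT, but the concurrency pattern is the same.
type messageStore struct {
	mu       sync.Mutex
	wg       sync.WaitGroup
	messages []string
}

// StoreAsync persists a message in a separate goroutine so the caller —
// the real-time audio pipeline — never blocks on the database.
func (s *messageStore) StoreAsync(role, content string) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		s.mu.Lock()
		defer s.mu.Unlock()
		s.messages = append(s.messages, role+": "+content)
	}()
}

// Wait flushes pending writes (useful at shutdown or in tests).
func (s *messageStore) Wait() { s.wg.Wait() }

func main() {
	var store messageStore
	store.StoreAsync("user", "Hello there")
	store.StoreAsync("assistant", "Hi! How can I help?")
	store.Wait()
	fmt.Println(len(store.messages), "messages persisted")
}
```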
In addition to real-time streaming, the application also includes an audio recording and storage feature that:
- Concatenates Audio Chunks: As audio is streamed through the system, both user speech and AI-generated responses are concatenated in memory buffers.
  - `userAudioBuffer` stores raw audio data from the client's microphone
  - `assistantAudioBuffer` stores the synthesized audio from ElevenLabs
- Firebase Integration: Completed audio recordings are automatically uploaded to Firebase Storage:
  - User audio is converted from raw PCM to properly formatted WAV files
  - Assistant audio is stored in the format specified by `ELEVENLABS_OUTPUT_FORMAT` (default: MP3)
  - Files are organized in the storage bucket using a hierarchical structure: `audio_recordings/{user_id}/{session_id}/`
  - Each filename includes a UUID and timestamp to ensure uniqueness
- Audio Duration Calculation and Metadata: Enhanced audio storage with comprehensive metadata:
- Automatic Duration Detection: Calculates audio duration for multiple formats including WAV, MP3, and FLAC
- Format-Specific Parsing: Implements dedicated parsers for different audio formats:
- WAV: Full implementation with RIFF header and chunk parsing
- MP3: Complete MPEG frame analysis with support for MPEG 1/2/2.5 and bitrate detection
- FLAC: STREAMINFO block parsing for sample rate and total samples
- WebM/OGG/AAC: Framework in place for future implementation
- Smart Format Detection: Automatically detects audio format by file signature when content type is unclear
- Rich Metadata Storage: Each uploaded file includes comprehensive metadata:
- Duration in seconds (calculated automatically)
- Upload timestamp
- File size in bytes
- User ID and session ID for organization
- Content type and format information
- Metadata Retrieval Functions: API functions for retrieving audio metadata and duration information
- Session Audio Listing: Ability to list all audio files for a session with their durations and metadata
- Database References: After successful upload, the audio file URLs are encrypted and stored in the database:
  - The `audio_url` field in the `messages` table is updated with the encrypted Firebase Storage URL
  - URLs are retrieved and decrypted when conversation history is requested
- Asynchronous Processing: All audio processing and uploading operations happen asynchronously:
- Audio uploads occur in the background after message processing completes
- Database updates happen non-blocking to maintain low latency in the primary conversation flow
- Audio processing continues even if a client disconnects, ensuring complete conversation archiving
- Configuration Options:
  - `FIREBASE_SERVICE_ACCOUNT_KEY_PATH`: Path to your Firebase service account credentials JSON file (default: `serviceAccountKey.json`)
  - `FIREBASE_STORAGE_BUCKET`: Your Firebase Storage bucket name
This feature enables:
- Complete conversation archiving with both text and audio
- Playback of previous conversations with accurate duration information
- Training data collection for AI improvement with detailed audio metadata
- Quality assurance and user experience analysis with comprehensive audio metrics
- Session analytics including total conversation duration and audio file organization
- Efficient audio file management with metadata-based search and filtering
The audio storage system is designed to be lightweight on the main processing thread, maintaining the application's focus on low-latency real-time communication while providing comprehensive conversation history.
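The duration calculation for raw PCM (and for WAV once the header has been parsed) is simple arithmetic over the stream parameters. A minimal sketch, assuming 16-bit linear PCM; `pcmDurationSeconds` is an illustrative name:

```go
package main

import "fmt"

// pcmDurationSeconds computes the playback length of raw PCM audio from its
// byte length and stream parameters — the same arithmetic a WAV duration
// parser performs after reading the RIFF header:
//
//	duration = dataBytes / (sampleRate * channels * bytesPerSample)
func pcmDurationSeconds(dataBytes, sampleRate, channels, bitsPerSample int) float64 {
	bytesPerSecond := sampleRate * channels * (bitsPerSample / 8)
	if bytesPerSecond == 0 {
		return 0
	}
	return float64(dataBytes) / float64(bytesPerSecond)
}

func main() {
	// 16 kHz, mono, 16-bit audio: 32,000 bytes per second.
	fmt.Println(pcmDurationSeconds(64000, 16000, 1, 16), "seconds")
}
```

MP3 and FLAC need real frame/STREAMINFO parsing because their byte rate is not constant, which is why the project implements format-specific parsers.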
The provided index.html serves as an example client for this Speech-to-Speech system.
- Connect: The client establishes a WebSocket connection to the endpoint upon page load.
- Send Audio (Recording):
- The user clicks "Start Recording".
- The browser captures audio from the microphone (typically at 16kHz, 16-bit PCM as configured).
- This audio data is sent as binary WebSocket messages to the server.
- When the user clicks "Stop Recording", a JSON message `{"type": "closeMicrophone"}` is sent to signal the end of audio input.
- Receive Status and Transcription Updates:
- The server may send JSON text messages to update the client on the status (e.g., "Transcription complete. AI is processing...", "AI response finished.").
- Interim transcription updates from Deepgram might also be relayed.
- Receive AI's Spoken Audio:
- Once the AI generates text and ElevenLabs synthesizes speech, the server streams back binary WebSocket messages. Each message contains a chunk of audio data (e.g., MP3).
- The `index.html` client uses Media Source Extensions (MSE) to play this streamed audio in real-time.
- Receive Errors (Optional): If an error occurs server-side, the server may send a JSON text message: `{ "error": "Description of the error" }`
- Connection Close: The server closes the WebSocket connection if an unrecoverable error occurs. The client also handles connection closures.
To use the example HTML client:
- Ensure the Go backend server is running and configured with your API keys.
- Open `index.html` (located in the project root) in a modern web browser (Chrome, Firefox, Edge recommended).
- Allow microphone access when prompted.
- Click "Start Recording", speak, and then click "Stop Recording".
The server logs various stages of processing to the console, including:
- Client connections and disconnections
- Deepgram connection status and transcription events (like UtteranceEnd)
- Transcribed text being sent to the LLM
- Sentences produced by the LLM
- Requests to ElevenLabs for TTS
- Audio streaming events for the response
- Errors encountered during any stage
Check the server console output for these logs. The client-side JavaScript in index.html also logs extensively to the browser's developer console.
Initializes configuration, API clients (Deepgram, OpenAI, ElevenLabs), and starts the HTTP server with the WebSocket handler.
Loads application configuration from environment variables with defaults. This includes API keys, model IDs, service URLs for all three services (Deepgram, OpenAI, ElevenLabs), and the PostgreSQL database connection string.
Manages the PostgreSQL database connection and operations:
- Initializes the database connection pool with optimized settings
- Creates the schema if it doesn't exist (sessions and messages tables)
- Provides functions for storing and retrieving messages
- Implements session management functionality
- All database operations are designed to be non-blocking
Contains the `InitializeClients` function. It initializes the Deepgram SDK, the `openai.Client`, and the custom `http.Client` for ElevenLabs.
This is the core of the server-side WebSocket logic:
- Upgrades HTTP requests to WebSocket connections.
- Speech-to-Text (Deepgram):
- Initializes a Deepgram live transcription client for each WebSocket connection.
- Implements a `DeepgramCallbackHandler` to process events from Deepgram (e.g., `Open`, `Message`, `UtteranceEnd`, `Error`).
- Forwards binary audio data received from the client to Deepgram.
- Accumulates the transcript from Deepgram.
- Uses Deepgram's `UtteranceEnd` event to determine when the user has paused speaking.
- LLM Processing (OpenAI-compatible):
  - Once a complete utterance is transcribed, sends the text to `services.StreamOpenAIText`.
- Text-to-Speech (ElevenLabs):
  - Pipes sentences from the LLM response to `services.StreamTTSWebSocket` for audio synthesis.
- Manages concurrent goroutines for all streaming operations.
- Handles context cancellation for graceful shutdown if the client disconnects or an error occurs.
- Streams binary audio data (the AI's speech) back to the client.
- Sends JSON status and error messages to the client.
`StreamOpenAIText`: Connects to the LLM, sends the prompt (transcribed text), and streams back the text response. It parses the incoming text into sentences and sends each to an output channel.
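The sentence parsing is, as noted elsewhere in this README, deliberately basic. A minimal sketch of terminator-based splitting of streamed text; `splitSentences` is an illustrative name, not the service's actual tokenizer:

```go
package main

import (
	"fmt"
	"strings"
)

// splitSentences cuts text at the terminators '.', '!' and '?' — a simplified
// stand-in for the sentence splitting applied to the LLM's streamed output.
func splitSentences(text string) []string {
	var sentences []string
	var b strings.Builder
	for _, r := range text {
		b.WriteRune(r)
		if r == '.' || r == '!' || r == '?' {
			if s := strings.TrimSpace(b.String()); s != "" {
				sentences = append(sentences, s)
			}
			b.Reset()
		}
	}
	if s := strings.TrimSpace(b.String()); s != "" {
		sentences = append(sentences, s) // trailing fragment without a terminator
	}
	return sentences
}

func main() {
	fmt.Println(splitSentences("Hello there! How are you today? I am fine."))
}
```

A real tokenizer must also handle abbreviations, decimals, and quoted punctuation, which is why "More Sophisticated Sentence Tokenization" appears under potential improvements below.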
`StreamTTSWebSocket`: Takes a sentence of text, makes a POST request to the ElevenLabs API, and streams the resulting audio chunks via a callback, suitable for WebSocket transmission.
`NewElevenLabsClient`: Creates a configured `http.Client` for low-latency communication with ElevenLabs (TCP_NODELAY, HTTP/2).
`PrettyJSON`: A helper for indenting JSON for logging.
Manages Firebase Storage interactions for audio recording persistence with advanced metadata and duration calculation:
- `InitializeFirebase`: Sets up the Firebase app and storage client using the service account credentials
- `UploadAudioToFirebase`: Uploads audio data to Firebase Storage with comprehensive metadata including calculated duration
- `CalculateAudioDuration`: Main function that determines audio duration based on content type and format detection
- Format-Specific Duration Calculators:
  - `calculateWAVDuration`: Parses WAV RIFF headers and data chunks for precise duration calculation
  - `calculateMP3Duration`: Advanced MP3 frame parsing with MPEG version detection and bitrate analysis
  - `calculateFLACDuration`: FLAC STREAMINFO block parsing for sample rate and total samples
  - `detectAndCalculateDuration`: Smart format detection using file signatures when content type is unclear
- Metadata Management Functions:
  - `GetAudioMetadata`: Retrieves comprehensive metadata for uploaded audio files
  - `GetAudioDuration`: Convenience function to get duration from stored metadata
  - `ListAudioFilesWithDuration`: Lists all audio files for a session with duration and metadata
- `CreateWavFromPCM`: Converts raw PCM audio data to WAV format by adding the appropriate header
- Helper functions for determining content types, file extensions, and supported audio formats
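What a helper like `CreateWavFromPCM` has to do is prepend a standard 44-byte RIFF/WAVE header describing the raw PCM payload. This sketch follows the standard WAV layout; the project's actual signature and error handling may differ:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// createWavFromPCM wraps raw PCM samples in a minimal RIFF/WAVE container.
func createWavFromPCM(pcm []byte, sampleRate, channels, bitsPerSample int) []byte {
	var buf bytes.Buffer
	byteRate := sampleRate * channels * bitsPerSample / 8
	blockAlign := channels * bitsPerSample / 8

	buf.WriteString("RIFF")
	binary.Write(&buf, binary.LittleEndian, uint32(36+len(pcm))) // file size minus 8
	buf.WriteString("WAVE")

	buf.WriteString("fmt ")
	binary.Write(&buf, binary.LittleEndian, uint32(16)) // PCM fmt chunk size
	binary.Write(&buf, binary.LittleEndian, uint16(1))  // audio format: linear PCM
	binary.Write(&buf, binary.LittleEndian, uint16(channels))
	binary.Write(&buf, binary.LittleEndian, uint32(sampleRate))
	binary.Write(&buf, binary.LittleEndian, uint32(byteRate))
	binary.Write(&buf, binary.LittleEndian, uint16(blockAlign))
	binary.Write(&buf, binary.LittleEndian, uint16(bitsPerSample))

	buf.WriteString("data")
	binary.Write(&buf, binary.LittleEndian, uint32(len(pcm)))
	buf.Write(pcm)
	return buf.Bytes()
}

func main() {
	wav := createWavFromPCM(make([]byte, 32000), 16000, 1, 16)
	fmt.Println("WAV size:", len(wav)) // 44-byte header + 32000 bytes of PCM
}
```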
- Client-Side Resampling: The `index.html` client has basic audio resampling. For production, if the browser's audio capture rate doesn't match Deepgram's expected rate (e.g., 16kHz), a more robust client-side resampling library would improve transcription accuracy.
- AudioWorklets: For client-side audio processing, `AudioWorklet` is more modern and performant than `ScriptProcessorNode` and should be considered for production applications.
- Transcription Accuracy & Model Choice: Experiment with different Deepgram models (`nova-2`, `nova-3`, etc.) and settings (e.g., `interim_results`, `endpointing`) for optimal transcription.
- More Sophisticated Sentence Tokenization: The current LLM response sentence splitting is basic.
- Buffer Management (MSE Client): The example MSE client has basic buffer handling for playback.
- Authentication/Authorization: Implement proper authentication for clients.
- Rate Limiting: Protect backend services (Deepgram, LLM, TTS) by implementing rate limiting.
- Database Enhancements:
- Implement a connection retry mechanism for database operations
- Add a caching layer to reduce database load for frequently accessed conversations
- Support for database sharding for high-volume deployments
- Create advanced analytics tools for conversation and feedback data analysis
- Implement database schema migrations for version control
- Scalability: For high-volume traffic, consider load balancing and horizontal scaling.
- Detailed Metrics & Observability: Integrate metrics for latency at each S2S stage.
- Configuration Management: Use a robust configuration system for production.
- Backpressure: Implement more sophisticated backpressure mechanisms if needed.
- Noise Reduction/Echo Cancellation: Explore advanced options if client-side audio quality is an issue. The current `index.html` requests browser-based echo cancellation and noise suppression.
The application provides a metrics endpoint and status dashboard for monitoring the system's health and performance:
The conversation memory feature maintains context throughout a WebSocket session, allowing the AI to respond more coherently to multi-turn conversations:
- Per-Session Memory: Each WebSocket connection maintains its own conversation history.
- Message Structure: Conversation is stored as a sequence of role-based messages:
- `systemMessage`: The initial system prompt that defines the AI's behavior
- `userMessage`: Transcribed speech from the client
- `assistantMessage`: Responses generated by the LLM
- Implementation Details:
- Messages are stored in the `clientState` struct for each connection
- The conversation history begins with the system prompt when a client connects
- User messages are added to history as soon as transcription is complete
- Assistant responses are captured in full and added to history after processing
- The entire conversation context is sent to the LLM with each new user message
- Thread Safety: Proper locking mechanisms ensure thread-safe updates to conversation history
- Efficient Processing: The design maintains the original real-time processing flow while adding memory capabilities in parallel
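The thread-safe, role-based history described above can be sketched as follows. This is an illustrative sketch, not the project's `clientState` implementation; `chatMessage`, `conversation`, and the method names are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// chatMessage is one turn of the conversation.
type chatMessage struct {
	Role    string // "system", "user", or "assistant"
	Content string
}

// conversation guards the history with a mutex so the transcription and LLM
// goroutines can append concurrently.
type conversation struct {
	mu      sync.Mutex
	history []chatMessage
}

// newConversation seeds the history with the system prompt, mirroring what
// happens when a client connects.
func newConversation(systemPrompt string) *conversation {
	return &conversation{history: []chatMessage{{Role: "system", Content: systemPrompt}}}
}

// Append adds a message under the lock.
func (c *conversation) Append(role, content string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.history = append(c.history, chatMessage{Role: role, Content: content})
}

// Snapshot copies the history; the copy is what would be sent to the LLM
// with each new user message.
func (c *conversation) Snapshot() []chatMessage {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]chatMessage, len(c.history))
	copy(out, c.history)
	return out
}

func main() {
	conv := newConversation("You are a helpful assistant.")
	conv.Append("user", "Hello!")
	conv.Append("assistant", "Hi, how can I help?")
	fmt.Println(len(conv.Snapshot()), "messages in context")
}
```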
The /api/metrics endpoint returns a JSON object with the following information:
- Connection statistics (active and total connections)
- System resource usage (CPU, memory)
- Go runtime metrics (goroutines, memory allocation)
- Server uptime and status
Example request:
    curl http://localhost:8080/api/metrics

Example response:
    {
      "timestamp": "2023-07-01T12:34:56Z",
      "active_connections": 2,
      "total_connections": 15,
      "goroutines": 12,
      "allocated_memory_bytes": 2097152,
      "total_allocated_memory_bytes": 10485760,
      "system_memory_usage_percent": 45.7,
      "heap_objects": 8724,
      "cpu_usage_percent": 2.5,
      "system_cpu_usage_percent": 32.1,
      "process_memory_bytes": 15728640,
      "uptime_seconds": 3600,
      "last_collection_time": 1688216096
    }

The /status endpoint provides a simple web interface for viewing system metrics in a user-friendly format. It automatically refreshes every 5 seconds and displays:
- Current active connections and total connections since startup
- CPU and memory usage (system and process)
- Go runtime metrics (goroutines, memory allocation)
- Server uptime
This dashboard is useful for quick visual monitoring of the system's health without using additional monitoring tools.