This report performs a deep-dive audit of the Hub's audio transcription layer, specifically the GoogleSTTProvider in stt/gemini.py, through the lens of 12-Factor App Methodology, Credential Safety, and Memory Efficiency.
| Factor | Status | Observation |
|---|---|---|
| XI. Logs | 🔴 Major Issue | Plain-text API Key Leak: The GoogleSTTProvider (Line 89) logs the self.api_url in every transcription request at the DEBUG level. Because the Google AI Studio api_key is a query parameter in this URL (Line 36), the full production API key is leaked in plain text to the Hub's log aggregators. |
| VI. Processes | 🟡 Warning | Payload Duplication: Each transcription request builds a base64-encoded copy of the entire audio blob in memory (audio_b64, Line 69). Base64 output is about 4/3 the size of its input, so while both buffers are held the request occupies roughly 2.3 times the raw audio size. For long-form audio (e.g., meeting recordings), this more than doubles the Hub's per-request memory pressure and can trigger OOM kills on memory-constrained containers. |
| II. Dependencies | 🟡 Warning | Client Inconsistency: The STT provider uses aiohttp (Line 2), while most other backend services (including the TTS provider) use httpx. This introduces redundant dependencies and disparate connection pooling behavior across the Hub. |
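The Payload Duplication row can be checked with simple arithmetic. The sketch below is illustrative only: `b64_encoded_len` and `iter_b64_chunks` are hypothetical helpers, not functions from stt/gemini.py, and chunked encoding only helps if the request body can be streamed, which this review has not verified for the Hub.

```python
import base64
import io
from typing import BinaryIO, Iterator


def b64_encoded_len(raw_len: int) -> int:
    """Length in bytes of padded base64 output for raw_len input bytes."""
    return 4 * ((raw_len + 2) // 3)


def iter_b64_chunks(stream: BinaryIO, chunk_size: int = 3 * 64 * 1024) -> Iterator[bytes]:
    """Yield base64 pieces whose concatenation equals encoding the whole stream.

    chunk_size must be a multiple of 3 so that no piece except the last
    carries '=' padding. Assumes stream.read() returns full chunks until EOF,
    which holds for io.BytesIO and buffered files.
    """
    if chunk_size <= 0 or chunk_size % 3:
        raise ValueError("chunk_size must be a positive multiple of 3")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield base64.b64encode(chunk)


if __name__ == "__main__":
    raw = bytes(10 * 1024 * 1024)  # stand-in for a 10 MiB audio blob
    encoded_len = b64_encoded_len(len(raw))
    # Raw bytes plus the encoded copy: (n + 4n/3) / n = 7/3, about 2.33x.
    print(f"peak ratio: {(len(raw) + encoded_len) / len(raw):.2f}")
    # Chunked encoding keeps only one small piece in memory at a time.
    total = sum(len(piece) for piece in iter_b64_chunks(io.BytesIO(raw)))
    print(f"streamed {total} encoded bytes")
```

Encoding in multiples of three bytes is what makes the pieces safe to concatenate; any other chunk size would insert padding in the middle of the output.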
app/core/providers/stt/gemini.py

The inline transcription bridge for Google's Gemini multimodal models.
> [!CAUTION]
> Lack of Response Throttling/Retry: Unlike the TTS provider, the STT provider does not implement a tenacity retry decorator (Line 94). If a transcription fails due to a transient network timeout or a 429 rate limit from Google, the user's voice message is lost immediately without an automatic retry. Fix: Implement a standard retry policy for 429/5xx errors, consistent with the GeminiTTSProvider.
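A minimal sketch of such a retry policy follows. It uses only the standard library as a stand-in for the tenacity decorator recommended here, and every name in it (`UpstreamError`, `RETRYABLE_STATUS`, `with_retries`) is hypothetical rather than taken from the Hub's code; the real provider would need to raise an error carrying the HTTP status for this to apply.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Assumed set of transient statuses: rate limiting plus common 5xx responses.
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


class UpstreamError(Exception):
    """Hypothetical error type carrying the upstream HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


async def with_retries(
    call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying timeouts and retryable statuses with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise
        except UpstreamError as exc:
            # Non-transient errors (e.g. 400, 401) are re-raised immediately.
            if exc.status not in RETRYABLE_STATUS or attempt == attempts:
                raise
        # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus a random part.
        delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
        await sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so tests can inject a no-op and run without real delays. With the audio already buffered in memory, a retry can resend the same payload without asking the user to record again.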
Identified Problems:
- The _detect_mime sniffer (Line 41) only checks the first 3-4 bytes. While effective for common formats, it lacks the robustness of a dedicated media library and might misidentify edge-case codec containers.
- The prompt "Return only the spoken words, nothing else" (Line 82) is hardcoded. If a user wants punctuation or speaker labels (diarization), this prompt will prevent the model from producing them.

Recommendations:

- Replace self.api_url in the debug log with a masked string or a simple "Sending to Google" label to prevent credential leaks.
- Migrate from aiohttp to httpx to unify the Hub's HTTP connection management and reduce the dependency footprint.
- Wrap transcribe_audio in a tenacity retry loop to handle transient API failures without dropping user requests.

This concludes Feature 17. I have persisted this report to /app/docs/reviews/feature_review_stt_providers.md. How should we address the API-key logging hazard?
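One low-risk way to address the logging hazard is to redact sensitive query parameters before the URL reaches the logger. The sketch below is an assumption-laden illustration: `redact_url` and `SENSITIVE_PARAMS` are hypothetical names, the parameter names in the set are guesses that should be checked against how stt/gemini.py actually builds the URL, and the example URL is a placeholder.

```python
import logging
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

logger = logging.getLogger(__name__)

# Assumed names for credential-bearing query parameters; adjust to the real URL.
SENSITIVE_PARAMS = frozenset({"key", "api_key", "access_token"})


def redact_url(url: str, mask: str = "REDACTED") -> str:
    """Return url with the values of sensitive query parameters masked."""
    parts = urlsplit(url)
    query = [
        (name, mask if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Placeholder URL; the real endpoint and key never appear in this sketch.
    api_url = "https://example.invalid/v1/models/m:generateContent?key=SECRET&alt=json"
    logger.debug("Sending transcription request to %s", redact_url(api_url))
```

Redacting at the call site fixes this one log line. Moving the key out of the URL and into a request header, if the upstream API accepts that, would remove the hazard from any other code path that logs or echoes the URL.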