Code Review Report: Feature 17 — Speech-to-Text Infrastructure

This report audits the Hub's audio transcription layer, specifically the `GoogleSTTProvider` in `stt/gemini.py`, through three lenses: 12-Factor App methodology, credential safety, and memory efficiency.

🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
| --- | --- | --- |
| XI. Logs | 🔴 Major Issue | Plain-text API Key Leak: `GoogleSTTProvider` logs `self.api_url` at `DEBUG` level on every transcription request (Line 89). Because the Google AI Studio API key is carried in this URL as a query parameter (Line 36), the full production key is written in plain text to the Hub's log aggregators whenever DEBUG logging is enabled. |
| VI. Processes | 🟡 Warning | Payload Duplication: each transcription request builds a base64-encoded copy of the entire audio blob in memory (`audio_b64`, Line 69). Base64 output is about 4/3 the size of its input, so the raw bytes plus the encoded copy occupy roughly 2.3x the audio size, before counting the serialized JSON body. For long-form audio (e.g. meeting recordings), this can trigger OOM kills on memory-constrained containers. |
| II. Dependencies | 🟡 Warning | Client Inconsistency: the STT provider uses `aiohttp` (Line 2), while most other backend services, including the TTS provider, use `httpx`. This adds a redundant HTTP client dependency and splits the Hub across two different connection-pooling implementations. |
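To put the Payload Duplication finding in numbers: standard base64 emits 4 output bytes for every 3 input bytes (rounded up), so the encoded copy alone is about 133% of the audio size. The sketch below computes that overhead for a hypothetical 60 MiB recording. The helper names are illustrative, and the estimate is deliberately a lower bound: it ignores further copies made when the JSON request body is serialized.

```python
import base64


def base64_encoded_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding: 4 output bytes per 3 input bytes, rounded up."""
    return 4 * ((n_bytes + 2) // 3)


def estimate_peak_audio_bytes(n_bytes: int) -> int:
    """Lower bound on bytes alive at once: the raw audio plus its base64 copy.

    Assumes the raw bytes are still referenced while the encoded copy exists,
    and ignores extra copies made when the JSON request body is serialized.
    """
    return n_bytes + base64_encoded_size(n_bytes)


if __name__ == "__main__":
    raw = 60 * 1024 * 1024  # hypothetical 60 MiB meeting recording
    print(base64_encoded_size(raw))        # 83886080 (80 MiB)
    print(estimate_peak_audio_bytes(raw))  # 146800640 (140 MiB, ~2.33x raw)

    # Cross-check the formula against the stdlib encoder.
    sample = bytes(1000)
    assert len(base64.b64encode(sample)) == base64_encoded_size(len(sample))
```

Streaming the upload or using a file-upload endpoint instead of inline data would avoid holding both copies at once, at the cost of an extra request.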

🔍 File-by-File Diagnostic

1. `app/core/providers/stt/gemini.py`

The inline transcription bridge for Google's Gemini multimodal models.

[!CAUTION] Lack of Retry/Backoff: Unlike the TTS provider, the STT provider has no `tenacity` retry decorator (Line 94). If a transcription fails because of a transient network timeout or a 429 rate-limit response from Google, the user's voice message is dropped immediately with no automatic retry. Fix: implement a standard retry policy for 429/5xx errors, consistent with `GeminiTTSProvider`.
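A minimal sketch of that retry policy, written with only `asyncio` so it runs without extra dependencies; in the Hub itself, a `tenacity` decorator matching `GeminiTTSProvider` would express the same idea. It retries timeouts and 429/5xx responses with capped exponential backoff and full jitter, and re-raises anything else immediately. `TranscriptionHTTPError`, `with_retry`, and the default attempt and delay values are hypothetical, not taken from the codebase.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TranscriptionHTTPError(Exception):
    """Hypothetical error raised for a non-2xx response from the STT endpoint."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def is_retryable(exc: BaseException) -> bool:
    """Retry timeouts, 429 (rate limit), and 5xx responses; everything else fails fast."""
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError)):
        return True
    if isinstance(exc, TranscriptionHTTPError):
        return exc.status == 429 or 500 <= exc.status < 600
    return False


async def with_retry(
    call: Callable[[], Awaitable[T]],
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await call(), retrying transient failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except Exception as exc:
            # Give up on the last attempt or on a non-transient error.
            if attempt == attempts or not is_retryable(exc):
                raise
            # The backoff ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random fraction of it ("full jitter") spreads out concurrent retries.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Inside `transcribe_audio`, the single HTTP round trip would then be wrapped as `await with_retry(lambda: self._post_once(audio))`, where `_post_once` is a hypothetical helper that raises `TranscriptionHTTPError` on non-2xx responses. Capping the attempts keeps a voice message from hanging indefinitely while still absorbing short rate-limit bursts.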

Identified Problems:

Brittle MIME Detection: The `_detect_mime` sniffer (Line 41) checks only the first 3-4 bytes. That covers common formats, but it lacks the robustness of a dedicated media library and may misidentify less common container formats.
Static System Prompt: The instruction "Return only the spoken words, nothing else" (Line 82) is hardcoded and cannot be overridden per request. Callers who need richer output, such as punctuation or speaker labels (diarization), have no way to ask for it, and the "nothing else" wording is likely to suppress it.
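One way to harden the sniffer without pulling in a media library is to check signatures at the offsets where containers actually identify themselves: RIFF files carry `WAVE` at byte 8, ISO-BMFF (M4A/MP4) files carry `ftyp` at byte 4, and bare MPEG frame-sync headers need their layer bits inspected to tell MP3 apart from AAC ADTS. The function name `sniff_audio_mime` and the returned MIME labels are assumptions for illustration, not the Hub's existing `_detect_mime`; the labels may need mapping to whatever strings the Gemini API accepts. This is still a heuristic, so a dedicated library remains the more robust option.

```python
from typing import Optional

# (offset, signature, mime) triples. The MIME labels are illustrative and may need
# mapping to the exact strings the upstream API expects.
_SIGNATURES = (
    (0, b"fLaC", "audio/flac"),
    (0, b"OggS", "audio/ogg"),
    (0, b"ID3", "audio/mpeg"),                # MP3 with a leading ID3v2 tag
    (0, b"\x1a\x45\xdf\xa3", "audio/webm"),   # EBML header (WebM/Matroska)
    (4, b"ftyp", "audio/mp4"),                # ISO-BMFF box type (M4A/MP4)
)


def sniff_audio_mime(header: bytes, default: Optional[str] = None) -> Optional[str]:
    """Guess an audio MIME type from the first bytes of a file (pass at least 12 bytes)."""
    # RIFF and FORM are generic containers; the real format lives at offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "audio/wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "audio/aiff"
    for offset, sig, mime in _SIGNATURES:
        if header[offset:offset + len(sig)] == sig:
            return mime
    # Bare MPEG frame sync: first byte 0xFF, then sync/version/layer bits in byte 2.
    if len(header) >= 2 and header[0] == 0xFF:
        b1 = header[1]
        if (b1 & 0xF6) == 0xF0:                          # 12-bit sync, layer 00: AAC ADTS
            return "audio/aac"
        if (b1 & 0xE0) == 0xE0 and (b1 & 0x06) == 0x02:  # 11-bit sync, Layer III: MP3
            return "audio/mpeg"
    return default
```

Returning a caller-supplied `default` (rather than guessing) lets the provider reject or log unknown formats explicitly instead of sending a mislabeled payload.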

🛠️ Summary Recommendations

Redact Logger URL: Replace `self.api_url` in the debug log with a masked URL or a fixed label such as "Sending to Google" to prevent credential leaks.
Standardize Client: Migrate from `aiohttp` to `httpx` to unify the Hub's HTTP connection management and reduce the dependency footprint.
Add Resilience: Wrap `transcribe_audio` in a `tenacity` retry policy to handle transient API failures without dropping user requests.
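A minimal sketch of the first recommendation: rewrite the URL so that sensitive query values are masked before it reaches the logger. It assumes the key travels in a query parameter named `key` (or `api_key`); `redact_url` and `SENSITIVE_PARAMS` are hypothetical names, and the actual parameter name should be confirmed against Line 36 of `gemini.py`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameter names treated as secrets (assumption: the key travels as "key").
SENSITIVE_PARAMS = frozenset({"key", "api_key"})


def redact_url(url: str, mask: str = "REDACTED") -> str:
    """Return url with the values of sensitive query parameters replaced by mask."""
    parts = urlsplit(url)
    query = [
        # Parameter names are compared case-insensitively; values of other params are kept.
        (name, mask if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The debug call would then become something like `logger.debug("POST %s", redact_url(self.api_url))`. A sturdier fix is to keep the secret out of the URL entirely: the Gemini REST API also accepts the key in an `x-goog-api-key` request header, which removes it from anything that logs request URLs.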

This concludes Feature 17. I have persisted this report to /app/docs/reviews/feature_review_stt_providers.md. How should we address the API-key logging hazard?