# Code Review Report: Feature 17 — Speech-to-Text Infrastructure

This report audits the Hub's audio transcription layer, specifically the `GoogleSTTProvider` in `stt/gemini.py`, through the lens of **12-Factor App Methodology**, **Credential Safety**, and **Memory Efficiency**.

---

## 🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
| :--- | :--- | :--- |
| **XI. Logs** | 🔴 **Major Issue** | **Plain-text API Key Leak**: The `GoogleSTTProvider` (Line 89) logs the `self.api_url` in every transcription request at the `DEBUG` level. Because the Google AI Studio `api_key` is a query parameter in this URL (Line 36), the full production API key is leaked in plain text to the Hub's log aggregators. |
| **VI. Processes** | 🟡 **Warning** | **Payload Duplication**: Transcription requests generate a base64-encoded copy of the entire audio blob in-memory (`audio_b64`, Line 69). Base64 output is about 4/3 the size of its input, so while both buffers are alive the request holds roughly 2.3× the raw audio size. For long-form audio processing (e.g., meeting recordings), this can trigger OOM kills on memory-constrained containers. |
| **II. Dependencies** | 🟡 **Warning** | **Client Inconsistency**: The STT provider uses `aiohttp` (Line 2), while most other backend services (including the TTS provider) use `httpx`. This introduces redundant dependencies and disparate connection pooling behavior across the Hub. |
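
To put a number on the payload-duplication row, the sketch below works through the base64 arithmetic with the standard library. The 3 MiB zero-filled buffer is a synthetic stand-in for a recording, and `b64_encoded_size` is a hypothetical helper, not something from `stt/gemini.py`.

```python
import base64


def b64_encoded_size(raw_len: int) -> int:
    """Size in bytes of the padded base64 encoding of raw_len input bytes."""
    # Every 3 input bytes (rounded up) become 4 output characters.
    return 4 * ((raw_len + 2) // 3)


raw = bytes(3 * 1024 * 1024)        # synthetic 3 MiB "audio" blob (assumption)
encoded = base64.b64encode(raw)     # second in-memory copy of the payload

# The closed-form size matches what the encoder actually produced.
assert len(encoded) == b64_encoded_size(len(raw))

# Peak footprint while both buffers are alive, relative to the raw audio.
peak_ratio = (len(raw) + len(encoded)) / len(raw)
print(f"raw={len(raw)} B, encoded={len(encoded)} B, peak={peak_ratio:.2f}x raw")
```

If the encoded bytes are then decoded to `str` and serialized into a JSON body, each of those steps likely adds another transient copy, so the real peak may be higher than this estimate.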

---

## 🔍 File-by-File Diagnostic

### 1. `app/core/providers/stt/gemini.py`
The inline transcription bridge for Google's Gemini multimodal models.

> [!CAUTION]
> **No Retry on Transient Failures**
>
> Unlike the TTS provider, the STT provider does not apply a `tenacity` retry decorator (Line 94). If a transcription fails due to a transient network timeout or a 429 rate limit from Google, the user's voice message is lost immediately without an automatic retry.
>
> **Fix**: Implement a standard retry policy for 429/5xx errors, consistent with the `GeminiTTSProvider`.
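
The retry policy described in the note above could look roughly like the following. To stay self-contained, this sketch uses plain `asyncio` rather than `tenacity`, so it illustrates the policy, not the decorator the TTS provider uses. `TransientSTTError`, `with_retries`, and the status set are hypothetical names chosen for the example.

```python
import asyncio
import random

# Statuses treated as transient (assumption: 429 plus common 5xx codes).
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


class TransientSTTError(Exception):
    """Hypothetical error type carrying the upstream HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


async def with_retries(call, *, attempts=3, base_delay=0.5, max_delay=8.0):
    """Await call() up to `attempts` times, backing off between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except (TransientSTTError, asyncio.TimeoutError) as exc:
            status = getattr(exc, "status", None)   # timeouts have no status
            retryable = status is None or status in RETRYABLE_STATUSES
            if not retryable or attempt == attempts:
                raise
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))


async def _demo():
    calls = {"n": 0}

    async def flaky_transcribe():
        # Fails twice with a rate-limit status, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientSTTError(429)
        return "hello world"

    text = await with_retries(flaky_transcribe, base_delay=0.01)
    return text, calls["n"]


result = asyncio.run(_demo())
print(result)
```

Non-retryable statuses such as 400 are re-raised on the first attempt, so malformed requests fail fast instead of being resent.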

**Identified Problems**:
*   **Brittle MIME Detection**: The `_detect_mime` sniffer (Line 41) only checks the first 3-4 bytes. While effective for common formats, a prefix that short cannot separate containers that share a signature: `RIFF` opens WAV, AVI, and WebP files alike, and MP4-family files carry their `ftyp` marker at byte offset 4, not at the start. A dedicated media library would handle these cases more reliably.
*   **Static System Prompt**: The instruction `"Return only the spoken words, nothing else"` (Line 82) is hardcoded. If a user wants punctuation or speaker labels (diarization) in the output, this prompt tells the model to omit them, and callers have no way to override it.
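
As an illustration of the MIME-detection point, here is a minimal sketch of a sniffer that looks past the first four bytes. `sniff_audio_mime` is a hypothetical function, not the project's `_detect_mime`, and the MIME strings it returns are assumptions about what a caller would want. It is still only a best-effort heuristic.

```python
from typing import Optional


def sniff_audio_mime(data: bytes) -> Optional[str]:
    """Best-effort container detection from magic bytes; None if unknown."""
    head = data[:16]
    # RIFF alone is ambiguous (AVI and WebP use it too); check the form type.
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio/wav"
    if head[:4] == b"OggS":
        return "audio/ogg"
    if head[:4] == b"fLaC":
        return "audio/flac"
    # ISO base media files put a box size first and "ftyp" at offset 4.
    if head[4:8] == b"ftyp":
        return "audio/mp4"      # could equally be a video file
    # EBML header, shared by WebM and Matroska.
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "audio/webm"
    if head[:3] == b"ID3":
        return "audio/mpeg"     # MP3 prefixed with an ID3v2 tag
    if len(head) >= 2 and head[0] == 0xFF:
        # ADTS AAC: 12-bit sync with the two layer bits set to 00.
        if (head[1] & 0xF6) == 0xF0:
            return "audio/aac"
        # MPEG audio frame: 11-bit sync with a non-zero layer field.
        if (head[1] & 0xE0) == 0xE0 and (head[1] >> 1) & 0x3 != 0:
            return "audio/mpeg"
    return None


print(sniff_audio_mime(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(sniff_audio_mime(b"RIFF\x24\x00\x00\x00WEBPVP8 "))
```

The second call returns `None` because the RIFF form type is `WEBP`, which a check limited to the first four bytes would have reported as WAV.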

---

## 🛠️ Summary Recommendations

1.  **Redact Logger URL**: Replace the `self.api_url` in the debug log with a masked string or a simple "Sending to Google" label to prevent credential leaks.
2.  **Standardize Client**: Migrate from `aiohttp` to `httpx` to unify the Hub's HTTP connection management and reduce the dependency footprint.
3.  **Add Resilience**: Wrap `transcribe_audio` in a `tenacity` retry loop to handle transient API failures without dropping user requests.
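
For recommendation 1, one possible shape for the masking step is sketched below using only `urllib.parse`. The helper name `redact_url`, the parameter names in `SENSITIVE_PARAMS`, and the example URL are all assumptions made for illustration; the real query parameter name should be taken from Line 36 of the provider.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameter names treated as secrets (assumption; adjust to the provider).
SENSITIVE_PARAMS = frozenset({"key", "api_key", "access_token"})


def redact_url(url: str, sensitive=SENSITIVE_PARAMS) -> str:
    """Return url with the values of sensitive query parameters masked."""
    parts = urlsplit(url)
    query = [
        (name, "REDACTED" if name.lower() in sensitive else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Rebuild the URL with the same scheme, host, and path.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Made-up endpoint and key, for demonstration only.
example = "https://example.invalid/v1/models/demo:generateContent?key=SECRET123&alt=json"
safe = redact_url(example)
print(safe)
```

A debug statement would then log `redact_url(self.api_url)` in place of the raw attribute, which keeps the host and path visible for troubleshooting while the key stays out of the log aggregators.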

---

**This concludes Feature 17. I have persisted this report to `/app/docs/reviews/feature_review_stt_providers.md`. How should we address the API-key logging hazard?**
