Code Review Report: Feature 7 & 8 — RAG Infrastructure & Voice Services

This report audits the Vector Search and Audio Synthesis layers in depth, focusing on `faiss_store.py` and `gemini.py` (TTS), through the lens of the 12-Factor App methodology, Pythonic code style, and concurrency safety.

🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
| --- | --- | --- |
| VI. Processes | 🔴 Major Issue | Simultaneous Write Hazard (FAISS): The `FaissVectorStore` (Lines 69, 106) calls `self.save_index()` synchronously on every document ingestion. Because `faiss.write_index` overwrites the whole file, two concurrent RAG sessions adding documents at the same time can race and leave a corrupted FAISS index on disk. Writes to this index should go through a single writer (a singleton manager) or a write-ahead-logging (WAL) pattern. |
| XI. Logs | 🔴 Security Warning | Credential Leak Potential: The `GeminiTTSProvider` (Line 74) logs its API endpoint URL. For AI Studio keys, the `api_key` is part of the URL. The logged URL is currently truncated for debugging, but any change in log level or endpoint format risks exposing production API keys in plain-text logs. |
| IX. Disposability | 🟡 Warning | In-Memory Audio Bloat: `generate_speech` accumulates the entire audio result in memory (`b"".join(audio_fragments)`) before returning. For long-form text synthesis, this can cause significant Hub memory pressure and a high time-to-first-byte (TTFB) for the UI. |
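
One way to address the simultaneous write hazard from the Processes row is to serialize saves and never overwrite the live file in place. The sketch below is a minimal illustration, not the project's implementation: `save_atomically` and `_index_write_lock` are hypothetical names, and the lock only protects writers inside a single process (multiple worker processes would additionally need a file lock or a dedicated writer).

```python
import os
import tempfile
import threading
from typing import Callable

# Hypothetical module-level lock; it serializes writers within one process only.
_index_write_lock = threading.Lock()


def save_atomically(path: str, write_fn: Callable[[str], None]) -> None:
    """Let write_fn write to a temp file, then swap it into place atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    with _index_write_lock:
        # The temp file lives in the target directory so os.replace stays
        # on one filesystem, where the rename is atomic.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        os.close(fd)  # write_fn reopens the file by path
        try:
            write_fn(tmp_path)
            os.replace(tmp_path, path)
        except BaseException:
            # A failed write leaves the previous index file untouched.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
```

With FAISS, the call might look like `save_atomically(index_path, lambda p: faiss.write_index(index, p))`, where `index_path` and `index` are assumed names. Readers then see either the old file or the new one, never a partially written file.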

🔍 File-by-File Diagnostic

1. `app/core/vector_store/faiss_store.py`

The local-first vector search engine using FAISS and SQLAlchemy.

Identified Problems:

Stale ID Map: `initialize_index` (Line 25) syncs with the DB on startup, but nothing detects or repairs an out-of-sync state when a DB transaction is rolled back after the FAISS file has already been written.
Search Inefficiency: `search_similar_documents` performs a three-stage query (FAISS search → Filter → ID Lookup). This introduces unnecessary overhead for small result sets.
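
The stale ID map problem could be surfaced with a reconciliation check that compares the IDs the DB knows about with the IDs stored alongside the FAISS index. This is only a sketch: `diff_ids` is a hypothetical helper, and it assumes both sides can enumerate their document IDs as integers.

```python
from typing import Iterable, Set, Tuple


def diff_ids(db_ids: Iterable[int], faiss_ids: Iterable[int]) -> Tuple[Set[int], Set[int]]:
    """Return (missing_from_faiss, orphaned_in_faiss) for two ID collections."""
    db, indexed = set(db_ids), set(faiss_ids)
    missing_from_faiss = db - indexed   # rows in the DB with no vector
    orphaned_in_faiss = indexed - db    # vectors left behind by a DB rollback
    return missing_from_faiss, orphaned_in_faiss
```

Orphaned IDs are the case described above (vector written, DB row rolled back); how to repair them, by re-embedding or by rebuilding the index, is a design decision the check itself does not make.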

2. `app/core/providers/tts/gemini.py`

The Google Gemini/Vertex AI audio synthesis provider.

[!CAUTION] Lack of Stream Consumption Support: The provider is structured as an all-or-nothing buffer (Line 151), which prevents streaming playback on the frontend, the standard for modern "agentic" voice interactions. Fix: Change the `generate_speech` method into an async generator that yields audio chunks as they arrive from the Google stream.
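
The shape of the proposed fix can be sketched as an async generator. Everything here is illustrative: `fake_provider_stream` is a stand-in for the upstream audio stream (it does not call any Google API), and `generate_speech_stream` is a hypothetical name for the refactored method.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_provider_stream(text: str) -> AsyncIterator[bytes]:
    """Stand-in for the upstream audio stream (hypothetical, no network I/O)."""
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the loop, as a network read would
        yield word.encode("utf-8")


async def generate_speech_stream(text: str) -> AsyncIterator[bytes]:
    """Yield each audio chunk as it arrives instead of joining them all first."""
    async for chunk in fake_provider_stream(text):
        if chunk:  # skip empty fragments
            yield chunk


async def collect(text: str) -> List[bytes]:
    """Helper for callers that still need the whole result at once."""
    return [chunk async for chunk in generate_speech_stream(text)]
```

A caller that can consume an async iterator receives the first chunk without waiting for the full synthesis, while `collect` keeps a buffered path available for callers that need it.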

Identified Problems:

Vertex Region Lock: The Vertex endpoint is hardcoded to `us-central1` (Line 58). Because the region cannot be configured, this violates the requirement for data residency and configurable regions.
Hardcoded Model Name: Line 42 hardcodes a preview model name (`"gemini-2.5-flash-preview-tts"`). If Google deprecates this model, the Voice feature will break for all users until a code change is deployed.
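
Both problems come down to configuration living in code. A minimal sketch of reading the region and model name from the environment follows; the variable names `TTS_VERTEX_REGION` and `TTS_MODEL_NAME` and the `TTSSettings` class are assumptions for illustration, with the currently hardcoded values kept as defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TTSSettings:
    region: str
    model_name: str


def load_tts_settings(env: Optional[Mapping[str, str]] = None) -> TTSSettings:
    """Read TTS settings from the environment (variable names are hypothetical)."""
    env = os.environ if env is None else env
    return TTSSettings(
        # Defaults mirror the values the report found hardcoded.
        region=env.get("TTS_VERTEX_REGION", "us-central1"),
        model_name=env.get("TTS_MODEL_NAME", "gemini-2.5-flash-preview-tts"),
    )
```

Passing a mapping instead of reading `os.environ` directly keeps the loader easy to test, and a deprecated model can then be replaced by a configuration change rather than a deploy.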

🛠️ Summary Recommendations

Harden FAISS Synchronization: Implement a lock or specialized "Vector Writer" task to serialize `save_index` calls and prevent index corruption during concurrent ingestion.
Sanitize Logging: Remove API URLs from standard debug logs in the TTS/STT providers. Use masked/redacted strings for sensitive metadata.
Implement Streaming TTS: Refactor the TTS interface to support chunked delivery, reducing TTFB and Hub memory usage.
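
For the logging recommendation, one possible approach is to redact sensitive query parameters before a URL ever reaches the logger. The sketch below uses only the standard library; `redact_url` is a hypothetical helper, and the set of parameter names is an assumption that would need to match whatever the providers actually send.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; extend to match the real endpoints.
SENSITIVE_PARAMS = {"key", "api_key", "access_token"}


def redact_url(url: str) -> str:
    """Return the URL with the values of sensitive query parameters masked."""
    parts = urlsplit(url)
    query = [
        (name, "REDACTED" if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Logging `redact_url(endpoint)` instead of the raw endpoint means the protection no longer depends on the log level or on how the URL happens to be truncated.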

This concludes Feature 7 & 8. I have persisted these reports to /app/docs/reviews/.