
Code Review Report: Feature 7 & 8 — RAG Infrastructure & Voice Services

This report performs a deep-dive audit of the Vector Search and Audio Synthesis layers, focusing on faiss_store.py and gemini.py (TTS) through the lens of 12-Factor App Methodology, Pythonic Code Style, and Concurrency Safety.


🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
| --- | --- | --- |
| VI. Processes | 🔴 Major Issue | Simultaneous Write Hazard (FAISS): The FaissVectorStore (Lines 69, 106) performs a synchronous self.save_index() on every document ingestion. Because faiss.write_index performs a full file overwrite, two concurrent RAG sessions adding documents simultaneously will enter a race condition, leading to permanent FAISS index corruption. This index should be managed by a singleton manager or write-ahead-logging (WAL) pattern. |
| XI. Logs | 🔴 Security Warning | Credential Leak Potential: The GeminiTTSProvider (Line 74) logs its API endpoint URL. For AI Studio keys, the api_key is part of the URL. While currently truncated for debugging, any change in log level or endpoint format risks exposing production API keys in plain-text logs. |
| IX. Disposability | 🟡 Warning | In-Memory Audio Bloat: generate_speech accumulates the entire audio result in-memory (b"".join(audio_fragments)) before returning. For long-form text synthesis, this can cause significant Hub memory pressure and long "Time-To-First-Byte" (TTFB) for the UI. |
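
The write hazard under Factor VI can be reduced by serializing writers and swapping the file in a single rename. The following is a minimal sketch under stated assumptions, not the code in faiss_store.py: `save_atomically` and `_INDEX_WRITE_LOCK` are hypothetical names, and `write_fn` stands in for whatever actually persists the index (with FAISS, something like `lambda p: faiss.write_index(index, p)`). A `threading.Lock` only protects writers inside one process; multiple worker processes would need a file lock or a dedicated writer on top of this.

```python
import os
import tempfile
import threading
from typing import Callable

# Process-wide lock (hypothetical name, not taken from faiss_store.py).
_INDEX_WRITE_LOCK = threading.Lock()


def save_atomically(path: str, write_fn: Callable[[str], None]) -> None:
    """Serialize writers and publish the new file with one rename.

    write_fn receives a temporary path and must write the complete
    index there.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with _INDEX_WRITE_LOCK:
        # Create the temp file next to the target so the rename stays
        # on the same filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        os.close(fd)
        try:
            write_fn(tmp_path)
            # os.replace swaps the file in one step: readers see either
            # the old index or the new one, never a partial write.
            os.replace(tmp_path, path)
        except BaseException:
            # Clean up the temp file if writing or renaming failed.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
```

The existing index stays readable while the new one is being written, and a failed write leaves the previous file untouched.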

🔍 File-by-File Diagnostic

1. app/core/vector_store/faiss_store.py

The local-first vector search engine using FAISS and SQLAlchemy.

Identified Problems:

  • Stale ID Map: initialize_index (Line 25) syncs with the DB on startup, but there's no mechanism to handle out-of-sync states if the DB is rolled back but the FAISS file is already written.
  • Search Inefficiency: search_similar_documents performs a three-stage query (FAISS search → Filter → ID Lookup). This introduces unnecessary overhead for small result sets.
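
One way to detect the stale-ID state described above is to compare the two sets of vector IDs at startup. This is only a sketch: `diff_vector_ids` and `SyncReport` are hypothetical names, and it assumes both the database and the FAISS ID map can be enumerated as integer IDs, which the review does not confirm.

```python
from typing import Iterable, NamedTuple, Set


class SyncReport(NamedTuple):
    missing_from_index: Set[int]  # present in the DB, absent from FAISS
    orphaned_in_index: Set[int]   # present in FAISS, absent from the DB


def diff_vector_ids(db_ids: Iterable[int], index_ids: Iterable[int]) -> SyncReport:
    """Report IDs that exist on only one side of the DB/index pair."""
    db_set, index_set = set(db_ids), set(index_ids)
    return SyncReport(db_set - index_set, index_set - db_set)


# Example: the DB was rolled back past ID 4, and ID 1 was never indexed.
report = diff_vector_ids(db_ids=[1, 2, 3], index_ids=[2, 3, 4])
```

A non-empty `orphaned_in_index` would correspond to the case where the FAISS file was written but the DB transaction was rolled back.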

2. app/core/providers/tts/gemini.py

The Google Gemini/Vertex AI audio synthesis provider.

> [!CAUTION]
> Lack of Stream Consumption Support
> The provider is structured as an "All-or-Nothing" buffer (Line 151). This prevents streaming playback on the frontend, which is the standard for modern "agentic" voice interactions.
> Fix: Update the generate_speech method to be an async generator that yields audio chunks as they arrive from the Google stream.
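
The proposed fix can be sketched as an async generator. The real provider's stream API is not shown in this review, so `fake_upstream` is a hypothetical stand-in for the Google response stream, and `generate_speech_stream` is an illustrative name rather than the existing method.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_upstream(text: str) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for the provider's response stream."""
    for word in text.split():
        await asyncio.sleep(0)  # yield control, as a network read would
        yield word.encode("utf-8")


async def generate_speech_stream(text: str) -> AsyncIterator[bytes]:
    """Yield each audio fragment on arrival instead of joining them all."""
    async for fragment in fake_upstream(text):
        if fragment:  # skip empty fragments
            yield fragment


async def collect(text: str) -> List[bytes]:
    """Consumer example: a web handler would forward chunks instead."""
    return [chunk async for chunk in generate_speech_stream(text)]
```

With this shape, peak memory is bounded by the largest fragment rather than the full clip, and the caller can start playback on the first chunk.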

Identified Problems:

  • Vertex Region Lock: The Vertex endpoint is hardcoded to us-central1 (Line 58). This violates the requirement for data residency and configurable regions.
  • Magic String Model Name: Line 42 hardcodes a preview model name ("gemini-2.5-flash-preview-tts"). If this model is deprecated by Google, the Voice feature will break for all users until a code change is deployed.
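
Both findings above point toward moving the region and model name into configuration. A minimal sketch follows, assuming environment variables named `TTS_VERTEX_REGION` and `TTS_MODEL_NAME`; those names and the `TTSSettings` class are hypothetical, while the defaults mirror the values currently hardcoded.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TTSSettings:
    region: str
    model: str


def load_tts_settings(env: Optional[Mapping[str, str]] = None) -> TTSSettings:
    """Read TTS settings from the environment, falling back to today's values."""
    env = os.environ if env is None else env
    return TTSSettings(
        # Variable names are assumptions, not existing project settings.
        region=env.get("TTS_VERTEX_REGION", "us-central1"),
        model=env.get("TTS_MODEL_NAME", "gemini-2.5-flash-preview-tts"),
    )
```

A deprecated model or a residency requirement could then be handled by changing deployment configuration rather than shipping a code change.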

🛠️ Summary Recommendations

  1. Harden FAISS Synchronization: Implement a lock or specialized "Vector Writer" task to serialize save_index calls and prevent index corruption during concurrent ingestion.
  2. Sanitize Logging: Remove API URLs from standard debug logs in the TTS/STT providers. Use masked/redacted strings for sensitive metadata.
  3. Implement Streaming TTS: Refactor the TTS interface to support chunked delivery, reducing TTFB and Hub memory usage.
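
For recommendation 2, one option is to mask credential-bearing query parameters before a URL reaches the logger. This sketch uses only the standard library; the parameter names in `SENSITIVE_PARAMS` and the example URL are assumptions, not taken from gemini.py.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; extend to match the providers in use.
SENSITIVE_PARAMS = {"key", "api_key", "access_token"}


def redact_url(url: str) -> str:
    """Return the URL with sensitive query values replaced by a marker."""
    parts = urlsplit(url)
    query = [
        (name, "REDACTED" if name.lower() in SENSITIVE_PARAMS else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical endpoint, for illustration only.
safe = redact_url("https://example.com/v1/speak?key=abc123&voice=Kore")
```

Logging `safe` instead of the raw URL keeps the endpoint visible for debugging while the key never reaches the log, regardless of log level.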

This concludes the review of Features 7 & 8. The reports are persisted to /app/docs/reviews/.