
Code Review Report: Feature 9 — API Schemas & Data Validation

This report performs a deep-dive audit of the API structure and Pydantic validation layer, focusing on schemas.py and shared core utilities.


🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
|---|---|---|
| III. Config | Success | Schemas are decoupled from environment/config and correctly use Pydantic V2's ConfigDict and model_config. |
| VII. Port Binding | Success | The separation of schemas into a clear, standalone schemas.py ensures the API interface remains consistent regardless of how the Hub is bound or proxied. |

🔍 File-by-File Diagnostic

1. app/api/schemas.py

The source of truth for all JSON-to-Python object mapping.

[!CAUTION] CRITICAL SECURITY RISK: Local File Inclusion (LFI)
Line 562: resolve_prompt_content(self)
The AgentTemplateResponse schema contains a @model_validator(mode='after') that automatically reads a file from the local filesystem whenever system_prompt_path begins with a slash (i.e., is an absolute path).

The Vulnerability: If an attacker can create an Agent Template or update an existing one with a system_prompt_path like /app/.env or /etc/passwd, the Hub will read the file and return its entire contents in the system_prompt_content field of the API response.

Fix: Immediately remove this validator from the schema. File-reading logic MUST be performed in the Service Layer with explicit path validation/sandboxing (e.g., checking that the path is within a designated prompts/ directory).
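A minimal sketch of that service-layer check, assuming all prompts live under one designated directory and callers pass paths relative to it. PromptService, PromptPathError, resolve() and load() are hypothetical names, not the Hub's actual API. The candidate path is resolved with os.path.realpath and then compared against the sandbox root with os.path.commonpath, which rejects absolute paths such as /etc/passwd, ../ traversal, and symlinks that point outside the directory.

```python
import os


class PromptPathError(ValueError):
    """Hypothetical error raised when a prompt path escapes the sandbox."""


class PromptService:
    """Hypothetical service that only reads prompts from one sandbox directory."""

    def __init__(self, prompts_root: str) -> None:
        # Resolve the root once so symlinks in the root itself are normalized.
        self.prompts_root = os.path.realpath(prompts_root)

    def resolve(self, relative_path: str) -> str:
        # os.path.join discards prompts_root when relative_path is absolute,
        # so "/etc/passwd" stays "/etc/passwd" and fails the check below.
        candidate = os.path.realpath(os.path.join(self.prompts_root, relative_path))
        # realpath collapses "../" segments and follows symlinks; commonpath
        # then confirms the final location is still inside the sandbox.
        if os.path.commonpath([self.prompts_root, candidate]) != self.prompts_root:
            raise PromptPathError(f"prompt path escapes sandbox: {relative_path!r}")
        return candidate

    def load(self, relative_path: str) -> str:
        # The path is validated before any file is opened.
        path = self.resolve(relative_path)
        with open(path, encoding="utf-8") as f:
            return f.read()
```

With a root of /app/prompts (also an assumption), load("default.md") returns the file's contents, while load("../.env") and load("/etc/passwd") raise PromptPathError without opening anything.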

Identified Problems:

  • Performance Bottleneck (Blocking I/O): Line 570 performs a blocking f.read() inside a Pydantic validator. Pydantic validators are synchronous, so whenever this model is built or validated on the event loop thread (for example, inside an async def endpoint, or during FastAPI's response_model validation for one), reading a large prompt file stalls the event loop for every concurrent request until the read completes.
  • Recursive Payload Hazard: AgentInstanceResponse (Line 594) embeds a full Session and a full AgentTemplateResponse as optional fields. In list endpoints these nested objects are loaded and serialized once per item, so as the agent mesh grows this leads to over-fetching, extra database lookups, and significant memory spikes during JSON serialization. Because AgentTemplateResponse carries the file-reading validator above, it can also mean one blocking file read per list item.
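If prompt content still has to be read during a request, one option (a sketch, assuming the read happens in an async service method rather than in a validator) is to move the blocking call onto a worker thread with the standard library's asyncio.to_thread (Python 3.9+), so the event loop keeps serving other requests while the file is read. read_prompt_async and _read_prompt_sync are hypothetical helper names.

```python
import asyncio
from pathlib import Path


def _read_prompt_sync(path: Path) -> str:
    # Plain blocking read; only safe when called off the event loop.
    return path.read_text(encoding="utf-8")


async def read_prompt_async(path: Path) -> str:
    # asyncio.to_thread runs the blocking read in the default thread pool,
    # so other coroutines keep running while the file is read.
    return await asyncio.to_thread(_read_prompt_sync, path)
```

Inside FastAPI, starlette.concurrency.run_in_threadpool serves the same purpose. Either way, keeping I/O out of schema validators means constructing a response model stays cheap and side-effect free.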

2. app/core/_regex.py

Shared regular expression library.

Identified Problems: None.

  • No ReDoS Risk Identified: The ANSI_ESCAPE pattern (Line 5) is well-bounded and safe for high-frequency token streaming.

🛠️ Summary Recommendations

  1. Remove Schema-Level File Reading: Move all "Prompt Loading" logic from schemas.py to PromptService and ensure it only accesses paths within a validated sandbox.
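  2. Optimize Serializers: Use lightweight variants of response schemas (e.g., an AgentInstanceSummary containing only IDs) for list results to avoid nested database/serializer overhead.
  3. Strict Path Validation: In the PromptService, resolve each prompt path with os.path.realpath and then reject any result that does not lie inside the prompts root (e.g., by comparing it against the root with os.path.commonpath). Normalizing with realpath alone is not enough; the containment check is what blocks directory traversal (../../) and symlink escapes.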
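A sketch of recommendation 2, assuming Pydantic V2 and ORM rows that expose id, template_id and session_id attributes (these field names and the exact AgentInstanceSummary shape are hypothetical). ConfigDict(from_attributes=True) lets model_validate read directly from an ORM object, and because the summary declares only scalar IDs, Pydantic never accesses the session or template relationships, so nothing related is loaded or serialized per list item.

```python
from typing import Iterable, List, Optional

from pydantic import BaseModel, ConfigDict


class AgentInstanceSummary(BaseModel):
    """Hypothetical list-view schema: scalar IDs only, no nested objects."""

    # Allow model_validate() to read attributes from ORM objects.
    model_config = ConfigDict(from_attributes=True)

    id: int
    template_id: Optional[int] = None
    session_id: Optional[int] = None


def summarize(instances: Iterable[object]) -> List[dict]:
    # Only the declared fields are read, so relationship attributes such as
    # .session or .template are never touched (and never lazy-loaded).
    return [AgentInstanceSummary.model_validate(i).model_dump() for i in instances]
```

One common split is to keep the full AgentInstanceResponse for single-instance detail endpoints and return the summary only from list endpoints.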

This concludes Feature 9. I have persisted this report to /app/docs/reviews/. I am ready for the final backend file checks or to assist with fixing the LFI risk.