Newer
Older
cortex-hub / docs / reviews / feature_review_orchestration.md

Code Review Report: Feature 1 — Autonomous Rework & Evaluation Hub

This report performs a deep-dive audit of the backend orchestration layer, focusing on agent_loop.py, harness_evaluator.py, and architect.py through the lens of 12-Factor App Methodology, Pythonic Code Style, and Architectural Stability.


🏗️ 12-Factor App Compliance Audit

Factor Status Observation
III. Config 🟡 Warning While using Pydantic Settings, agent_loop.py contains some hardcoded fallbacks and heuristic parsing of provider strings (e.g., split("/")). This logic should be encapsulated in a ConfigurationResolver service.
VI. Processes 🔴 Problem AgentExecutor runs as a long-lived background task. If the Hub process restarts, there is no state persistence/recovery for the "active" loop. It relies on the database status, but the asyncio task is lost.
XI. Logs 🔴 Major Issue Evaluation "logs" (history.log) are treated as application state stored as files on remote mesh nodes. This violates the "Logs as Event Streams" principle and introduces significant latency/corruption risks.
XII. Admin Success Database migrations and node provisioning are handled as one-off processes, maintaining a clean runtime environment.

🔍 File-by-File Diagnostic

1. app/core/orchestration/agent_loop.py

The "brain" of the autonomous rework cycle.

[!WARNING] Single Responsibility Principle (SRP) Violation This file is currently 25KB and performs DB management, LLM provider instantiation, Token Usage regex counting, AND the rework state machine. Fix: Extract TokenUsageCounter, ReWorkStrategy, and LLMProviderFactory into standalone services.

Identified Problems:

  • Legacy Logging: Still uses several print() statements for exception reporting (Lines 328, 377). This makes production monitoring difficult via ELK/CloudWatch.
  • Manual Token Counting: Uses regex/manual subtraction instead of relying on provider-level usage metadata where possible.
  • Database Contention: Performs frequent instance = db.query(...).first() calls inside the streaming loop. This creates high DB pressure during high-throughput reasoning tasks.

2. app/core/orchestration/harness_evaluator.py

Manages quality gates and rubric generation.

[!CAUTION] Catastrophic Failure Potential: Persistent Audit Store Lines 313-343 use a cat $\rightarrow$ json.loads $\rightarrow$ write pattern to manage rework history.

  1. Concurrency: If multiple processes write to history, data is lost.
  2. Scalability: If the log grows to several megabytes, every audit attempt will timeout due to gRPC payload limits for essentially a "database" operation executed via shell commands. Fix: Store evaluation history in the primary SQL database or an append-only JSONL stream.

Identified Problems:

  • Hardcoded Prompts: System prompts for Rubric and Delta Analysis are hardcoded in the class (Lines 98, 176). This makes prompt engineering difficult without code deploys.
  • Error Handling: initialize_cortex ignores failures. If a node has a read-only filesystem, the agent will appear to "hang" silently.

3. app/core/orchestration/architect.py

The tool-calling execution engine.

Identified Problems:

  • Memory Management: The context-chunk injection logic doesn't strictly enforce model context limits before dispatching the final turn, which could lead to context_window_exceeded errors for smaller models.
  • Tool Filtering: The tool selection logic is repeated in several places. Consistency is a concern if a new "Guardrail" tool is added.

🛠️ Summary Recommendations

  1. Stop using history.log as primary state: Move rework attempt records to a Database table (AgentAttempt). The file-based log should be a secondary "mirror" only.
  2. Modularize AgentExecutor: Break down the massive run method into distinct phases: PreConfig, Execute, Audit, Iterate.
  3. Upgrade Logging: Replace all print() and logger.error(f"error: {e}") with logger.exception("Context-rich error message") to capture stack traces.

Please review this first feature audit. If approved, I will proceed to the next feature: Node Registry & Mesh Monitoring.