
🔍 Cortex Agent Node: Feature Gap Analysis & Roadmap

This document outlines the critical missing features required to transition the current gRPC Proof of Concept (PoC) into a full-scale, production-ready Distributed AI Agent System.

1. 🗄️ Workspace & File Synchronization

The current node executes commands but lacks a native way to manage project-level files.

  • The Gap: No bi-directional sync between the server and the node workspace (e.g., pushing local server files to the node, or pulling node-side changes back).
  • Required: A content-addressable synchronization layer (Merkle Tree / Hash-based) to efficiently mirror workspaces to remote nodes without redundant transfers.
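
As a rough illustration of the hash-based approach, the sketch below compares two workspaces by content digest and plans only the transfers that are actually needed. It uses a flat SHA-256 manifest rather than a full Merkle tree, and the names `build_manifest` and `plan_sync` are hypothetical, not part of the existing PoC.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def plan_sync(source: dict[str, str], target: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (paths to upload, paths to delete) so that target mirrors source."""
    # A file is uploaded only if it is missing on the target or its digest differs.
    upload = sorted(p for p, digest in source.items() if target.get(p) != digest)
    # Anything the target has that the source does not is stale.
    delete = sorted(p for p in target if p not in source)
    return upload, delete
```

Unchanged files never appear in the upload list, which is where the savings come from. A Merkle tree would add per-directory hashes on top of this so that whole unchanged subtrees can be skipped with a single comparison.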

2. 🌊 Real-time Log Streaming (Observability)

Currently, stdout and stderr are returned only when a task completes.

  • The Gap: No visibility into long-running tasks or hanging builds.
  • Required: gRPC server-to-client streaming of live console logs, allowing the Main AI to detect progress or failures as they occur.
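
The core of live log streaming is reading a child process line by line instead of waiting for it to exit. The sketch below shows that part only, using the standard library; `stream_output` is a hypothetical helper, and the `[exit N]` trailer is an invented convention for this example. In Python's gRPC library a server-streaming handler is itself a generator, so a loop like this could feed one fairly directly.

```python
import subprocess
import sys
from typing import Iterator


def stream_output(argv: list[str]) -> Iterator[str]:
    """Yield each output line of a command as soon as the process writes it."""
    with subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on the reading side
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the `with` block waits for the process, so returncode is set here.
    yield f"[exit {proc.returncode}]"


if __name__ == "__main__":
    for line in stream_output([sys.executable, "-c", "print('building...')"]):
        print(line)
```

Note that the child process may buffer its own output when it is not attached to a terminal, so some tools need a flag or environment variable to flush per line.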

3. 🛡️ Robust Sandbox Isolation

The current sandbox relies on string-filtering shell commands.

  • The Gap: Vulnerable to complex shell escapes, symlink attacks, and environment manipulation.
  • Required: OS-level isolation via containers (Docker, Podman) or microVMs (Firecracker), so that each task is confined to its own namespace.
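
One way this could look with Docker is sketched below: the node builds an argument vector for `docker run` and passes the task command as a list, so nothing is interpreted by a host shell and no string filtering is involved. The function name, mount point, and resource limits are illustrative assumptions, not values taken from the PoC.

```python
def docker_run_argv(image: str, workspace: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv that confines one task to a throwaway container."""
    return [
        "docker", "run",
        "--rm",                                  # remove the container when the task exits
        "--network", "none",                     # no network access from inside the task
        "--read-only",                           # root filesystem is read-only
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation via setuid
        "--pids-limit", "256",                   # bound runaway process creation
        "--memory", "512m",                      # cap memory use
        "--volume", f"{workspace}:/workspace",   # the only host path the task can see
        "--workdir", "/workspace",
        image,
        *command,                                # passed as argv, never through a host shell
    ]
```

The resulting list can be handed to `subprocess.run` as-is. Tasks that need network access or a writable scratch directory would require relaxing some of these flags deliberately, per task.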

4. 🔗 Specialized Sub-Worker Protocols (CDP/LSP)

The agent treats browser automation and coding as generic shell commands.

  • The Gap: Inefficiency; starting a fresh browser for every click is slow and loses state.
  • Required: Persistent sub-bridges (e.g., a Chrome DevTools Protocol link or a Language Server Protocol session) that let the Main AI maintain a long-running session across multiple delegated tasks.
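
The idea of a persistent sub-bridge can be sketched as a registry that creates a handle on first use and returns the same handle for later tasks in the same session. Everything here is hypothetical: `SessionRegistry` is an invented name, and `FakeBrowser` is a stand-in for a real CDP or LSP connection so the example stays self-contained.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class SessionRegistry(Generic[T]):
    """Keeps long-lived sub-worker handles alive between delegated tasks."""

    def __init__(self, factory: Callable[[], T], closer: Callable[[T], None]) -> None:
        self._factory = factory
        self._closer = closer
        self._sessions: Dict[str, T] = {}

    def acquire(self, session_id: str) -> T:
        """Return the existing handle for session_id, creating it on first use."""
        if session_id not in self._sessions:
            self._sessions[session_id] = self._factory()
        return self._sessions[session_id]

    def release(self, session_id: str) -> None:
        """Close and forget the handle for session_id, if one exists."""
        handle = self._sessions.pop(session_id, None)
        if handle is not None:
            self._closer(handle)


class FakeBrowser:
    """Stand-in for a real browser connection; it only records visited URLs."""

    def __init__(self) -> None:
        self.history: List[str] = []
        self.closed = False

    def navigate(self, url: str) -> None:
        self.history.append(url)

    def close(self) -> None:
        self.closed = True
```

With `registry = SessionRegistry(FakeBrowser, FakeBrowser.close)`, two tasks that both call `registry.acquire("session-1")` share one browser and its accumulated state, instead of each paying the startup cost.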

5. 📦 Binary Artifact & Large Data Handling

The system currently lacks logic for large file transport.

  • The Gap: gRPC's default maximum message size (4 MB) means the call fails if a node tries to return a video capture or large log file in a single message.
  • Required: Chunked file upload/download logic for artifacts like screenshots, videos, and build binaries.
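
A minimal sketch of chunked transfer, assuming a 1 MiB chunk size and a SHA-256 checksum sent alongside the artifact (both are illustrative choices, as are the function names). Each chunk carries its index so the receiver can reorder and verify before accepting the result.

```python
import hashlib
from typing import BinaryIO, Iterable, Iterator

CHUNK_SIZE = 1024 * 1024  # 1 MiB, well under a 4 MB message limit


def read_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[int, bytes]]:
    """Yield (index, payload) pairs without loading the whole artifact into memory."""
    index = 0
    while True:
        payload = stream.read(chunk_size)
        if not payload:
            return
        yield index, payload
        index += 1


def reassemble(chunks: Iterable[tuple[int, bytes]], expected_sha256: str) -> bytes:
    """Reorder chunks by index, join them, and verify the result against a checksum."""
    ordered = [payload for _, payload in sorted(chunks, key=lambda c: c[0])]
    data = b"".join(ordered)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("artifact checksum mismatch")
    return data
```

On the sending side, each `(index, payload)` pair would map to one message of a streaming RPC. A production receiver would more likely write chunks to disk as they arrive than join them in memory.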

🏗️ Node Lifecycle & Quality of Life

  • Automatic Updates: Mechanism for nodes to self-update their binary/logic when the central protocol evolves.
  • Graceful Shutdown: Handling system signals to allow background workers to finish or clean up before disconnection.
  • Local Cache: Persistence for task history and metadata on the node to handle temporary network partitions.
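
Graceful shutdown, for example, could be sketched as follows: a signal handler sets a flag, and the worker loop checks that flag between tasks, so work already in progress finishes while nothing new is started. The names `install_signal_handlers` and `drain` are hypothetical, and the task body is a placeholder.

```python
import queue
import signal
import threading


def install_signal_handlers(stop: threading.Event) -> None:
    """Call once from the main thread at startup; SIGTERM and SIGINT set the flag."""
    def handler(signum, frame):
        stop.set()

    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)


def drain(tasks: "queue.Queue[str]", stop: threading.Event) -> list[str]:
    """Run queued tasks until the queue is empty or stop is set; return finished tasks."""
    finished = []
    while not stop.is_set():
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            break
        finished.append(task)  # placeholder for the real work of executing a task
    return finished
```

Tasks left in the queue after a shutdown request are candidates for the local cache described above, so they can be reported or resumed when the node reconnects.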

[!NOTE] These features bridge the gap between "Command Execution" and "Full Autonomous Collaboration."