cortex-hub / docs / architecture / cortex_agent_node_plan.md

Cortex Agent Node: Architecture & Implementation Plan

This document outlines the transition from the current WebSocket (wss) code-syncing approach to a fully distributed, secure, multi-agent architecture in which the Cortex Server orchestrates powerful local "Agent Nodes."

🏗️ High-Level Architecture

1. The Cortex Server (Orchestrator)

  • Role: The central brain. Handles AI inference, task planning, and user interface.
  • Communication Hub: Exposes a bidirectional streaming endpoint via gRPC over HTTP/2 to securely manage connections. gRPC provides native bidirectional streaming, built-in schema enforcement via Protobuf, first-class retry semantics, and stronger backpressure handling compared to WebSockets.
  • Node Registry: Keeps track of connected nodes: their identities, health status, and discovered capabilities. The server treats Agent Nodes as heterogeneous and relies on the capability manifest each node sends on connection.
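The bidirectional stream, registration manifest, and task dispatch described above could be captured in a Protobuf contract along these lines. This is an illustrative sketch, not the final agent.proto; all message and field names are assumptions:

```protobuf
syntax = "proto3";

package cortex.agent.v1;

// Single bidirectional stream: the node sends envelopes up,
// the server pushes envelopes down the same connection.
service AgentService {
  rpc Connect(stream NodeMessage) returns (stream ServerMessage);
}

message NodeMessage {
  oneof payload {
    Registration registration = 1;  // capability manifest on connect
    Heartbeat heartbeat = 2;
    TaskResult task_result = 3;
  }
}

message ServerMessage {
  oneof payload {
    TaskDispatch task = 1;
  }
}

message Registration {
  string node_id = 1;
  string platform = 2;
  repeated string capabilities = 3;  // e.g. "shell", "browser"
}

message Heartbeat { int64 unix_ms = 1; }

message TaskDispatch {
  string task_id = 1;
  string kind = 2;      // "shell", "browser", ...
  bytes payload = 3;
  bytes signature = 4;  // verified by the node (see Security Stack)
}

message TaskResult {
  string task_id = 1;
  int32 exit_code = 2;
  bytes output = 3;
}
```

The single `Connect` stream is what makes the "phone home" pattern work: the node dials out once, and every later exchange rides on that one connection.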

2. The Agent Node (Client Software)

  • Role: A lightweight, standalone daemon running on the user's local machine or CI runners.
  • Execution Engine: Receives tasks from the server, executes them locally via an isolated execution context, and streams results back.
  • Capabilities:
    • System Ops: Run bash commands, edit files, list directories within a strict sandbox.
    • Browser Automation: Control local browsers via CDP (Chrome DevTools Protocol) for UI testing, allowing chunked binary frames, headless/visible toggling, and DOM snapshot streaming.
    • Auditing & Observability: Maintains a strict, immutable local log of every command. Emits task execution timing, failure counters, and crash telemetry.
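As one concrete shape for the capability manifest the node sends on connection, the daemon could assemble it as below. The field names are assumptions for illustration; the real schema would live in agent.proto:

```python
import json
import platform


def build_manifest(node_id: str, capabilities: list[str]) -> str:
    """Assemble the JSON capability manifest sent on connection.

    Field names are illustrative, not a fixed schema.
    """
    manifest = {
        "node_id": node_id,
        "version": "0.1.0",
        "platform": platform.system().lower(),  # e.g. "linux", "darwin"
        "capabilities": sorted(capabilities),   # stable ordering for diffing
    }
    return json.dumps(manifest, sort_keys=True)


manifest = build_manifest("node-a1", ["shell", "browser", "fs"])
```

Sorting keys and capabilities keeps the manifest deterministic, which makes it easy for the registry to detect capability changes across reconnects.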

3. Tunneling & Security

  • The "Phone Home" Pattern: To bypass NAT and firewalls, the Agent Node initiates an outbound HTTP/2-over-TLS connection (outbound port 443) to the server. The server then pushes tasks down this persistent bidirectional stream.
  • Security Stack:
    • mTLS: Validates node identity and encrypts the channel in both directions.
    • Short-lived JWT: Provides session authentication and fine-grained authorization (Capability claims).
    • Task Signatures: Each task from the server is signed, so the Node can verify the signature, issuer, expiry, and user binding before executing anything, preventing task injection.
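The sign-then-verify flow can be sketched with a symmetric HMAC for brevity; a production deployment would more likely use asymmetric signatures (e.g. Ed25519) so nodes hold only a public key. All names, claims, and the shared key here are illustrative:

```python
import hashlib
import hmac
import json
import time

SHARED_KEY = b"demo-key-not-for-production"  # stand-in for a real key pair


def sign_task(task: dict, key: bytes = SHARED_KEY) -> dict:
    """Attach issuer/expiry/user claims and an HMAC-SHA256 signature."""
    envelope = {
        "task": task,
        "iss": "cortex-server",           # issuer the node expects
        "exp": int(time.time()) + 60,     # short-lived, like the JWTs
        "sub": "user-123",                # user binding (illustrative)
    }
    body = json.dumps(envelope, sort_keys=True)
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_task(signed: dict, key: bytes = SHARED_KEY) -> dict:
    """Node-side check: reject tampered, expired, or wrongly-issued tasks."""
    expected = hmac.new(key, signed["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["sig"]):
        raise ValueError("bad signature")
    envelope = json.loads(signed["body"])
    if envelope["exp"] < time.time():
        raise ValueError("task expired")
    if envelope["iss"] != "cortex-server":
        raise ValueError("unknown issuer")
    return envelope["task"]
```

Because the signature covers the whole envelope (claims included), an attacker on the path cannot swap the command, extend the expiry, or rebind the task to another user without invalidating it.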

🛠️ Execution Plan

We will execute this transformation in six phased milestones.

Phase 1: Protocol & Tunnel Proof of Concept (POC) - ✅ COMPLETE

  • Status: Verified in /app/poc-grpc-agent/.
  • Achievements:
    • Defined agent.proto with bidirectional streaming.
    • Implemented Python server.py and client.py.
    • Successfully demonstrated registration, heartbeat pattern, and task-dispatch with remote shell execution.
    • Validated multiplexing and backpressure via gRPC.
  • Outcome: Server can dispatch an idempotent "Echo" task down the gRPC stream.

Phase 2: Security, Identity & Observability

  • Goal: Lock down the tunnel and introduce tracing.
  • Tasks:
    • Implement the Security Stack (mTLS, JWT, Task Signatures).
    • Require the Agent Client to authenticate and map connections to a User/Workspace.
    • Observability: Add per-task tracing IDs, structured logs on the server side, and OpenTelemetry for node crash reports and execution timing.
  • Outcome: Only authenticated, signed tasks run, with full tracing across the distributed system.

Phase 3: Core Capabilities & Secure Engine (The Local Sandbox)

  • Goal: Safely execute host commands and establish audit logs.
  • Tasks:
    • Capability Negotiation: Agent sends a JSON manifest (version, platform, capabilities) on connection.
    • Command Sandbox Policy: Disallow network access by default, run under non-privileged user, and strictly whitelist allowed commands.
    • Consent-based Execution: Add a "Strict Mode" (manual Y/N prompt for every command) and "Auto-Approve" for non-destructive actions.
    • Advanced Auditing: Implement append-only local logs with periodic hash chaining and optional tamper detection (hash tree).
  • Outcome: Secure, auditable, and consensual execution of system queries.
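The append-only log with hash chaining can be sketched as follows: each entry commits to the hash of its predecessor, so editing or deleting any earlier entry breaks verification from that point on. This is a minimal in-memory model of the idea, not the on-disk implementation:

```python
import hashlib
import json


class AuditLog:
    """Append-only command log with a SHA-256 hash chain for tamper detection."""

    GENESIS = "0" * 64  # sentinel "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, command: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = json.dumps({"command": command, "prev": prev}, sort_keys=True)
        entry = {
            "command": command,
            "prev": prev,
            "hash": hashlib.sha256(body.encode()).hexdigest(),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any in-place edit breaks it."""
        prev = self.GENESIS
        for e in self.entries:
            body = json.dumps({"command": e["command"], "prev": prev},
                              sort_keys=True)
            if e["prev"] != prev or e["hash"] != hashlib.sha256(body.encode()).hexdigest():
                return False
            prev = e["hash"]
        return True
```

Periodically publishing the latest chain hash to the server (or a hash tree over it, as the phase suggests as an option) lets tampering be detected even if an attacker rewrites the entire local chain.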

Phase 4: Browser Automation (The "Antigravity" Feature)

  • Goal: Allow Agent to interact with local web apps.
  • Tasks:
    • Implement CDP (Chrome DevTools Protocol) to attach to existing browsers.
    • Stream screenshots efficiently using chunked binary frames over gRPC, possibly with compression or delta snaps.
    • Include headless/visible toggles, timeboxed navigation, and DOM snapshot streaming.
  • Outcome: High-performance, low-latency visual interaction with local web pages.
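The chunked binary frames mentioned above amount to splitting each screenshot into bounded pieces that a gRPC stream can yield one by one. A sketch of the chunking side (the 64 KiB size and field names are assumptions, not a measured optimum):

```python
from typing import Iterator

CHUNK_SIZE = 64 * 1024  # 64 KiB per frame keeps individual gRPC messages small


def chunk_screenshot(frame: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[dict]:
    """Split one screenshot into ordered chunks with a final-chunk marker,
    as they would be yielded into a gRPC streaming response."""
    total = len(frame)
    for seq, offset in enumerate(range(0, total, chunk_size)):
        yield {
            "seq": seq,                               # reassembly order
            "data": frame[offset:offset + chunk_size],
            "last": offset + chunk_size >= total,     # lets the receiver flush
        }
```

Compression or delta snapshots, as the phase suggests, would slot in before chunking: compress (or diff against the previous frame) first, then split the resulting bytes.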

Phase 5: Concurrency & Task Isolation

  • Goal: Handle simultaneous requests without state corruption.
  • Tasks:
    • Task Isolation Model: Ensure each task runs in an isolated context. Browser actions get isolated contexts; file writes use advisory locks.
    • Introduce Resource Quotas (limit % CPU/Memory per agent).
  • Outcome: Multiple tasks execute safely and concurrently without race conditions.
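The advisory-lock model for file writes can be sketched with one asyncio.Lock per path: tasks touching different files run in parallel, while writes to the same file serialize. The simulated write below is a stand-in for real disk I/O:

```python
import asyncio
from collections import defaultdict

# One lock per file path: acquired around every write to that path.
_file_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def write_file(path: str, text: str, log: list[str]) -> None:
    async with _file_locks[path]:
        log.append(f"start {path}")
        await asyncio.sleep(0)  # simulated I/O; real code would write to disk
        log.append(f"end {path}")


async def main() -> list[str]:
    log: list[str] = []
    # Two tasks contend for a.txt; a third writes b.txt independently.
    await asyncio.gather(
        write_file("a.txt", "one", log),
        write_file("a.txt", "two", log),
        write_file("b.txt", "three", log),
    )
    return log


events = asyncio.run(main())
```

The same pattern generalizes to the browser side: an isolated context per task plays the role the per-path lock plays for files.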

Phase 6: Scaling & Frontend UI Integration

  • Goal: Support multiple nodes and surface insights in the UI.
  • Tasks:
    • Scaling: Prepare for multi-node orchestration (e.g., node pools, load-aware dispatch, Redis/NATS as control plane).
    • Update CodingAssistantPage to recognize nodes via the Node Registry.
    • Provide users a UI dashboard for remote audit logs and tracing.
  • Outcome: A seamless user experience managing distributed execution.

Before mapping this into JIRA/GitBucket issues, we should build the gRPC Protobuf Schema (agent.proto) and establish the Phase 1 Dummy Python Server/Client.

Shall I proceed with writing the initial Protobuf definition to solidify the API contract?