Cortex Agent Node: Architecture & Implementation Plan
This document outlines the transition from the current WebSocket (wss) code-syncing approach to a distributed, secure, multi-agent architecture in which the Cortex Server orchestrates local "Agent Nodes."
🏗️ High-Level Architecture
1. The Cortex Server (Orchestrator)
- Role: The central brain. Handles AI inference, task planning, and user interface.
- Communication Hub: Exposes a bidirectional streaming endpoint (gRPC over HTTP/2 or robust WebSockets) to securely manage connections from multiple remote Agent Nodes.
- Node Registry: Keeps track of connected nodes, their identities, capabilities, and health status.
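As a rough illustration of the Node Registry described above, the following is a minimal in-memory sketch. The class and field names (`NodeRegistry`, `NodeRecord`, `heartbeat_timeout`) and the rule that a node counts as healthy while its last heartbeat is recent are assumptions for illustration, not a settled design.

```python
import time
from dataclasses import dataclass, field


@dataclass
class NodeRecord:
    node_id: str
    owner: str                      # identity the node authenticated as
    capabilities: set
    last_heartbeat: float = field(default_factory=time.monotonic)


class NodeRegistry:
    """Hypothetical registry: a node is healthy if it sent a heartbeat recently."""

    def __init__(self, heartbeat_timeout: float = 30.0, clock=time.monotonic):
        self._nodes = {}
        self._timeout = heartbeat_timeout
        self._clock = clock         # injectable so health checks can be tested

    def register(self, node_id, owner, capabilities):
        record = NodeRecord(node_id, owner, set(capabilities), self._clock())
        self._nodes[node_id] = record
        return record

    def heartbeat(self, node_id):
        self._nodes[node_id].last_heartbeat = self._clock()

    def is_healthy(self, node_id):
        record = self._nodes.get(node_id)
        if record is None:
            return False
        return self._clock() - record.last_heartbeat <= self._timeout

    def nodes_with(self, capability):
        """Return ids of healthy nodes that advertise the given capability."""
        return sorted(
            record.node_id
            for record in self._nodes.values()
            if capability in record.capabilities and self.is_healthy(record.node_id)
        )
```

A persistent or shared store would likely replace the dictionary once more than one server process is involved.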
2. The Agent Node (Client Software)
- Role: A lightweight, standalone daemon running on the user's local machine (or specific dev containers).
- Execution Engine: Receives tasks from the server, executes them locally (using host resources), and streams results back.
- Capabilities:
- System Ops: Run bash commands, edit files, list directories.
- Browser Automation: Control local browsers via CDP (Chrome DevTools Protocol) for UI testing and visual feedback.
- Auditing: Maintains a strict, immutable local log of every command executed by the AI, ensuring the user has a transparent trail of data access.
3. Tunneling & Security
- Bidirectional Tunnel: Allows the server to proactively dispatch tasks (like "open a file" or "click this button") rather than waiting for the client to poll.
- JWT Identity & Authz:
- Each Agent Node is bootstrapped with a unique identity (Service Account or User-bound token).
- The node presents a short-lived JWT upon tunnel connection. The server validates the claims to ensure the node is authorized.
- mTLS (Optional but Recommended): For enterprise-grade security, Mutual TLS can be established between the Server and Agent Node to prevent Man-in-the-Middle attacks.
🛠️ Execution Plan
We will execute this transformation in six phases.
Phase 1: Protocol & Tunnel Proof of Concept (POC)
- Goal: Establish a reliable, bidirectional, asynchronous connection that supports retries.
- Tasks:
- Define the communication protocol (gRPC streams vs. WebSockets) with exact message schemas: TaskRequest, TaskResponse, Heartbeat.
- Build a dummy Python/Node.js Agent Client that connects to the Cortex backend.
- Implement connection retry logic with exponential backoff.
- Outcome: Server can send a simple "Echo" task to the client, and the client processes it and returns the result.
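The Phase 1 pieces could be sketched as follows, assuming JSON text frames over a WebSocket. The three message names come from the plan; every field name, the `type` discriminator, the `handle_task` helper, and the backoff parameters are assumptions chosen for illustration.

```python
import json
import random
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TaskRequest:
    kind: str                       # e.g. "echo"
    payload: dict
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    type: str = "task_request"


@dataclass
class TaskResponse:
    task_id: str
    ok: bool
    result: dict
    type: str = "task_response"


@dataclass
class Heartbeat:
    node_id: str
    type: str = "heartbeat"


MESSAGE_TYPES = {
    "task_request": TaskRequest,
    "task_response": TaskResponse,
    "heartbeat": Heartbeat,
}


def encode(message) -> str:
    """Serialize a message dataclass to one JSON frame."""
    return json.dumps(asdict(message))


def decode(raw: str):
    """Parse a JSON frame back into the dataclass named by its `type` field."""
    data = json.loads(raw)
    return MESSAGE_TYPES[data["type"]](**data)


def handle_task(request: TaskRequest) -> TaskResponse:
    """Client-side dispatch; only the Phase 1 echo task is implemented."""
    if request.kind == "echo":
        return TaskResponse(request.task_id, True, request.payload)
    return TaskResponse(request.task_id, False, {"error": f"unknown task kind: {request.kind}"})


def backoff_delays(base=0.5, factor=2.0, cap=30.0, jitter=0.1, rng=random.random):
    """Yield reconnect delays that grow exponentially up to `cap`, with +/- jitter."""
    delay = base
    while True:
        spread = delay * jitter
        yield delay + spread * (2 * rng() - 1)
        delay = min(delay * factor, cap)
```

A reconnect loop would sleep for `next(delays)` after each failed attempt and create a fresh generator once a connection succeeds. If gRPC is chosen instead, the same three messages would be defined in a `.proto` file.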
Phase 2: Security & Identity Implementation
- Goal: Lock down the tunnel.
- Tasks:
- Implement JWT minting for Agent Nodes on the Cortex Server.
- Require the Agent Client to authenticate during the initial handshake.
- Associate connected sessions with a specific User/Workspace identity to enforce authorization boundaries.
- Outcome: Only authenticated nodes can connect; connections are mapped to user sessions.
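To make the handshake check concrete, here is a standard-library sketch of minting and validating a short-lived HS256 token. The signing algorithm, the `workspace` claim, the five-minute lifetime, and the function names are all assumptions; the plan does not fix them, and a production build would more likely rely on a vetted JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_token(secret: bytes, node_id: str, workspace: str, ttl: int = 300, now=None) -> str:
    """Server side: issue a token binding a node to a workspace (hypothetical claim)."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": node_id, "workspace": workspace, "iat": issued, "exp": issued + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(secret: bytes, token: str, now=None) -> dict:
    """Server side, at tunnel handshake: return the claims or raise PermissionError."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise PermissionError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise PermissionError("malformed token")
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    signing_input = f"{header_b64}.{claims_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    current = time.time() if now is None else now
    if current >= claims.get("exp", 0):
        raise PermissionError("token expired")
    return claims
```

The returned claims are what the server would use to map the connection to a User/Workspace session.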
Phase 3: Core Capabilities & Auditing (The Local Engine)
- Goal: Give the Agent Node hands and eyes.
- Tasks:
- Implement the ShellTool on the client (safe bash execution with timeouts).
- Implement the FileSystemTool (read/write/grep).
- Build the Audit Interceptor: Every command requested by the server is logged locally (e.g., ~/.cortex/audit.log) before execution.
- Outcome: The Server can ask the Client to read /etc/os-release and get the output back safely.
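One possible shape for the log-before-execute rule, shown for the ShellTool only. The `AuditLog` class, the JSON-lines entry format, and the result dictionary are assumptions; the sketch enforces a timeout but does no sandboxing, so it covers only part of what "safe" would need to mean.

```python
import json
import subprocess
import time
from pathlib import Path


class AuditLog:
    """Appends one JSON line per requested command (format is an assumption)."""

    def __init__(self, path="~/.cortex/audit.log"):
        self.path = Path(path).expanduser()
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, tool: str, detail: dict) -> None:
        entry = {"ts": time.time(), "tool": tool, **detail}
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(entry) + "\n")


class ShellTool:
    def __init__(self, audit: AuditLog, timeout: float = 10.0):
        self.audit = audit
        self.timeout = timeout

    def run(self, command: str) -> dict:
        # Log first, so the trail exists even if the command hangs or crashes.
        self.audit.record("shell", {"command": command})
        try:
            done = subprocess.run(
                ["bash", "-c", command],
                capture_output=True,
                text=True,
                timeout=self.timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising the timeout.
            return {"exit_code": None, "stdout": "", "stderr": "", "timed_out": True}
        return {
            "exit_code": done.returncode,
            "stdout": done.stdout,
            "stderr": done.stderr,
            "timed_out": False,
        }
```

With this in place, `ShellTool(AuditLog()).run("cat /etc/os-release")` would append an entry to the audit file and then return the command's output. A FileSystemTool could call `audit.record` the same way before touching a file.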
Phase 4: Browser Automation (The "Antigravity" Feature)
- Goal: Allow the Agent Node to interact with local web apps.
- Tasks:
- Integrate Playwright or CDP connectivity into the Agent Node.
- Create standardized commands like Navigate, Click, CaptureScreenshot.
- Stream screenshots back over the tunnel as base64 or chunks.
- Outcome: The Server can instruct the client's browser to open localhost:8080 and take a screenshot.
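For the screenshot streaming task, a chunked base64 framing might look like the sketch below. The frame fields (`seq`, `total`, `sha256`) and the 32 KiB default chunk size are assumptions. The image bytes would come from the browser layer, for example the bytes returned by Playwright's `page.screenshot()`, which is left out so the example stays dependency-free.

```python
import base64
import hashlib


def chunk_screenshot(task_id: str, image: bytes, chunk_size: int = 32 * 1024):
    """Yield JSON-safe frames, each carrying a base64 slice of one screenshot."""
    total = max(1, -(-len(image) // chunk_size))    # ceiling division
    digest = hashlib.sha256(image).hexdigest()
    for seq in range(total):
        piece = image[seq * chunk_size:(seq + 1) * chunk_size]
        yield {
            "task_id": task_id,
            "seq": seq,
            "total": total,
            "data": base64.b64encode(piece).decode("ascii"),
            # Only the final frame carries the checksum of the whole image.
            "sha256": digest if seq == total - 1 else None,
        }


def assemble_screenshot(frames) -> bytes:
    """Server side: reorder frames, check completeness, and verify the checksum."""
    ordered = sorted(frames, key=lambda frame: frame["seq"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        raise ValueError("missing frames")
    image = b"".join(base64.b64decode(frame["data"]) for frame in ordered)
    if hashlib.sha256(image).hexdigest() != ordered[-1]["sha256"]:
        raise ValueError("checksum mismatch")
    return image
```

Chunking keeps any single frame small, so heartbeats and other task traffic can interleave with a large image on the same tunnel.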
Phase 5: Concurrency & Parallel Execution
- Goal: Handle multiple simultaneous requests safely.
- Tasks:
- Implement asynchronous task workers on the Agent Node.
- Ensure thread-safety for file writes and browser controls.
- Add task cancellation mechanisms.
- Outcome: Server can issue 5 simultaneous file-read operations and they complete concurrently without blocking the tunnel.
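A minimal asyncio sketch of the worker model, assuming the Agent Node runs a single event loop. `TaskRunner`, its method names, and the per-path lock scheme are hypothetical; blocking work such as file I/O would still need to be pushed to a thread (for example with `asyncio.to_thread`) so it does not stall the loop.

```python
import asyncio


class TaskRunner:
    """Runs tasks concurrently, bounded by a semaphore, with cancellation by id."""

    def __init__(self, max_parallel: int = 4):
        # Create the runner inside a running event loop (e.g. in `async def main`).
        self._slots = asyncio.Semaphore(max_parallel)
        self._path_locks = {}
        self._running = {}

    def submit(self, task_id: str, work) -> asyncio.Task:
        """Schedule `work` (a zero-argument coroutine function) and return at once,
        so the tunnel reader is never blocked waiting for a task to finish."""
        task = asyncio.create_task(self._run(work))
        self._running[task_id] = task
        task.add_done_callback(lambda _done: self._running.pop(task_id, None))
        return task

    async def _run(self, work):
        async with self._slots:
            return await work()

    def cancel(self, task_id: str) -> bool:
        """Request cancellation; returns False if the task is unknown or finished."""
        task = self._running.get(task_id)
        if task is None:
            return False
        return task.cancel()

    def lock_for(self, path: str) -> asyncio.Lock:
        """One lock per path, so two writers never interleave on the same file."""
        if path not in self._path_locks:
            self._path_locks[path] = asyncio.Lock()
        return self._path_locks[path]
```

A file-write handler would wrap its body in `async with runner.lock_for(path):`, and a cancel message from the server would map to `runner.cancel(task_id)`.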
Phase 6: Frontend UI Integration & Refactoring
- Goal: Replace the old UI approach with the new system.
- Tasks:
- Update the CodingAssistantPage to recognize connected Agent Nodes instead of relying on the old WSS sync logic.
- Display connected nodes in the UI.
- Give users a dashboard in the UI for viewing the audit logs kept on each Agent Node.
- Outcome: Users see and work with their connected Agent Nodes directly in the UI, with no dependence on the old WSS sync logic.
🔬 Next Steps
Before we create GitBucket issues, we should build a minimal implementation of Phase 1 to validate the networking stack (FastAPI WebSockets or gRPC) and confirm that it handles bidirectional traffic reliably.