diff --git a/docs/architecture/cortex_agent_node_plan.md b/docs/architecture/cortex_agent_node_plan.md
new file mode 100644
index 0000000..a9e76e7
--- /dev/null
+++ b/docs/architecture/cortex_agent_node_plan.md
@@ -0,0 +1,84 @@
+# Cortex Agent Node: Architecture & Implementation Plan
+
+This document outlines the transition from the current WebSockets (wss) code syncing approach to a fully distributed, secure, multi-agent architecture where the Cortex Server orchestrates powerful local "Agent Nodes."
+
+## 🏗️ High-Level Architecture
+
+### 1. The Cortex Server (Orchestrator)
+- **Role**: The central brain. Handles AI inference, task planning, and user interface.
+- **Communication Hub**: Exposes a bidirectional streaming endpoint (gRPC over HTTP/2 or robust WebSockets) to securely manage connections from multiple remote Agent Nodes.
+- **Node Registry**: Keeps track of connected nodes, their identities, capabilities, and health status.
+
+### 2. The Agent Node (Client Software)
+- **Role**: A lightweight, standalone daemon running on the user's local machine (or specific dev containers).
+- **Execution Engine**: Receives tasks from the server, executes them locally (using host resources), and streams results back.
+- **Capabilities**:
+  - **System Ops**: Run bash commands, edit files, list directories.
+  - **Browser Automation**: Control local browsers via CDP (Chrome DevTools Protocol) for UI testing and visual feedback.
+  - **Auditing**: Maintains a strict, immutable local log of every command executed by the AI, ensuring the user has a transparent trail of data access.
+
+### 3. Tunneling & Security
+- **Bidirectional Tunnel**: Allows the server to proactively dispatch tasks (like "open a file" or "click this button") rather than waiting for the client to poll.
+- **JWT Identity & Authz**:
+  - Each Agent Node is bootstrapped with a unique identity (Service Account or User-bound token).
+  - The node presents a short-lived JWT upon tunnel connection. The server validates the claims to ensure the node is authorized.
+- **mTLS (Optional but Recommended)**: For enterprise-grade security, Mutual TLS can be established between the Server and Agent Node to prevent Man-in-the-Middle attacks.
+
+---
+
+## 🛠️ Execution Plan
+
+We will execute this transformation in 6 phased milestones.
+
+### Phase 1: Protocol & Tunnel Proof of Concept (POC)
+- **Goal**: Establish a reliable, bidirectional, asynchronous connection that supports retries.
+- **Tasks**:
+  - Define the communication protocol (gRPC streams vs. WebSockets with exact message schemas: `TaskRequest`, `TaskResponse`, `Heartbeat`).
+  - Build a dummy Python/Node.js Agent Client that connects to the Cortex backend.
+  - Implement connection retry logic with exponential backoff.
+- **Outcome**: Server can send a simple "Echo" task to the client, and the client processes it and returns the result.
+
+### Phase 2: Security & Identity Implementation
+- **Goal**: Lock down the tunnel.
+- **Tasks**:
+  - Implement JWT minting for Agent Nodes on the Cortex Server.
+  - Require the Agent Client to authenticate during the initial handshake.
+  - Associate connected sessions with a specific User/Workspace identity to enforce authorization boundaries.
+- **Outcome**: Only authenticated nodes can connect; connections are mapped to user sessions.
+
+### Phase 3: Core Capabilities & Auditing (The Local Engine)
+- **Goal**: Give the Agent Node hands and eyes.
+- **Tasks**:
+  - Implement the `ShellTool` on the client (safe bash execution with timeouts).
+  - Implement the `FileSystemTool` (read/write/grep).
+  - Build the **Audit Interceptor**: Every command requested by the server is logged locally (e.g., `~/.cortex/audit.log`) before execution.
+- **Outcome**: The Server can ask the Client to read `/etc/os-release` and get the output back safely.
+
+### Phase 4: Browser Automation (The "Antigravity" Feature)
+- **Goal**: Allow the Agent Node to interact with local web apps.
+- **Tasks**:
+  - Integrate Playwright or CDP connectivity into the Agent Node.
+  - Create standardized commands like `Navigate`, `Click`, `CaptureScreenshot`.
+  - Stream screenshots back over the tunnel as base64 or chunks.
+- **Outcome**: The Server can instruct the client's browser to open localhost:8080 and take a screenshot.
+
+### Phase 5: Concurrency & Parallel Execution
+- **Goal**: Handle multiple simultaneous requests safely.
+- **Tasks**:
+  - Implement asynchronous task workers on the Agent Node.
+  - Ensure thread-safety for file writes and browser controls.
+  - Add task cancellation mechanisms.
+- **Outcome**: Server can issue 5 simultaneous file-read operations and they complete concurrently without blocking the tunnel.
+
+### Phase 6: Frontend UI Integration & Refactoring
+- **Goal**: Replace the old UI approach with the new system.
+- **Tasks**:
+  - Update the `CodingAssistantPage` to recognize connected Agent Nodes instead of relying on the old WSS sync logic.
+  - Display connected nodes in the UI.
+  - Give users a dashboard to view the remote audit logs from the UI.
+- **Outcome**: A seamless user experience powered by the new architecture.
+
+---
+
+## 🔬 Next Steps
+Before we create GitBucket issues, we should build a minimal implementation of **Phase 1** to validate the networking stack (FastAPI WebSockets or gRPC) and ensure it handles bidirectionality well.
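
As a starting point for the Phase 1 validation, the three message types named in the plan (`TaskRequest`, `TaskResponse`, `Heartbeat`) and the retry schedule could be prototyped without any network code. The sketch below is only an illustration under stated assumptions: the field names, the JSON envelope with `type` and `body` keys, the `encode`/`decode`/`handle` helpers, and the backoff defaults are all hypothetical and not part of the plan. It is transport-agnostic, so the same frames could travel over either a WebSocket or a gRPC stream.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TaskRequest:
    task_type: str  # hypothetical field, e.g. "echo"
    payload: str
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class TaskResponse:
    task_id: str  # echoes the request id so the server can correlate results
    ok: bool
    output: str


@dataclass
class Heartbeat:
    node_id: str
    sent_at: float  # assumed to be a Unix timestamp in seconds


# Lookup table used to rebuild the right dataclass from a decoded frame.
MESSAGE_TYPES = {cls.__name__: cls for cls in (TaskRequest, TaskResponse, Heartbeat)}


def encode(message) -> str:
    """Wrap a message in an assumed JSON envelope: {"type": ..., "body": ...}."""
    return json.dumps({"type": type(message).__name__, "body": asdict(message)})


def decode(frame: str):
    """Parse a JSON frame back into the matching message dataclass."""
    envelope = json.loads(frame)
    return MESSAGE_TYPES[envelope["type"]](**envelope["body"])


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6):
    """Yield exponentially growing reconnect delays in seconds, capped at `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def handle(request: TaskRequest) -> TaskResponse:
    """Agent-side dispatcher for the Phase 1 "Echo" task only."""
    if request.task_type == "echo":
        return TaskResponse(request.task_id, True, request.payload)
    return TaskResponse(request.task_id, False, f"unsupported task type: {request.task_type}")
```

In a real client, each value from `backoff_delays()` would be slept on between reconnect attempts, ideally with random jitter added so that many nodes do not reconnect in lockstep after a server restart; jitter is left out here to keep the schedule deterministic.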