diff --git a/docs/architecture/cortex_agent_node_plan.md b/docs/architecture/cortex_agent_node_plan.md
index a9e76e7..c4bad77 100644
--- a/docs/architecture/cortex_agent_node_plan.md
+++ b/docs/architecture/cortex_agent_node_plan.md
@@ -6,8 +6,8 @@
 
 ### 1. The Cortex Server (Orchestrator)
 - **Role**: The central brain. Handles AI inference, task planning, and user interface.
-- **Communication Hub**: Exposes a bidirectional streaming endpoint (gRPC over HTTP/2 or robust WebSockets) to securely manage connections from multiple remote Agent Nodes.
-- **Node Registry**: Keeps track of connected nodes, their identities, capabilities, and health status.
+- **Communication Hub**: Exposes a bidirectional streaming endpoint via **gRPC over HTTP/2** to securely manage connections from multiple remote Agent Nodes.
+- **Node Registry**: Keeps track of connected nodes, their identities, health status, and most importantly, **Capability Discovery** (e.g., knows if a node has Docker, Python, or Chrome installed before sending a task).
 
 ### 2. The Agent Node (Client Software)
 - **Role**: A lightweight, standalone daemon running on the user's local machine (or specific dev containers).
@@ -18,11 +18,11 @@
 - **Auditing**: Maintains a strict, immutable local log of every command executed by the AI, ensuring the user has a transparent trail of data access.
 
 ### 3. Tunneling & Security
-- **Bidirectional Tunnel**: Allows the server to proactively dispatch tasks (like "open a file" or "click this button") rather than waiting for the client to poll.
+- **The "Phone Home" Pattern**: To bypass NAT and firewalls (e.g., home routers, corporate networks), the Agent Node initiates an outbound HTTPS/HTTP2 connection to the server. The server then pushes tasks down this persistent bidirectional stream.
 - **JWT Identity & Authz**:
   - Each Agent Node is bootstrapped with a unique identity (Service Account or User-bound token).
   - The node presents a short-lived JWT upon tunnel connection. The server validates the claims to ensure the node is authorized.
-- **mTLS (Optional but Recommended)**: For enterprise-grade security, Mutual TLS can be established between the Server and Agent Node to prevent Man-in-the-Middle attacks.
+- **mTLS**: For enterprise-grade security and strict node identity validation, Mutual TLS should be established between the Server and Agent Node.
 
 ---
 
@@ -31,12 +31,12 @@
 We will execute this transformation in 6 phased milestones.
 
 ### Phase 1: Protocol & Tunnel Proof of Concept (POC)
-- **Goal**: Establish a reliable, bidirectional, asynchronous connection that supports retries.
+- **Goal**: Establish a reliable, bidirectional gRPC connection that supports retries and backpressure.
 - **Tasks**:
-  - Define the communication protocol (gRPC streams vs. WebSockets with exact message schemas: `TaskRequest`, `TaskResponse`, `Heartbeat`).
-  - Build a dummy Python/Node.js Agent Client that connects to the Cortex backend.
-  - Implement connection retry logic with exponential backoff.
-- **Outcome**: Server can send a simple "Echo" task to the client, and the client processes it and returns the result.
+  - Define the Protobuf schema (`agent.proto`) with structured messages: `TaskRequest` (needs `task_id`, `idempotency_key`, `capability_required`), `TaskResponse`, and `Heartbeat`.
+  - Build a Python gRPC server and client to validate connection multiplexing.
+  - Implement gRPC keep-alives and exponential backoff retry logic.
+- **Outcome**: Server can dispatch an idempotent "Echo" task down the gRPC stream.
 
 ### Phase 2: Security & Identity Implementation
 - **Goal**: Lock down the tunnel.
@@ -46,29 +46,30 @@
   - Associate connected sessions with a specific User/Workspace identity to enforce authorization boundaries.
 - **Outcome**: Only authenticated nodes can connect; connections are mapped to user sessions.
 
-### Phase 3: Core Capabilities & Auditing (The Local Engine)
-- **Goal**: Give the Agent Node hands and eyes.
+### Phase 3: Core Capabilities & Secure Engine (The Local Sandbox)
+- **Goal**: Give the Agent Node hands and eyes, safely.
 - **Tasks**:
-  - Implement the `ShellTool` on the client (safe bash execution with timeouts).
-  - Implement the `FileSystemTool` (read/write/grep).
-  - Build the **Audit Interceptor**: Every command requested by the server is logged locally (e.g., `~/.cortex/audit.log`) before execution.
-- **Outcome**: The Server can ask the Client to read `/etc/os-release` and get the output back safely.
+  - **Capability Negotiation**: Agent sends a manifest (`node_id`, `capabilities: {shell: true, fs: true}`, `platform`) on connection.
+  - **Execution Sandbox**: Enforce a strict "Command Sandbox Policy" (whitelist allowed commands, restrict network).
+  - **Consent-based Execution**: Add a "Strict Mode" where the Agent prompts the local user (Y/N) in the terminal before destructive actions.
+  - **Audit Interceptor**: Every command requested by the server is logged locally (append-only) before execution.
+- **Outcome**: The Server can safely ask the Client to read `/etc/os-release`.
 
 ### Phase 4: Browser Automation (The "Antigravity" Feature)
 - **Goal**: Allow the Agent Node to interact with local web apps.
 - **Tasks**:
-  - Integrate Playwright or CDP connectivity into the Agent Node.
+  - Implement a lightweight CDP (Chrome DevTools Protocol) integration to attach to an already running browser instance (avoids heavy Playwright dependencies).
   - Create standardized commands like `Navigate`, `Click`, `CaptureScreenshot`.
-  - Stream screenshots back over the tunnel as base64 or chunks.
-- **Outcome**: The Server can instruct the client's browser to open localhost:8080 and take a screenshot.
+  - Stream screenshots back over the gRPC tunnel natively using chunked binary frames.
+- **Outcome**: The Server can instruct the client's local browser to open localhost:8080 and stream a screenshot.
 
-### Phase 5: Concurrency & Parallel Execution
-- **Goal**: Handle multiple simultaneous requests safely.
+### Phase 5: Concurrency & Task Isolation
+- **Goal**: Handle multiple simultaneous requests safely without corruption.
 - **Tasks**:
+  - Define a strict **Task Isolation Model**: File writes use advisory locks; browser actions run in isolated contexts.
   - Implement asynchronous task workers on the Agent Node.
-  - Ensure thread-safety for file writes and browser controls.
-  - Add task cancellation mechanisms.
-- **Outcome**: Server can issue 5 simultaneous file-read operations and they complete concurrently without blocking the tunnel.
+  - Introduce Resource Quotas (limit Agent Node to max % CPU/Memory).
+- **Outcome**: Server can issue 5 simultaneous operations and they complete concurrently without blocking the tunnel or corrupting state.
 
 ### Phase 6: Frontend UI Integration & Refactoring
 - **Goal**: Replace the old UI approach with the new system.
@@ -80,5 +81,7 @@
 
 ---
 
-## 🔬 Next Steps
-Before we create GitBucket issues, we should build a minimal implementation of **Phase 1** to validate the networking stack (FastAPI WebSockets or gRPC) and ensure it handles bidirectionality well.
+## 🔬 Recommended Next Steps
+Before mapping this into JIRA/GitBucket issues, we should build the **gRPC Protobuf Schema** (`agent.proto`) and establish the Phase 1 Dummy Python Server/Client.
+
+Shall I proceed with writing the initial Protobuf definition to solidify the API contract?
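
Review note: before committing to `agent.proto`, it may help to pin down the Phase 1 message shapes in plain Python. The sketch below is not the planned gRPC implementation; it is an in-memory stand-in that mirrors the `TaskRequest` fields named in the diff (`task_id`, `idempotency_key`, `capability_required`) and shows how an idempotent "Echo" task and a capability check could behave. The `payload`, `ok`, `output`, and `error` fields, the `AgentNode` class, and the `echo` capability name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskRequest:
    # The first three field names come from the Phase 1 task list.
    task_id: str
    idempotency_key: str
    capability_required: str
    payload: str = ""  # assumption: the plan does not define a payload field


@dataclass(frozen=True)
class TaskResponse:
    # Hypothetical shape: the plan names TaskResponse but not its fields.
    task_id: str
    ok: bool
    output: str = ""
    error: str = ""


@dataclass
class AgentNode:
    """Hypothetical in-memory stand-in for an Agent Node (no gRPC involved)."""

    node_id: str
    capabilities: dict[str, bool]
    # Responses already produced, keyed by idempotency key.
    _seen: dict[str, TaskResponse] = field(default_factory=dict, repr=False)

    def handle(self, req: TaskRequest) -> TaskResponse:
        # A retried request with a known key gets the cached response back
        # instead of being executed a second time.
        if req.idempotency_key in self._seen:
            return self._seen[req.idempotency_key]
        if not self.capabilities.get(req.capability_required, False):
            resp = TaskResponse(
                req.task_id,
                ok=False,
                error=f"missing capability: {req.capability_required}",
            )
        else:
            # "Echo" task: return the payload unchanged.
            resp = TaskResponse(req.task_id, ok=True, output=req.payload)
        self._seen[req.idempotency_key] = resp
        return resp
```

In this sketch a retry that reuses an `idempotency_key` receives the first response verbatim, including the original `task_id`. Whether the real protocol should behave that way is a design question the Protobuf definition would need to settle.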