Cortex Agent Node: Architecture & Implementation Plan
This document outlines the transition from the current WebSockets (wss) code syncing approach to a fully distributed, secure, multi-agent architecture where the Cortex Server orchestrates powerful local "Agent Nodes."
🏗️ High-Level Architecture
1. The Cortex Server (Orchestrator)
- Role: The central brain. Handles AI inference, task planning, and user interface.
- Communication Hub: Exposes a bidirectional streaming endpoint via gRPC over HTTP/2 to securely manage connections. gRPC provides native bidirectional streaming, built-in schema enforcement via Protobuf, first-class retry semantics, and stronger backpressure handling compared to WebSockets.
- Node Registry: Keeps track of connected nodes, their identities, health status, and Capability Discovery. The server treats Agent Nodes as heterogeneous and relies on a capability manifest sent by the node.
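A minimal registry sketch in Python illustrates the idea; names such as `NodeRegistry`, `heartbeat_timeout`, and the `features` key of the manifest are illustrative assumptions, not part of the existing codebase:

```python
import time
from dataclasses import dataclass, field

@dataclass
class NodeRecord:
    node_id: str
    capabilities: dict  # capability manifest reported by the node at registration
    last_heartbeat: float = field(default_factory=time.monotonic)

class NodeRegistry:
    """Tracks connected Agent Nodes, their manifests, and liveness (sketch)."""

    def __init__(self, heartbeat_timeout: float = 30.0):
        self._nodes: dict[str, NodeRecord] = {}
        self._timeout = heartbeat_timeout

    def register(self, node_id: str, capabilities: dict) -> None:
        self._nodes[node_id] = NodeRecord(node_id, capabilities)

    def heartbeat(self, node_id: str) -> None:
        # Each heartbeat refreshes the node's liveness timestamp.
        self._nodes[node_id].last_heartbeat = time.monotonic()

    def healthy_nodes(self) -> list[str]:
        now = time.monotonic()
        return [n.node_id for n in self._nodes.values()
                if now - n.last_heartbeat < self._timeout]

    def find_capable(self, capability: str) -> list[str]:
        # Capability discovery: match against the node-reported manifest.
        return [n.node_id for n in self._nodes.values()
                if capability in n.capabilities.get("features", [])]
```

Dispatch logic on the server would then query `find_capable(...)` before pushing a task, so heterogeneous nodes only receive work they advertised support for.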
2. The Agent Node (Client Software)
- Role: A lightweight, standalone daemon running on the user's local machine or CI runners.
- Execution Engine: Receives tasks from the server, executes them locally via an isolated execution context, and streams results back.
- Capabilities:
- System Ops: Run bash commands, edit files, list directories within a strict sandbox.
- Browser Automation: Control local browsers via CDP (Chrome DevTools Protocol) for UI testing, allowing chunked binary frames, headless/visible toggling, and DOM snapshot streaming.
- Auditing & Observability: Maintains a strict, immutable local log of every command. Emits task execution timing, failure counters, and crash telemetry.
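The immutable local log can be sketched as a hash chain, where each entry commits to its predecessor so any retroactive edit is detectable. This is an illustrative design under assumed field names, not the existing implementation:

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only, hash-chained command log (illustrative sketch)."""

    def __init__(self):
        self._entries = []
        self._prev_hash = "0" * 64  # genesis value for the chain

    def append(self, command: str, exit_code: int) -> dict:
        entry = {
            "ts": time.time(),
            "command": command,
            "exit_code": exit_code,
            "prev": self._prev_hash,
        }
        # Hash the canonical JSON form so verification is deterministic.
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self._prev_hash = digest
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Re-walk the chain; any tampered entry breaks the hash linkage.
        prev = "0" * 64
        for e in self._entries:
            body = {k: e[k] for k in ("ts", "command", "exit_code", "prev")}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

In a real node the entries would be flushed to append-only storage; the in-memory list here just keeps the sketch self-contained.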
3. Tunneling & Security
- The "Phone Home" Pattern: To traverse NAT and firewalls, the Agent Node initiates an outbound TLS connection over HTTP/2 (outbound port 443) to the server. The server then pushes tasks down this persistent bidirectional stream.
- Security Stack:
- mTLS: Mutually authenticates node and server via certificates and encrypts the channel.
- Short-lived JWT: Provides session authentication and fine-grained authorization (Capability claims).
- Task Signatures: Every task from the server is signed to prevent injection; the Node validates each task's signature, issuer, expiry, and user binding before execution.
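A minimal sketch of the signing scheme, assuming an HMAC-SHA256 shared key and an `issued_at` timestamp for expiry; key provisioning and the exact payload layout are assumptions, not the project's actual wire format:

```python
import hashlib
import hmac
import json
import time

SHARED_KEY = b"demo-signing-key"  # in practice, provisioned per node (assumption)

def sign_task(task: dict, key: bytes = SHARED_KEY) -> str:
    """Sign the canonical JSON form of a task payload."""
    payload = json.dumps(task, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_task(task: dict, signature: str, key: bytes = SHARED_KEY,
                max_age: float = 60.0) -> bool:
    """Reject tampered payloads (constant-time compare) and expired tasks."""
    expected = sign_task(task, key)
    fresh = (time.time() - task.get("issued_at", 0)) < max_age
    return hmac.compare_digest(expected, signature) and fresh
```

`hmac.compare_digest` avoids timing side channels during comparison; the freshness check bounds the replay window for intercepted tasks.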
🛠️ Execution Plan
We will execute this transformation in six phased milestones.
Phase 1: Protocol & Tunnel Proof of Concept (POC) - ✅ COMPLETE
- Status: Verified in `/app/poc-grpc-agent/`.
- Achievements:
- Defined `agent.proto` with bidirectional streaming.
- Implemented Python `server.py` and `client.py`.
- Successfully demonstrated registration, heartbeat pattern, and task-dispatch with remote shell execution.
- Validated multiplexing and backpressure via gRPC.
- Outcome: Server can dispatch an idempotent "Echo" task down the gRPC stream.
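Based on the achievements above, the Phase 1 contract might look roughly like this. Message and field names here are illustrative; the actual `agent.proto` under `/app/poc-grpc-agent/` is authoritative:

```proto
syntax = "proto3";

service AgentService {
  // One long-lived bidirectional stream per node: the node sends
  // registration, heartbeats, and task results; the server pushes tasks.
  rpc Connect (stream NodeMessage) returns (stream ServerMessage);
}

message NodeMessage {
  oneof payload {
    RegistrationRequest registration = 1;
    Heartbeat heartbeat = 2;
    TaskResult result = 3;
  }
}

message ServerMessage {
  oneof payload {
    TaskRequest task = 1;
  }
}

message RegistrationRequest {
  string node_id = 1;
  string jwt = 2;  // short-lived session token (Phase 2)
}

message Heartbeat {
  string node_id = 1;
}

message TaskRequest {
  string task_id = 1;
  string command = 2;
  string signature = 3;  // HMAC over the payload (Phase 2)
  string trace_id = 4;   // OpenTelemetry correlation (Phase 2)
}

message TaskResult {
  string task_id = 1;
  int32 exit_code = 2;
  string output = 3;
}
```

The `oneof` envelopes keep the stream to a single message type in each direction while still multiplexing registration, heartbeats, tasks, and results.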
Phase 2: Security, Identity & Observability - ✅ COMPLETE
- Status: Verified in `/app/poc-grpc-agent/`.
- Achievements:
- mTLS Implementation: Root CA, Server (localhost), and Node (agent-node-007) certificate management scripts.
- JWT Handshake: Implemented short-lived JWT token verification during `RegistrationRequest`.
- Task Signing: HMAC-SHA256 signature verification for every `TaskRequest` payload.
- Observability: Introduced `trace_id` for OpenTelemetry support in all messages, including node crash reports and execution timing.
- Outcome: Only authenticated, signed tasks run, with full tracing across the distributed system.
Phase 3: Core Capabilities & Secure Engine - ✅ COMPLETE
- Status: Verified in `/app/poc-grpc-agent/`.
- Achievements:
- Dual-Mode Sandbox Policy: Supports STRICT (allowlist-only) for hardened nodes and PERMISSIVE (denylist-based) for local power users.
- Path Guarding: Proactive blocking of path traversal attacks using `..` normalization (always enforced).
- Consent Mechanism: Integrated logic to flag commands requiring user-terminal approval.
- Strict Deny-List: Automated rejection of privileged commands (`sudo`, `mkfs`, etc.) in all modes.
- Capability Manifest: The handshake now includes a JSON capability report covering node version and platform.
- Outcome: Secure, auditable, and consensual execution of system queries.
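The path-guarding and deny-list rules above can be sketched in a few lines of Python; the sandbox root and deny-list contents are illustrative, not the actual policy shipped in the POC:

```python
import os

SANDBOX_ROOT = "/app/workspace"  # illustrative sandbox root
DENY_LIST = {"sudo", "mkfs", "shutdown"}  # always rejected, per the plan

def resolve_in_sandbox(requested_path: str, root: str = SANDBOX_ROOT) -> str:
    """Normalize the path and refuse anything that escapes the sandbox root."""
    candidate = os.path.normpath(os.path.join(root, requested_path))
    # normpath collapses "..", so an escape shows up as a path outside root.
    if candidate != root and not candidate.startswith(root + os.sep):
        raise PermissionError(f"path escapes sandbox: {requested_path}")
    return candidate

def command_allowed(command: str) -> bool:
    """Deny-list check on the command's executable name (all modes)."""
    parts = command.split()
    return bool(parts) and parts[0] not in DENY_LIST
```

Because the check runs after normalization, encodings like `a/../../etc/passwd` resolve before the prefix comparison, which is why the plan calls the guard "always enforced" regardless of sandbox mode.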
Phase 4: Browser Automation (The "Antigravity" Feature)
- Goal: Allow the Agent Node to interact with local web apps.
- Tasks:
- Implement CDP (Chrome DevTools Protocol) to attach to existing browsers.
- Stream screenshots efficiently using chunked binary frames over gRPC, optionally with compression or delta snapshots.
- Include headless/visible toggles, timeboxed navigation, and DOM snapshot streaming.
- Outcome: High-performance, low-latency visual interaction with local web pages.
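Chunked binary framing for screenshot streaming could look like the following sketch, where each chunk carries ordering metadata so the server can reassemble frames; the 64 KiB chunk size and dict-based chunk shape are assumptions (in practice these would be Protobuf messages):

```python
CHUNK_SIZE = 64 * 1024  # 64 KiB per gRPC frame (illustrative)

def chunk_frame(frame: bytes, frame_id: int, chunk_size: int = CHUNK_SIZE):
    """Split one screenshot into ordered chunks suitable for streaming."""
    total = (len(frame) + chunk_size - 1) // chunk_size
    for seq in range(total):
        yield {
            "frame_id": frame_id,          # which screenshot this belongs to
            "seq": seq,                    # position within the frame
            "total": total,                # lets the receiver preallocate
            "data": frame[seq * chunk_size:(seq + 1) * chunk_size],
            "last": seq == total - 1,      # flush signal for reassembly
        }
```

Keeping chunks well under gRPC's default 4 MB message limit lets the stream interleave screenshot data with heartbeats and task results without starving them.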
Phase 5: Concurrency & Task Isolation
- Goal: Handle simultaneous requests without state corruption.
- Tasks:
- Task Isolation Model: Ensure each task runs in an isolated context. Browser actions get isolated contexts; file writes use advisory locks.
- Introduce Resource Quotas (cap per-agent CPU and memory usage).
- Outcome: Multiple tasks execute safely and concurrently without race conditions.
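The isolation model above could be sketched with an asyncio semaphore for concurrency quotas plus advisory per-path locks for file writes; names like `TaskRunner` and the `max_concurrent` default are hypothetical:

```python
import asyncio

class TaskRunner:
    """Caps concurrent tasks and serializes writes per path (sketch)."""

    def __init__(self, max_concurrent: int = 4):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._file_locks: dict[str, asyncio.Lock] = {}

    def _lock_for(self, path: str) -> asyncio.Lock:
        # Advisory per-path lock so concurrent tasks never interleave writes.
        return self._file_locks.setdefault(path, asyncio.Lock())

    async def run(self, task_id: str, path: str, work) -> object:
        async with self._sem:              # global concurrency quota
            async with self._lock_for(path):  # per-resource isolation
                return await work(task_id)
```

CPU and memory caps would sit below this layer (e.g. cgroups or `resource.setrlimit` in the spawned worker); the semaphore only bounds in-flight task count.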
Phase 6: Scaling & Frontend UI Integration
- Goal: Support multiple nodes and surface insights in the UI.
- Tasks:
- Scaling: Prepare for multi-node orchestration (e.g., node pools, load-aware dispatch, Redis/NATS as control plane).
- Update `CodingAssistantPage` to recognize nodes via the Node Registry.
- Provide users a UI dashboard for remote audit logs and tracing.
- Outcome: A seamless user experience managing distributed execution.
🔬 Recommended Next Steps
Before mapping this into JIRA/GitBucket issues, we should build the gRPC Protobuf Schema (agent.proto) and establish the Phase 1 Dummy Python Server/Client.
Shall I proceed with writing the initial Protobuf definition to solidify the API contract?