cortex-hub / docs / architecture / cortex_agent_node_plan.md

Cortex Agent Node: Architecture & Implementation Plan

This document outlines the transition from the current WebSocket (wss) code-syncing approach to a fully distributed, secure, multi-agent architecture in which the Cortex Server orchestrates powerful local "Agent Nodes."

🏗️ High-Level Architecture

1. The Cortex Server (Orchestrator)

  • Role: The central brain. Handles AI inference, task planning, and user interface.
  • Communication Hub: Exposes a bidirectional streaming endpoint via gRPC over HTTP/2 to securely manage connections. gRPC provides native bidirectional streaming, built-in schema enforcement via Protobuf, first-class retry semantics, and stronger backpressure handling compared to WebSockets.
  • Node Registry: Keeps track of connected nodes: their identities, health status, and discovered capabilities. The server treats Agent Nodes as heterogeneous and relies on the capability manifest each node sends on connection.
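The bidirectional stream, registration manifest, and task dispatch described above could be captured in a Protobuf contract along these lines. This is an illustrative sketch, not the final agent.proto; all message and field names are assumptions:

```protobuf
syntax = "proto3";

package cortex.agent.v1;

// Single bidirectional stream: the node sends envelopes up,
// the server pushes envelopes down the same connection.
service AgentService {
  rpc Connect(stream NodeMessage) returns (stream ServerMessage);
}

message NodeMessage {
  oneof payload {
    Registration registration = 1;  // capability manifest on connect
    Heartbeat heartbeat = 2;
    TaskResult task_result = 3;
  }
}

message ServerMessage {
  oneof payload {
    TaskDispatch task = 1;
  }
}

message Registration {
  string node_id = 1;
  string platform = 2;
  repeated string capabilities = 3;  // e.g. "shell", "browser"
}

message Heartbeat { int64 unix_ms = 1; }

message TaskDispatch {
  string task_id = 1;
  string kind = 2;      // "shell", "browser", ...
  bytes payload = 3;
  bytes signature = 4;  // verified by the node (see Security Stack)
}

message TaskResult {
  string task_id = 1;
  int32 exit_code = 2;
  bytes output = 3;
}
```

The single `Connect` stream is what makes the "phone home" pattern work: the node dials out once, and every later exchange rides on that one connection.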

2. The Agent Node (Client Software)

  • Role: A lightweight, standalone daemon running on the user's local machine or CI runners.
  • Execution Engine: Receives tasks from the server, executes them locally via an isolated execution context, and streams results back.
  • Capabilities:
    • System Ops: Run bash commands, edit files, list directories within a strict sandbox.
    • Browser Automation: Control local browsers via CDP (Chrome DevTools Protocol) for UI testing, allowing chunked binary frames, headless/visible toggling, and DOM snapshot streaming.
    • Auditing & Observability: Maintains a strict, immutable local log of every command. Emits task execution timing, failure counters, and crash telemetry.
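As one concrete shape for the capability manifest the node sends on connection, the daemon could assemble it as below. The field names are assumptions for illustration; the real schema would live in agent.proto:

```python
import json
import platform


def build_manifest(node_id: str, capabilities: list[str]) -> str:
    """Assemble the JSON capability manifest sent on connection.

    Field names are illustrative, not a fixed schema.
    """
    manifest = {
        "node_id": node_id,
        "version": "0.1.0",
        "platform": platform.system().lower(),  # e.g. "linux", "darwin"
        "capabilities": sorted(capabilities),   # stable ordering for diffing
    }
    return json.dumps(manifest, sort_keys=True)


manifest = build_manifest("node-a1", ["shell", "browser", "fs"])
```

Sorting keys and capabilities keeps the manifest deterministic, which makes it easy for the registry to detect capability changes across reconnects.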

3. Tunneling & Security

  • The "Phone Home" Pattern: To bypass NAT and firewalls, the Agent Node initiates an outbound HTTP/2-over-TLS connection (outbound port 443) to the server. The server then pushes tasks down this persistent bidirectional stream.
  • Security Stack:
    • mTLS: Validates node identity and encrypts the channel in both directions.
    • Short-lived JWT: Provides session authentication and fine-grained authorization (Capability claims).
    • Task Signatures: Each task from the server is signed, so the Node can verify the signature, issuer, expiry, and user binding before executing anything, preventing task injection.
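The sign-then-verify flow can be sketched with a symmetric HMAC for brevity; a production deployment would more likely use asymmetric signatures (e.g. Ed25519) so nodes hold only a public key. All names, claims, and the shared key here are illustrative:

```python
import hashlib
import hmac
import json
import time

SHARED_KEY = b"demo-key-not-for-production"  # stand-in for a real key pair


def sign_task(task: dict, key: bytes = SHARED_KEY) -> dict:
    """Attach issuer/expiry/user claims and an HMAC-SHA256 signature."""
    envelope = {
        "task": task,
        "iss": "cortex-server",           # issuer the node expects
        "exp": int(time.time()) + 60,     # short-lived, like the JWTs
        "sub": "user-123",                # user binding (illustrative)
    }
    body = json.dumps(envelope, sort_keys=True)
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_task(signed: dict, key: bytes = SHARED_KEY) -> dict:
    """Node-side check: reject tampered, expired, or wrongly-issued tasks."""
    expected = hmac.new(key, signed["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["sig"]):
        raise ValueError("bad signature")
    envelope = json.loads(signed["body"])
    if envelope["exp"] < time.time():
        raise ValueError("task expired")
    if envelope["iss"] != "cortex-server":
        raise ValueError("unknown issuer")
    return envelope["task"]
```

Because the signature covers the whole envelope (claims included), an attacker on the path cannot swap the command, extend the expiry, or rebind the task to another user without invalidating it.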

🛠️ Execution Plan

We will execute this transformation in six phased milestones.

Phase 1: Protocol & Tunnel Proof of Concept (POC) - ✅ COMPLETE

  • Status: Verified in /app/poc-grpc-agent/.
  • Achievements:
    • Defined agent.proto with bidirectional streaming.
    • Implemented Python server.py and client.py.
    • Successfully demonstrated registration, heartbeat pattern, and task-dispatch with remote shell execution.
    • Validated multiplexing and backpressure via gRPC.
  • Outcome: Server can dispatch an idempotent "Echo" task down the gRPC stream.

Phase 2: Security, Identity & Observability

  • Goal: Lock down the tunnel and introduce tracing.
  • Tasks:
    • Implement the Security Stack (mTLS, JWT, Task Signatures).
    • Require the Agent Client to authenticate and map connections to a User/Workspace.
    • Observability: Add per-task tracing IDs, structured logs on the server side, and OpenTelemetry for node crash reports and execution timing.
  • Outcome: Only authenticated, signed tasks run, with full tracing across the distributed system.

Phase 3: Core Capabilities & Secure Engine (The Local Sandbox)

  • Goal: Safely execute host commands and establish audit logs.
  • Tasks:
    • Capability Negotiation: Agent sends a JSON manifest (version, platform, capabilities) on connection.
    • Command Sandbox Policy: Disallow network access by default, run under non-privileged user, and strictly whitelist allowed commands.
    • Consent-based Execution: Add a "Strict Mode" (manual Y/N prompt for every command) and "Auto-Approve" for non-destructive actions.
    • Advanced Auditing: Implement append-only local logs with periodic hash chaining and optional tamper detection (hash tree).
  • Outcome: Secure, auditable, and consensual execution of system queries.
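The append-only log with hash chaining can be sketched as follows: each entry commits to the hash of its predecessor, so editing or deleting any earlier entry breaks verification from that point on. This is a minimal in-memory model of the idea, not the on-disk implementation:

```python
import hashlib
import json


class AuditLog:
    """Append-only command log with a SHA-256 hash chain for tamper detection."""

    GENESIS = "0" * 64  # sentinel "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, command: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = json.dumps({"command": command, "prev": prev}, sort_keys=True)
        entry = {
            "command": command,
            "prev": prev,
            "hash": hashlib.sha256(body.encode()).hexdigest(),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any in-place edit breaks it."""
        prev = self.GENESIS
        for e in self.entries:
            body = json.dumps({"command": e["command"], "prev": prev},
                              sort_keys=True)
            if e["prev"] != prev or e["hash"] != hashlib.sha256(body.encode()).hexdigest():
                return False
            prev = e["hash"]
        return True
```

Periodically publishing the latest chain hash to the server (or a hash tree over it, as the phase suggests as an option) lets tampering be detected even if an attacker rewrites the entire local chain.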

Phase 4: Browser Automation (The "Antigravity" Feature)

  • Goal: Allow Agent to interact with local web apps.
  • Tasks:
    • Implement CDP (Chrome DevTools Protocol) to attach to existing browsers.
    • Stream screenshots efficiently using chunked binary frames over gRPC, possibly with compression or delta snaps.
    • Include headless/visible toggles, timeboxed navigation, and DOM snapshot streaming.
  • Outcome: High-performance, low-latency visual interaction with local web pages.
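The chunked binary frames mentioned above amount to splitting each screenshot into bounded pieces that a gRPC stream can yield one by one. A sketch of the chunking side (the 64 KiB size and field names are assumptions, not a measured optimum):

```python
from typing import Iterator

CHUNK_SIZE = 64 * 1024  # 64 KiB per frame keeps individual gRPC messages small


def chunk_screenshot(frame: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[dict]:
    """Split one screenshot into ordered chunks with a final-chunk marker,
    as they would be yielded into a gRPC streaming response."""
    total = len(frame)
    for seq, offset in enumerate(range(0, total, chunk_size)):
        yield {
            "seq": seq,                               # reassembly order
            "data": frame[offset:offset + chunk_size],
            "last": offset + chunk_size >= total,     # lets the receiver flush
        }
```

Compression or delta snapshots, as the phase suggests, would slot in before chunking: compress (or diff against the previous frame) first, then split the resulting bytes.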

Phase 5: Concurrency & Task Isolation

  • Goal: Handle simultaneous requests without state corruption.
  • Tasks:
    • Task Isolation Model: Ensure each task runs in an isolated context. Browser actions get isolated contexts; file writes use advisory locks.
    • Introduce Resource Quotas (limit % CPU/Memory per agent).
  • Outcome: Multiple tasks execute safely and concurrently without race conditions.
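The advisory-lock model for file writes can be sketched with one asyncio.Lock per path: tasks touching different files run in parallel, while writes to the same file serialize. The simulated write below is a stand-in for real disk I/O:

```python
import asyncio
from collections import defaultdict

# One lock per file path: acquired around every write to that path.
_file_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def write_file(path: str, text: str, log: list[str]) -> None:
    async with _file_locks[path]:
        log.append(f"start {path}")
        await asyncio.sleep(0)  # simulated I/O; real code would write to disk
        log.append(f"end {path}")


async def main() -> list[str]:
    log: list[str] = []
    # Two tasks contend for a.txt; a third writes b.txt independently.
    await asyncio.gather(
        write_file("a.txt", "one", log),
        write_file("a.txt", "two", log),
        write_file("b.txt", "three", log),
    )
    return log


events = asyncio.run(main())
```

The same pattern generalizes to the browser side: an isolated context per task plays the role the per-path lock plays for files.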

Phase 6: Scaling & Frontend UI Integration

  • Goal: Support multiple nodes and surface insights in the UI.
  • Tasks:
    • Scaling: Prepare for multi-node orchestration (e.g., node pools, load-aware dispatch, Redis/NATS as control plane).
    • Update CodingAssistantPage to recognize nodes via the Node Registry.
    • Provide users a UI dashboard for remote audit logs and tracing.
  • Outcome: A seamless user experience managing distributed execution.

Before mapping this into JIRA/GitBucket issues, we should build the gRPC Protobuf Schema (agent.proto) and establish the Phase 1 Dummy Python Server/Client.

Shall I proceed with writing the initial Protobuf definition to solidify the API contract?