Cortex Agent Node: Architecture & Implementation Plan
This document outlines the transition from the current WebSockets (wss) code syncing approach to a fully distributed, secure, multi-agent architecture where the Cortex Server orchestrates powerful local "Agent Nodes."
🏗️ High-Level Architecture
1. The Cortex Server (Orchestrator)
- Role: The central brain. Handles AI inference, task planning, and user interface.
- Communication Hub: Exposes a bidirectional streaming endpoint via gRPC over HTTP/2 to securely manage connections. Compared to WebSockets, gRPC provides native bidirectional streaming, schema enforcement via Protobuf, configurable retry policies, and backpressure through HTTP/2 flow control.
- Node Registry: Keeps track of connected nodes, their identities, health status, and capabilities (Capability Discovery). The server treats Agent Nodes as heterogeneous and relies on a capability manifest sent by each node.
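To make the registry idea concrete, here is a minimal in-memory sketch. It assumes the manifest is a dictionary with a "capabilities" list and that health is derived from the time since the last heartbeat; the class names, field names, and the 30-second timeout are illustrative and do not come from the POC code.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeRecord:
    node_id: str
    capabilities: set                      # taken from the node's capability manifest
    last_heartbeat: float = field(default_factory=time.monotonic)


class NodeRegistry:
    """Hypothetical in-memory registry keyed by node identity."""

    def __init__(self, heartbeat_timeout: float = 30.0):
        self._nodes = {}
        self._timeout = heartbeat_timeout  # assumed value, tune per deployment

    def register(self, node_id: str, manifest: dict) -> None:
        # The manifest layout ({"capabilities": [...]}) is an assumption.
        caps = set(manifest.get("capabilities", []))
        self._nodes[node_id] = NodeRecord(node_id, caps)

    def heartbeat(self, node_id: str) -> None:
        self._nodes[node_id].last_heartbeat = time.monotonic()

    def is_healthy(self, node_id: str, now: Optional[float] = None) -> bool:
        record = self._nodes.get(node_id)
        if record is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - record.last_heartbeat) <= self._timeout

    def find(self, capability: str) -> list:
        # Healthy nodes that advertise the requested capability, in stable order.
        return sorted(
            n.node_id
            for n in self._nodes.values()
            if capability in n.capabilities and self.is_healthy(n.node_id)
        )
```

With a structure like this, the server can answer "which healthy nodes can run a browser task?" without assuming that all nodes are alike.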
2. The Agent Node (Client Software)
- Role: A lightweight, standalone daemon running on the user's local machine or CI runners.
- Execution Engine: Receives tasks from the server, executes them locally via an isolated execution context, and streams results back.
- Capabilities:
- System Ops: Run bash commands, edit files, list directories within a strict sandbox.
- Browser Automation: Control local browsers via CDP (Chrome DevTools Protocol) for UI testing, with support for chunked binary frames, headless/visible toggling, and DOM snapshot streaming.
- Auditing & Observability: Maintains a strict, immutable local log of every command. Emits task execution timing, failure counters, and crash telemetry.
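The plan does not say how the local log is made immutable. One common way to make an append-only log tamper-evident is hash chaining, where each entry stores the SHA-256 digest of the previous one. The sketch below uses that technique as an assumption; the AuditLog class and its entry fields are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only command log; each entry is chained to the previous hash."""

    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash: str, command: str, exit_code: int) -> str:
        payload = json.dumps(
            {"prev": prev_hash, "command": command, "exit_code": exit_code},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, command: str, exit_code: int) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "prev": prev_hash,
            "command": command,
            "exit_code": exit_code,
            "hash": self._digest(prev_hash, command, exit_code),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; any edited or reordered entry breaks the chain.
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            expected = self._digest(entry["prev"], entry["command"], entry["exit_code"])
            if entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Hash chaining only detects tampering; it does not prevent someone with local access from rewriting the whole log, so a real node would likely also ship the latest hash to the server.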
3. Tunneling & Security
- The "Phone Home" Pattern: To bypass NAT and firewalls, the Agent Node initiates an outbound HTTP/2 connection over TLS (outbound port 443) to the server. The server then pushes tasks down this persistent bidirectional stream.
- Security Stack:
- mTLS: Authenticates both the node and the server with certificates and encrypts the channel.
- Short-lived JWT: Provides session authentication and fine-grained authorization (Capability claims).
- Task Signatures: The server signs each task to prevent injection. Before executing, the Node validates the task's signature, issuer, expiry, and user binding.
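As a rough illustration of short-lived tokens with capability claims, the sketch below issues and verifies a compact JWT using only the standard library. It assumes HS256 with a shared secret and a custom "caps" claim; the plan does not specify the algorithm or claim names, and a production system would more likely use a maintained JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the compact JWT form strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, node_id: str, capabilities: list,
                ttl_s: int = 300, now=None) -> str:
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    # "caps" is a hypothetical claim name for the granted capabilities.
    claims = {"sub": node_id, "iat": now, "exp": now + ttl_s, "caps": capabilities}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(secret: bytes, token: str, required_capability: str, now=None) -> dict:
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Check the signature before trusting anything inside the token.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= now:
        raise PermissionError("token expired")
    if required_capability not in claims.get("caps", []):
        raise PermissionError("capability not granted")
    return claims
```

A short ttl_s limits how long a leaked token is useful, and the capability check keeps a node that was only granted "shell" from being handed browser tasks.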
🛠️ Execution Plan
We will execute this transformation in 6 phased milestones.
Phase 1: Protocol & Tunnel Proof of Concept (POC) - ✅ COMPLETE
- Status: Verified in /app/poc-grpc-agent/.
- Achievements:
- Defined agent.proto with bidirectional streaming.
- Implemented Python server.py and client.py.
- Successfully demonstrated registration, heartbeat pattern, and task-dispatch with remote shell execution.
- Validated multiplexing and backpressure via gRPC.
- Outcome: Server can dispatch an idempotent "Echo" task down the gRPC stream.
Phase 2: Security, Identity & Observability - ✅ COMPLETE
- Status: Verified in /app/poc-grpc-agent/.
- Achievements:
- mTLS Implementation: Root CA, Server (localhost), and Node (agent-node-007) certificate management scripts.
- JWT Handshake: Implemented short-lived JWT token verification during RegistrationRequest.
- Task Signing: HMAC-SHA256 signature verification for every single TaskRequest payload.
- Observability: Introduced trace_id for OpenTelemetry support in all messages, including node crash reports and execution timing.
- Outcome: Only authenticated, signed tasks run, with full tracing across the distributed system.
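The task-signing step can be sketched with the standard library's hmac module. The example assumes the payload is serialized as canonical JSON before signing and that the expiry lives in an "expires_at" field; the actual TaskRequest is a Protobuf message, so the field names and serialization here are illustrative only.

```python
import hashlib
import hmac
import json
import time


def canonical_bytes(task: dict) -> bytes:
    # Sorted keys and fixed separators so both sides sign identical bytes.
    return json.dumps(task, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_task(secret: bytes, task: dict) -> str:
    return hmac.new(secret, canonical_bytes(task), hashlib.sha256).hexdigest()


def verify_task(secret: bytes, task: dict, signature: str, now=None) -> bool:
    expected = sign_task(secret, task)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, signature):
        return False
    now = time.time() if now is None else now
    # "expires_at" is a hypothetical field; a task without it is treated as expired.
    return task.get("expires_at", 0) > now
```

Because the expiry and user identifier sit inside the signed payload, changing either one after signing invalidates the signature.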
Phase 3: Core Capabilities & Secure Engine - ✅ COMPLETE
- Status: Verified in /app/poc-grpc-agent/.
- Achievements:
- Dual-Mode Sandbox Policy: Supports STRICT (Whitelist-only) for hardened nodes and PERMISSIVE (Blacklist-based) for local power users.
- Path Guarding: Proactive blocking of path traversal attacks using .. normalization (Always Enforced).
- Consent Mechanism: Integrated logic to flag commands requiring user-terminal approval.
- Strict Deny-List: Automated rejection of privileged commands (sudo, mkfs, etc.) in all modes.
- Capability Manifest: Handshake now includes a JSON-based report for version and platforms.
- Outcome: Secure, auditable, and consensual execution of system queries.
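The policy checks described in this phase can be approximated as below. Only sudo and mkfs come from the plan; the STRICT allow-list and PERMISSIVE block-list contents are made-up examples, and the function names are hypothetical.

```python
import os
import shlex
from enum import Enum


class SandboxMode(Enum):
    STRICT = "strict"          # only allow-listed programs may run
    PERMISSIVE = "permissive"  # everything runs except block-listed programs


ALWAYS_DENIED = {"sudo", "mkfs"}        # from the plan: rejected in all modes
STRICT_ALLOWED = {"ls", "cat", "echo"}  # hypothetical allow-list
PERMISSIVE_BLOCKED = {"rm", "dd"}       # hypothetical block-list


def is_command_allowed(command: str, mode: SandboxMode) -> bool:
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar parse errors
    if not argv:
        return False
    program = os.path.basename(argv[0])  # so /usr/bin/sudo is still caught
    if program in ALWAYS_DENIED:
        return False
    if mode is SandboxMode.STRICT:
        return program in STRICT_ALLOWED
    return program not in PERMISSIVE_BLOCKED


def is_path_inside(root: str, candidate: str) -> bool:
    # abspath collapses ".." segments lexically; it does not resolve symlinks.
    root_abs = os.path.abspath(root)
    target = os.path.abspath(os.path.join(root_abs, candidate))
    return os.path.commonpath([root_abs, target]) == root_abs
```

This sketch inspects only the first program in the command line. Pipelines, `;` chaining, and subshells would need a real shell parser or execution without a shell, and symlink escapes would need os.path.realpath in addition to normalization.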
Phase 4: Browser Automation (The "Antigravity" Feature) - ✅ COMPLETE
- Status: Verified in /app/poc-grpc-agent/.
- Achievements:
- Browser Actor Threading: Solved Playwright threading issues with a dedicated Actor model.
- Real-time Event Tunneling: console.log and network fetch events are streamed instantly to the server.
- Advanced Perception: Implemented A11y tree extraction and JS evaluation for deep page understanding.
- Multi-Session Support: Capability to handle multiple isolated browser contexts simultaneously.
- Outcome: High-performance, low-latency visual and semantic interaction with local web pages.
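The Actor approach to the threading problem can be illustrated without Playwright itself. In the sketch below a single dedicated thread owns all work submitted to it, and callers on any thread get results back through futures. The Actor class is a generic stand-in, not the POC's browser actor; in a real node the submitted callables would wrap browser operations.

```python
import queue
import threading
from concurrent.futures import Future


class Actor:
    """Runs every submitted callable on one dedicated thread."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:  # sentinel posted by stop()
                break
            fn, args, future = item
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                # Deliver the failure to the caller instead of killing the actor.
                future.set_exception(exc)

    def submit(self, fn, *args) -> Future:
        future = Future()
        self._inbox.put((fn, args, future))
        return future

    def stop(self):
        self._inbox.put(None)
        self._thread.join()
```

Since all calls are serialized through one inbox, objects that must stay on the thread that created them are never touched from the gRPC handler threads.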
Phase 5: Modular 12-Factor Refactor & Mesh Foundations - ✅ COMPLETE
- Status: Verified in /app/poc-grpc-agent/.
- Achievements:
- Modular Architecture: Split the monolith into orchestrator/ and agent_node/ packages.
- 12-Factor Compliance: Configuration is now fully externalized via Environment Variables.
- Skill-Based Extensibility: Unified BaseSkill interface for Shell, Browser, and future capabilities.
- Graceful Shutdown: Implemented SIGTERM/SIGINT handling with clean browser-actor cleanup.
- Global Work Pool: Shared task discovery and Task Claiming to prevent rework across nodes.
- Hanging Task Recovery: Remote cancellation and automatic retries in the TaskAssistant.
- Outcome: A professional, scalable, and extensible distributed agent mesh.
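Task Claiming comes down to an atomic "claim if unclaimed" operation. The sketch below shows the idea for a single process, using a lock; the WorkPool class and its methods are hypothetical, and a multi-node deployment would need the same check-and-set to happen in a shared store.

```python
import threading


class WorkPool:
    """Shared pool where each published task can be claimed by one node only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}  # task_id -> node_id, or None while unclaimed

    def publish(self, task_id: str) -> None:
        with self._lock:
            self._owner.setdefault(task_id, None)

    def claim(self, task_id: str, node_id: str) -> bool:
        # Check and assignment happen under one lock, so two nodes cannot both win.
        with self._lock:
            if task_id in self._owner and self._owner[task_id] is None:
                self._owner[task_id] = node_id
                return True
            return False

    def release(self, task_id: str, node_id: str) -> None:
        # Lets a retry or cancellation path return a task to the pool.
        with self._lock:
            if self._owner.get(task_id) == node_id:
                self._owner[task_id] = None
```

Only the current owner can release a task, which keeps a slow node from un-claiming work that has already been reassigned.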
Phase 6: Scaling & Frontend UI Integration
- Goal: Support multiple nodes and surface insights in the UI.
- Tasks:
- Scaling: Prepare for multi-node orchestration (e.g., node pools, load-aware dispatch, Redis/NATS as control plane).
- Update CodingAssistantPage to recognize nodes via the Node Registry.
- Provide users a UI dashboard for remote audit logs and tracing.
- Outcome: A seamless user experience managing distributed execution.
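Load-aware dispatch could start as simply as "pick the healthy, capable node with the fewest active tasks". The sketch below assumes the orchestrator already tracks an active-task count per node; NodeStatus and pick_node are illustrative names, and the tie-break on node_id is just one way to make the choice deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    node_id: str
    capabilities: frozenset
    active_tasks: int
    healthy: bool = True


def pick_node(nodes, required_capability: str):
    """Return the id of the least-loaded eligible node, or None if none qualify."""
    candidates = [
        n for n in nodes
        if n.healthy and required_capability in n.capabilities
    ]
    if not candidates:
        return None
    # Fewest active tasks wins; node_id breaks ties deterministically.
    return min(candidates, key=lambda n: (n.active_tasks, n.node_id)).node_id
```

A richer scheduler might weigh CPU, memory, or queue latency, but a single load number is often enough to validate the dispatch path end to end.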
🔬 Recommended Next Steps
Before mapping this into JIRA/GitBucket issues, we should build the gRPC Protobuf Schema (agent.proto) and establish the Phase 1 Dummy Python Server/Client.