📝 Cortex Distributed Agent: Master Project Roadmap

This document consolidates all prioritized tasks, technical debt, swarm optimizations, and future strategic implementations for the Cortex project.

✅ Completed Milestones (Core & Swarm Optimization)

Fast-Path Pattern Heuristics: Regex detection for instant completions.
Adaptive AI-Driven Polling: AI specifies next_check_seconds instead of static backoff.
Edge Intelligence (Shift Left): The Agent Node's PTY reader scans output and fires prompt-detection events to wake sleeping SubAgents.
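
As a rough illustration of the regex fast path and edge-side prompt detection listed above, the sketch below checks whether the last line of PTY output looks like a prompt waiting for input. The pattern list and the `detect_prompt` name are hypothetical. The roadmap does not specify which patterns Cortex actually uses.

```python
import re

# Hypothetical prompt patterns; the real Cortex pattern set is not documented here.
PROMPT_PATTERNS = [
    re.compile(r"[$#>] ?$"),                          # shell prompt at end of line
    re.compile(r"\[y/n\]\s*:?\s*$", re.IGNORECASE),   # yes/no confirmation
    re.compile(r"password\s*:\s*$", re.IGNORECASE),   # password request
]


def detect_prompt(output: str) -> bool:
    """Return True if the last line of PTY output looks like a prompt."""
    # Only the final line matters: a prompt is whatever the terminal is waiting on.
    last_line = output.rstrip("\n").rsplit("\n", 1)[-1]
    return any(pattern.search(last_line) for pattern in PROMPT_PATTERNS)
```

A reader loop could call `detect_prompt` on each new chunk and emit a wake-up event on a match, falling back to the AI-driven polling interval otherwise.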

Branching Agency (EXECUTE Action): SubAgent natively handles cross-node orchestration, driven dynamically by terminal state.

TaskJournal Head+Tail Bounds: The memory footprint of AI context is capped at a 10 KB head plus a 30 KB tail (40 KB total).
Persistence Offloading: Nodes stream gigabyte-scale outputs to temporary disk files instead of growing the application heap.
Client-Side Truncation: A 15 KB/sec rate limit on PTY output protects the Orchestrator's gRPC stream from being flooded.
Graceful Shutdown: SIGTERM/SIGINT grace periods for local node cleanup.
Ghost Mirror: Workspace bidirectional synchronization fully operational.
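
The head+tail bound can be pictured with a small buffer that freezes the first bytes of a stream and keeps a sliding window over the last bytes. This is a minimal sketch under assumptions: the `HeadTailBuffer` class, its method names, and the truncation marker are hypothetical. Only the 10 KB and 30 KB limits come from the roadmap.

```python
class HeadTailBuffer:
    """Keep the first `head_limit` bytes and the last `tail_limit` bytes of a stream."""

    def __init__(self, head_limit: int = 10 * 1024, tail_limit: int = 30 * 1024):
        self.head_limit = head_limit
        self.tail_limit = tail_limit
        self.head = bytearray()
        self.tail = bytearray()
        self.dropped = 0  # bytes discarded from the middle of the stream

    def append(self, chunk: bytes) -> None:
        # Fill the head first; once full it never changes.
        room = self.head_limit - len(self.head)
        if room > 0:
            self.head += chunk[:room]
            chunk = chunk[room:]
        if not chunk:
            return
        # Everything else goes to the tail, trimmed from the front when it overflows.
        self.tail += chunk
        overflow = len(self.tail) - self.tail_limit
        if overflow > 0:
            del self.tail[:overflow]
            self.dropped += overflow

    def render(self) -> bytes:
        """Return head + tail, with a marker where the middle was cut (hypothetical format)."""
        if self.dropped:
            marker = f"\n[... {self.dropped} bytes truncated ...]\n".encode()
            return bytes(self.head) + marker + bytes(self.tail)
        return bytes(self.head) + bytes(self.tail)
```

Memory stays bounded by `head_limit + tail_limit` no matter how much output a task produces, which is the property the milestone describes.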

Service Decoupling: Browser automation moved to a standalone high-performance service.
Sidecar RAM Handoff: /dev/shm-based zero-copy transfer for DOM and screenshots.
Full Perception: Integrated A11y tree, JS evaluation, and persistent session management.

Goal: Scale horizontally to handle 100+ simultaneous connected nodes.

[ ] Multiprocessing for Serialization: Refactor dispatch_swarm to use ProcessPoolExecutor to bypass the GIL for heavy Protobuf signing/serialization.
[ ] Async gRPC Internalization: Migrate Orchestrator fully to grpc.aio to handle concurrent streaming and queue management without thread locks.
[ ] Sharded Registry: Distribute connections across multiple Hub instances using Redis as a shared state/journal layer.
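
One possible shape for the multiprocessing item above is sketched below. It assumes the CPU-bound work can be expressed as a module-level function, which process pools need for pickling. The names `sign_and_serialize` and `serialize_batch`, the JSON-plus-HMAC stand-in for Protobuf signing, and the demo key are all hypothetical and not taken from the Cortex code base.

```python
import hashlib
import hmac
import json
from concurrent.futures import Executor

# Hypothetical shared key; real key management is out of scope for this sketch.
SIGNING_KEY = b"demo-key"


def sign_and_serialize(task: dict) -> bytes:
    """CPU-bound stand-in for Protobuf serialization plus signing."""
    body = json.dumps(task, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest().encode()
    return signature + b"." + body


def serialize_batch(tasks: list[dict], executor: Executor) -> list[bytes]:
    """Serialize a batch on the given executor; results keep the input order."""
    return list(executor.map(sign_and_serialize, tasks))
```

Passing a `concurrent.futures.ProcessPoolExecutor` as `executor` runs each call in a separate worker process, so the work is no longer serialized by the GIL. The pool should be created once at startup, under an `if __name__ == "__main__":` guard, and reused across dispatches, since spawning processes per batch would likely cost more than it saves.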

Goal: Transmit massive payloads smoothly.

[ ] gRPC Chunking for Large Files: Enable streaming of high-def browser session videos or database dumps over the task channel without hitting message size limits.
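
The chunking idea can be sketched independently of gRPC: split the payload into numbered pieces small enough to fit under the message size limit, then reassemble them in order on the receiving side. The `FileChunk` type, its field names, and the 1 MiB chunk size are assumptions for illustration. In practice each chunk would likely be a Protobuf message sent over a streaming RPC.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Assumed chunk size, chosen to stay under gRPC's default 4 MiB receive limit.
CHUNK_SIZE = 1024 * 1024


@dataclass
class FileChunk:
    seq: int      # zero-based position of this chunk in the stream
    data: bytes   # payload slice
    last: bool    # True on the final chunk


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[FileChunk]:
    """Yield the payload as ordered chunks; an empty payload yields one empty chunk."""
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division
    for seq in range(total):
        start = seq * chunk_size
        yield FileChunk(seq, payload[start:start + chunk_size], seq == total - 1)


def reassemble(chunks: Iterable[FileChunk]) -> bytes:
    """Join chunks back together, rejecting gaps, reordering, and early termination."""
    parts = []
    for expected, chunk in enumerate(chunks):
        if chunk.seq != expected:
            raise ValueError(f"expected chunk {expected}, got {chunk.seq}")
        parts.append(chunk.data)
        if chunk.last:
            return b"".join(parts)
    raise ValueError("stream ended before the final chunk")
```

For multi-gigabyte files the sender would read slices from disk rather than hold the whole payload in memory, and the receiver would write each chunk to a file instead of collecting `parts`.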

Goal: Security and fairness.

[ ] Tenant Segregation: Isolate node groups by tenant boundary, enforce hardware quotas, and allow the Hub to forcefully reap zombie tasks orphaned on nodes.

Goal: Hub resilience.

[ ] DB Backend Integration: Migrate NodeRegistry and WorkPool state from in-memory dicts to Postgres/Redis. Deferred until full system integration.

Move from a Python "Skill" abstraction to isolated background worker processes per task (a dedicated Shell process, a dedicated Playwright daemon) for better fault isolation.

Run arbitrary AI tasks in automated, ephemeral microVMs for strict security sandboxing, instead of on bare metal or in standard containers.

A task scheduler that dynamically matches jobs against GPU limits, regions, or specific OS and hardware constraints.
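
A constraint-matching scheduler of this kind might start from something like the sketch below: filter nodes by hard constraints, then pick the tightest fit. The `NodeInfo` and `JobSpec` types, their fields, and the best-fit policy are hypothetical choices for illustration, not a description of a planned Cortex design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeInfo:
    node_id: str
    region: str
    os: str
    free_gpus: int


@dataclass
class JobSpec:
    gpus: int = 0
    region: Optional[str] = None  # None means "any region"
    os: Optional[str] = None      # None means "any OS"


def eligible(node: NodeInfo, job: JobSpec) -> bool:
    """Check the hard constraints: enough GPUs, matching region and OS."""
    return (
        node.free_gpus >= job.gpus
        and job.region in (None, node.region)
        and job.os in (None, node.os)
    )


def pick_node(nodes: list[NodeInfo], job: JobSpec) -> Optional[NodeInfo]:
    """Best fit: choose the eligible node that leaves the fewest GPUs idle."""
    candidates = [n for n in nodes if eligible(n, job)]
    return min(candidates, key=lambda n: n.free_gpus - job.gpus, default=None)
```

Best fit keeps large-GPU nodes free for jobs that need them. A real scheduler would probably also weigh load, data locality, and tenant quotas.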

Node Auto-Updates: Secure execution pipeline for deployed agent nodes to pull and self-update their binaries/versions.
mTLS Lifecycle Management: Automated certificate renewal, revocation, and rotation.

Cryptographically signed forensic logging for every TaskRequest and TaskResponse passing through the Hub.
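
One way to make such a log tamper-evident is to chain entries so that each authentication tag covers the previous entry's tag. The sketch below uses HMAC-SHA256 from the Python standard library as a stand-in for a true asymmetric signature, which would need a separate cryptography library and proper key distribution. The function names and the entry layout are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous tag" for the first entry


def append_entry(log: list, key: bytes, record: dict) -> dict:
    """Append a record whose MAC covers both its body and the previous entry's MAC."""
    prev = log[-1]["mac"] if log else GENESIS
    body = json.dumps(record, sort_keys=True)
    mac = hmac.new(key, (prev + body).encode(), hashlib.sha256).hexdigest()
    entry = {"prev": prev, "body": body, "mac": mac}
    log.append(entry)
    return entry


def verify_log(log: list, key: bytes) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        expected = hmac.new(key, (prev + entry["body"]).encode(), hashlib.sha256).hexdigest()
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Because HMAC is symmetric, anyone holding the key can forge entries. For forensic use against a compromised Hub, the same chaining structure would need per-entry signatures from a key the Hub cannot read back.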