Cortex Distributed Agent: Master Project Roadmap
This document consolidates all prioritized tasks, technical debt, swarm optimizations, and future strategic implementations for the Cortex project.
Completed Milestones (Core & Swarm Optimization)
1. LLM Monitoring Latency (Resolved)
- Fast-Path Pattern Heuristics: Regex detection for instant completions.
- Adaptive AI-Driven Polling: AI specifies next_check_seconds instead of static backoff.
- Edge Intelligence (Shift Left): Agent Node PTY reader actively scans output and fires prompt detection events to wake up sleeping SubAgents.
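The fast-path and adaptive-polling items above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the regex patterns, the `ask_model` callback, the `done` key, and the clamping bounds are all assumptions; only the `next_check_seconds` field name comes from the roadmap.

```python
import re
from typing import Callable, Optional

# Hypothetical completion patterns; the real heuristics are not listed in the roadmap.
FAST_PATH_PATTERNS = [
    re.compile(r"[$#] $"),                          # shell prompt at the very end
    re.compile(r"exit code:? \d+", re.IGNORECASE),  # explicit exit status line
]


def fast_path_done(output: str) -> bool:
    """Return True when the tail of the output matches a known completion pattern."""
    tail = output[-512:]  # only the most recent output matters for prompt detection
    return any(pattern.search(tail) for pattern in FAST_PATH_PATTERNS)


def next_delay(output: str, ask_model: Callable[[str], dict]) -> Optional[float]:
    """Return None if the task looks finished, else seconds until the next check."""
    if fast_path_done(output):
        return None  # resolved by regex, no LLM round trip needed
    verdict = ask_model(output)  # assumed to return a dict such as {"next_check_seconds": 30}
    if verdict.get("done"):
        return None
    # Clamp the model-suggested delay to assumed sane bounds (1 s to 5 min).
    suggested = float(verdict.get("next_check_seconds", 5))
    return min(max(suggested, 1.0), 300.0)
```

A caller would sleep for the returned delay and re-check, or stop monitoring when `None` comes back.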
2. Swarm Choreography (Resolved)
- Branching Agency (EXECUTE Action): SubAgent natively handles cross-node orchestrations dynamically based on terminal state.
3. PTY Memory Pressure & Logging (Resolved)
- TaskJournal Head+Tail Bounds: 10KB Head + 30KB Tail max memory footprint for AI context.
- Persistence Offloading: Nodes stream gigabyte outputs natively to temp disk files instead of expanding the application heap.
- Client-Side Truncation: PTY output is rate-limited to 15KB/sec on the node so it cannot flood the Orchestrator's gRPC stream.
- Graceful Shutdown: SIGTERM/SIGINT grace periods for local node cleanup.
- Ghost Mirror: Workspace bidirectional synchronization fully operational.
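As an illustration of the TaskJournal Head+Tail bound listed above, here is a minimal sketch of a buffer that keeps the first 10KB and the most recent 30KB of output. The class name `HeadTailBuffer`, its methods, and the omission marker are hypothetical; the actual TaskJournal API is not described in this roadmap.

```python
class HeadTailBuffer:
    """Keep only the first `head_limit` and last `tail_limit` bytes written."""

    def __init__(self, head_limit: int = 10 * 1024, tail_limit: int = 30 * 1024) -> None:
        self.head_limit = head_limit
        self.tail_limit = tail_limit
        self._head = bytearray()
        self._tail = bytearray()
        self.dropped = 0  # bytes discarded from the middle of the stream

    def write(self, chunk: bytes) -> None:
        # Fill the head first; it never changes once full.
        room = self.head_limit - len(self._head)
        if room > 0:
            self._head += chunk[:room]
            chunk = chunk[room:]
        if not chunk:
            return
        # Everything else goes to the tail, which is trimmed from the front.
        self._tail += chunk
        overflow = len(self._tail) - self.tail_limit
        if overflow > 0:
            del self._tail[:overflow]
            self.dropped += overflow

    def render(self) -> bytes:
        """Return head + tail, with a marker where output was omitted."""
        if self.dropped:
            marker = f"\n[... {self.dropped} bytes omitted ...]\n".encode()
            return bytes(self._head) + marker + bytes(self._tail)
        return bytes(self._head) + bytes(self._tail)
```

Memory stays bounded at `head_limit + tail_limit` regardless of how much the PTY emits, which is the property the milestone describes.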
High Priority (Infrastructure & Scalability)
1. Hub GIL Contention & Scalability
Goal: Scale horizontally to handle 100+ simultaneously connected nodes.
- [ ] Multiprocessing for Serialization: Refactor dispatch_swarm to use ProcessPoolExecutor to bypass GIL for heavy Protobuf signing/serialization.
- [ ] Async gRPC Internalization: Migrate Orchestrator fully to grpc.aio to handle concurrent streaming and queue management without thread locks.
- [ ] Sharded Registry: Distribute connections across multiple Hub instances using Redis as a shared state/journal layer.
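One possible shape for the multiprocessing item above is sketched here. The signature of `dispatch_swarm` is an assumption, and JSON plus HMAC stands in for the Protobuf signing/serialization step, which the roadmap does not specify. The point is only that CPU-bound work runs in worker processes, each with its own interpreter and GIL.

```python
import hashlib
import hmac
import json
from concurrent.futures import Executor, ProcessPoolExecutor
from functools import partial
from typing import List, Optional, Sequence


def sign_and_serialize(key: bytes, payload: dict) -> bytes:
    """CPU-bound stand-in for Protobuf serialization plus signing (assumption)."""
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    return signature + b"." + body


def dispatch_swarm(
    payloads: Sequence[dict],
    key: bytes,
    executor: Optional[Executor] = None,
) -> List[bytes]:
    """Serialize every payload in parallel, preserving input order."""
    owns_pool = executor is None
    pool = executor if executor is not None else ProcessPoolExecutor()
    try:
        # partial() of a module-level function is picklable, so it can cross
        # the process boundary; chunksize batches work to cut IPC overhead.
        return list(pool.map(partial(sign_and_serialize, key), payloads, chunksize=16))
    finally:
        if owns_pool:
            pool.shutdown()
```

Accepting an optional executor lets a long-lived pool be reused across calls. When the default `ProcessPoolExecutor` is used from a script, the call should sit under an `if __name__ == "__main__":` guard so worker start-up does not re-run it.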
2. Comprehensive Browser Skill (Antigravity CDP)
Goal: Support a professional, high-fidelity browser interaction layer natively.
- [ ] JS Console & Network Tunnels: Pipe console.log/error and XHR/Fetch traffic (HAR) back to the AI.
- [ ] A11y Tree Perception: Provide the Accessibility Tree (JSON) to the AI instead of raw DOM for semantic control.
- [ ] Advanced Interactions: Add Hover, Scroll, Drag & Drop, Multi-key injection, and EVAL JavaScript extraction capabilities.
- [ ] Smart Wait Logic: wait_for_network_idle and custom predicates to eliminate cross-node browser task flakiness.
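The Smart Wait Logic item could be built on a predicate-polling helper such as the sketch below. Everything here is hypothetical: the names `NetworkIdleTracker` and `wait_for`, the 0.5-second quiet window, and the assumption that a browser driver reports request start/finish events to the tracker. The clock and sleep functions are injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable


class NetworkIdleTracker:
    """Track in-flight requests; idle means none in flight for a quiet period."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._inflight = 0
        self._last_activity = clock()

    def request_started(self) -> None:
        self._inflight += 1
        self._last_activity = self._clock()

    def request_finished(self) -> None:
        self._inflight = max(0, self._inflight - 1)
        self._last_activity = self._clock()

    def is_idle(self, quiet_seconds: float = 0.5) -> bool:
        quiet_for = self._clock() - self._last_activity
        return self._inflight == 0 and quiet_for >= quiet_seconds


def wait_for(
    predicate: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With this shape, a network-idle wait is just `wait_for(tracker.is_idle)`, and any custom predicate (an element being present, a URL matching) can reuse the same loop.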
3. Binary Artifact & Large Data Chunking
Goal: Transmit payloads larger than a single gRPC message without failures.
- [ ] gRPC Chunking for Large Files: Enable streaming of high-def browser session videos or database dumps over the task channel without hitting message size limits.
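A chunked transfer along these lines might look like the following sketch. The `Chunk` message shape, the 1 MiB chunk size, and the trailing SHA-256 check are assumptions made for illustration; in a real service the generator would feed a client-streaming or server-streaming RPC, and `Chunk` would be a Protobuf message.

```python
import hashlib
from dataclasses import dataclass
from typing import BinaryIO, Iterable, Iterator

# Assumed size; chosen to stay well under gRPC's default 4 MiB receive limit.
CHUNK_SIZE = 1024 * 1024


@dataclass
class Chunk:
    seq: int          # position of this chunk in the stream, starting at 0
    data: bytes
    last: bool        # True only on the final chunk
    sha256: str = ""  # hex digest of the whole payload, set only when last=True


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield the stream as ordered chunks, reading one chunk ahead to detect the end."""
    digest = hashlib.sha256()
    seq = 0
    current = stream.read(chunk_size)
    while True:
        upcoming = stream.read(chunk_size)
        digest.update(current)
        if not upcoming:
            yield Chunk(seq, current, True, digest.hexdigest())
            return
        yield Chunk(seq, current, False)
        current = upcoming
        seq += 1


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the payload, rejecting gaps, truncation, and digest mismatches."""
    digest = hashlib.sha256()
    parts = []
    expected_seq = 0
    for chunk in chunks:
        if chunk.seq != expected_seq:
            raise ValueError(f"expected chunk {expected_seq}, got {chunk.seq}")
        digest.update(chunk.data)
        parts.append(chunk.data)
        expected_seq += 1
        if chunk.last:
            if digest.hexdigest() != chunk.sha256:
                raise ValueError("payload digest mismatch")
            return b"".join(parts)
    raise ValueError("stream ended before the last chunk")
```

For gigabyte-scale artifacts the receiver would write each chunk to disk instead of collecting `parts` in memory; the in-memory join is used here only to keep the sketch short.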
4. Multi-Tenancy & Resource Isolation
Goal: Security and fairness.
- [ ] Tenant Segregation: Isolate node groups by tenant boundary, enforce hardware quotas, and allow the Hub to forcefully reap zombie tasks orphaned on nodes.
5. Server-Side Task Persistence
Goal: Hub resilience.
- [ ] DB Backend Integration: Migrate NodeRegistry and WorkPool states from in-memory dicts to Postgres/Redis. Deferred to full system integration.
Low Priority & Future Strategic Roadmap
[ ] Architectural Refinement: Unified Worker Shim
- Move from a Python "Skill" abstraction to isolated background worker processes per task (a dedicated Shell process, a dedicated Playwright daemon) for better fault isolation.
[ ] OS-Level Isolation (Firecracker/microVMs)
- Run arbitrary AI tasks in automated, ephemeral microVMs for strict security sandboxing, instead of bare metal or standard containers.
[ ] Advanced Scheduling & Capability Routing
- Sophisticated task scheduler matching jobs against GPU limits, regions, or specific OS hardware constraints dynamically.
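To make the capability-routing idea concrete, here is a small sketch that filters nodes by hard constraints and then picks the least-loaded match. The `NodeInfo` and `JobSpec` fields and the least-load tie-breaker are assumptions; the roadmap does not define the scheduler's data model or policy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeInfo:
    name: str
    region: str
    os: str
    free_gpus: int
    load: float  # 0.0 (idle) to 1.0 (saturated)


@dataclass(frozen=True)
class JobSpec:
    min_gpus: int = 0
    region: Optional[str] = None  # None means any region is acceptable
    os: Optional[str] = None      # None means any OS is acceptable


def is_eligible(node: NodeInfo, job: JobSpec) -> bool:
    """Check the hard constraints: GPU count, region, and OS."""
    return (
        node.free_gpus >= job.min_gpus
        and (job.region is None or node.region == job.region)
        and (job.os is None or node.os == job.os)
    )


def pick_node(nodes: Iterable[NodeInfo], job: JobSpec) -> Optional[NodeInfo]:
    """Return the least-loaded eligible node, or None if nothing matches."""
    candidates = [node for node in nodes if is_eligible(node, job)]
    return min(candidates, key=lambda node: node.load, default=None)
```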
[ ] Infrastructure Automation
- Node Auto-Updates: Secure execution pipeline for deployed agent nodes to pull and self-update their binaries/versions.
- mTLS Lifecycle Management: Automated certificate renewal, revocation, and rotation.
[ ] Immutable Audit & Compliance
- Cryptographically signed forensic logging for every TaskRequest and TaskResponse passing through the Hub.
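One way to approach tamper-evident logging is a hash chain in which every entry authenticates its predecessor, sketched below. The entry layout and function names are hypothetical. HMAC-SHA256 is used as a stand-in: it is a symmetric MAC, not a digital signature, so a production design aiming at non-repudiation would likely swap in asymmetric signatures.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def append_entry(log: List[Dict[str, str]], key: bytes, kind: str, payload: dict) -> Dict[str, str]:
    """Append an entry whose MAC covers the payload and the previous entry's MAC."""
    prev = log[-1]["mac"] if log else GENESIS
    body = json.dumps({"kind": kind, "payload": payload, "prev": prev}, sort_keys=True)
    mac = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    entry = {"body": body, "mac": mac}
    log.append(entry)
    return entry


def verify_log(log: List[Dict[str, str]], key: bytes) -> bool:
    """Return True only if every MAC is valid and the chain links are unbroken."""
    prev = GENESIS
    for entry in log:
        expected = hmac.new(key, entry["body"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["mac"]):
            return False  # entry was modified or signed with another key
        if json.loads(entry["body"])["prev"] != prev:
            return False  # an entry was removed, inserted, or reordered
        prev = entry["mac"]
    return True
```

Because each entry embeds the previous MAC, editing, deleting, or reordering any record invalidates verification from that point onward.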