Cortex Distributed Agent: Master Project Roadmap
This document consolidates all prioritized tasks, technical debt, swarm optimizations, and future strategic implementations for the Cortex project.
Completed Milestones (Core & Swarm Optimization)
1. LLM Monitoring Latency (Resolved)
- Fast-Path Pattern Heuristics: Regex detection recognizes common completions instantly, without waiting on an LLM check.
- Adaptive AI-Driven Polling: The AI specifies next_check_seconds instead of the monitor relying on a static backoff.
- Edge Intelligence (Shift Left): The Agent Node's PTY reader actively scans output and fires prompt-detection events to wake sleeping SubAgents.
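The first two items combine naturally into one decision step. Below is a sketch under stated assumptions:

- Cheap regex checks run first.
- Only when none of them match does the monitor schedule its next look, using the model's `next_check_seconds`.

The patterns, the clamp bounds, and the `decide`/`MonitorDecision` names are hypothetical. `llm_next_check` stands for whatever interval the model returned on its last check.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical fast-path patterns; the real heuristics are not listed in this roadmap.
COMPLETION_PATTERNS = [
    re.compile(r"\$\s*$"),                  # shell prompt back at the end of the output
    re.compile(r"exit code \d+"),           # explicit exit-status line
    re.compile(r"^Done\b", re.MULTILINE),   # common tool completion banner
]

# Assumed bounds on the AI-suggested interval, so a bad suggestion cannot stall or spin the monitor.
MIN_CHECK_SECONDS = 1.0
MAX_CHECK_SECONDS = 60.0
DEFAULT_CHECK_SECONDS = 5.0


@dataclass
class MonitorDecision:
    done: bool
    next_check_seconds: float
    source: str  # "fast_path" or "llm"


def fast_path(output_tail: str) -> bool:
    """Return True if any cheap regex heuristic matches the recent output."""
    return any(p.search(output_tail) for p in COMPLETION_PATTERNS)


def decide(output_tail: str, llm_next_check: Optional[float]) -> MonitorDecision:
    """Check the regex fast path first; otherwise schedule the AI-suggested re-check."""
    if fast_path(output_tail):
        return MonitorDecision(done=True, next_check_seconds=0.0, source="fast_path")
    interval = DEFAULT_CHECK_SECONDS if llm_next_check is None else llm_next_check
    interval = max(MIN_CHECK_SECONDS, min(MAX_CHECK_SECONDS, interval))
    return MonitorDecision(done=False, next_check_seconds=interval, source="llm")
```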
2. 🧩 Backend Modularity & Plugin System (Resolved)
- Fat Routers Slimmed: Routes such as nodes.py and user.py have been purged of business logic.
- Service Layer Extraction: Extracted MeshService, AuthService, PreferenceService, and SessionService.
- Dynamic Tool Registry: ToolService decoupled into an auto-loading plugin registry.
- Models Split: Monolithic models.py broken down into domain-driven files.
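Below is a minimal sketch of an auto-loading registry. It assumes a decorator-plus-package-scan design; the roadmap does not say how ToolService discovers plugins.

- Each plugin module registers its tool class when it is imported.
- `load_plugins` imports every module in a plugin package.

`register_tool`, `load_plugins`, `EchoTool` and this `ToolService` interface are illustrative, not the project's API.

```python
import importlib
import pkgutil
from typing import Dict

# Registry filled in as plugin modules are imported (hypothetical design).
_REGISTRY: Dict[str, type] = {}


def register_tool(cls: type) -> type:
    """Class decorator: decorating a tool class is all a plugin module has to do."""
    name = getattr(cls, "name", "")
    if not name:
        raise ValueError(f"{cls.__name__} must define a non-empty 'name'")
    if name in _REGISTRY:
        raise ValueError(f"duplicate tool name: {name}")
    _REGISTRY[name] = cls
    return cls


def load_plugins(package_name: str) -> None:
    """Import every module in a plugin package so their @register_tool decorators run."""
    package = importlib.import_module(package_name)
    for info in pkgutil.iter_modules(package.__path__, package.__name__ + "."):
        importlib.import_module(info.name)


class ToolService:
    """Thin facade: knows nothing about individual tools, only the registry."""

    def available(self) -> list[str]:
        return sorted(_REGISTRY)

    def invoke(self, tool_name: str, **kwargs: object) -> object:
        try:
            tool_cls = _REGISTRY[tool_name]
        except KeyError:
            raise LookupError(f"unknown tool: {tool_name}") from None
        return tool_cls().run(**kwargs)


# Example plugin; in practice this would live in its own module inside the plugin package.
@register_tool
class EchoTool:
    name = "echo"

    def run(self, **kwargs: object) -> object:
        return kwargs.get("text", "")
```

Calling `load_plugins("cortex_tools")` once at startup picks up any new tools (the package name is hypothetical). ToolService itself does not change when a tool is added.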
3. Swarm Choreography (Resolved)
- Branching Agency (EXECUTE Action): The SubAgent natively orchestrates work across nodes, choosing its next step dynamically based on terminal state.
4. 💾 PTY Memory Pressure & Logging (Resolved)
- TaskJournal Head+Tail Bounds: Each TaskJournal keeps at most a 10 KB head and a 30 KB tail for AI context, capping its memory footprint at 40 KB.
- Persistence Offloading: Nodes stream gigabyte-scale outputs directly to temporary files on disk instead of growing the application heap.
- Client-Side Truncation: PTY output is rate-limited to 15 KB/s on the node, so a chatty process cannot flood (effectively DDoS) the Orchestrator's gRPC stream.
- Graceful Shutdown: SIGTERM/SIGINT trigger a grace period for local node cleanup.
- Ghost Mirror: Bidirectional workspace synchronization is fully operational.
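The head+tail bound and the client-side rate limit can both be sketched in a few lines. The sketch makes these assumptions:

- "10 KB head + 30 KB tail" means the first 10 KiB and the last 30 KiB of a task's output.
- The 15 KB/s limit behaves as a token bucket that drops excess bytes rather than queuing them.

The class names and the drop policy are assumptions, not the node's actual implementation.

```python
from typing import Optional


class HeadTailBuffer:
    """Keep the first `head_limit` and the last `tail_limit` bytes of a stream."""

    def __init__(self, head_limit: int = 10 * 1024, tail_limit: int = 30 * 1024) -> None:
        self.head_limit = head_limit
        self.tail_limit = tail_limit
        self.head = bytearray()
        self.tail = bytearray()
        self.dropped = 0  # bytes discarded between head and tail

    def write(self, data: bytes) -> None:
        # Fill the head once; everything after it goes into the rolling tail.
        room = self.head_limit - len(self.head)
        if room > 0:
            self.head += data[:room]
            data = data[room:]
        self.tail += data
        overflow = len(self.tail) - self.tail_limit
        if overflow > 0:
            del self.tail[:overflow]  # discard the oldest tail bytes
            self.dropped += overflow

    def render(self) -> bytes:
        """Bytes handed to the AI context, with a marker where the middle was cut."""
        if not self.dropped:
            return bytes(self.head + self.tail)
        marker = f"\n... [{self.dropped} bytes truncated] ...\n".encode()
        return bytes(self.head) + marker + bytes(self.tail)


class TokenBucket:
    """Average at most `rate` bytes/s, with bursts up to `capacity` bytes."""

    def __init__(self, rate: float = 15 * 1024, capacity: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = rate if capacity is None else capacity
        self.tokens = self.capacity
        self.last: Optional[float] = None

    def allow(self, nbytes: int, now: float) -> int:
        """Return how many of `nbytes` may be forwarded at time `now`; the caller drops the rest."""
        if self.last is not None:
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        granted = int(min(nbytes, self.tokens))
        self.tokens -= granted
        return granted
```

`now` is a parameter so the bucket is deterministic to test. On a node it would be fed `time.monotonic()` on each PTY read.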
5. Dedicated Browser Service (Resolved)
- Service Decoupling: Browser automation moved to a standalone high-performance service.
- Sidecar RAM Handoff: /dev/shm-based zero-copy transfer for DOM and screenshots.
- Full Perception: Integrated accessibility (A11y) tree, JavaScript evaluation, and persistent session management.
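The sketch below illustrates the handoff idea using Python's `multiprocessing.shared_memory`, whose segments live under /dev/shm on Linux. It is a stand-in: the browser service may write plain files under /dev/shm instead.

- The producer sends only a segment name and size over the control channel.
- The consumer maps the same pages.

`publish_frame` and `consume_frame` are hypothetical names.

```python
from multiprocessing import shared_memory


def publish_frame(payload: bytes) -> tuple[str, int]:
    """Producer: place a DOM/screenshot blob in a named shared-memory segment.

    On Linux these segments live under /dev/shm, so only the (name, size) pair
    has to travel over the control channel, not the payload itself.
    """
    if not payload:
        raise ValueError("shared-memory segments must be non-empty")
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    name = shm.name
    shm.close()  # unmaps this process's view; the segment survives until unlink()
    return name, len(payload)


def consume_frame(name: str, size: int) -> bytes:
    """Consumer: attach by name, read the blob, then free the segment."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        with shm.buf[:size] as view:  # memoryview onto the mapped pages; no copy yet
            # A real consumer could parse `view` in place; bytes() copies only for this demo.
            return bytes(view)
    finally:
        shm.close()
        shm.unlink()
```

The transfer is zero-copy only while the consumer works on the memoryview directly. Converting to `bytes`, as the demo does, reintroduces one copy.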
High Priority (Infrastructure & Scalability)
1. 🕸️ Hub GIL Contention & Scalability
Goal: Scale horizontally to handle 100+ simultaneously connected nodes.
- [ ] Multiprocessing for Serialization: Refactor dispatch_swarm to use ProcessPoolExecutor, bypassing the GIL for CPU-heavy Protobuf signing and serialization.
- [ ] Async gRPC Internalization: Migrate the Orchestrator fully to grpc.aio to handle concurrent streaming and queue management without thread locks.
- [ ] Sharded Registry: Distribute connections across multiple Hub instances using Redis as a shared state/journal layer.
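The first item could look roughly like the sketch below. It assumes the Orchestrator already runs on asyncio, as the grpc.aio item implies. Assumptions and constraints:

- `serialize_and_sign` and `make_serialization_pool` are hypothetical.
- JSON plus an HMAC-SHA256 signature stand in for the real Protobuf serialization and signing.
- Worker functions must be defined at module level so ProcessPoolExecutor can pickle them.

```python
import asyncio
import hashlib
import hmac
import json
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional

# Hypothetical key; json + HMAC-SHA256 stand in for the real Protobuf serialization and signing.
SIGNING_KEY = b"hypothetical-hub-key"


def serialize_and_sign(task: dict) -> tuple[bytes, str]:
    """CPU-bound step. Module-level so ProcessPoolExecutor can pickle it for workers."""
    body = json.dumps(task, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return body, signature


async def dispatch_swarm(tasks: list[dict], pool: Optional[Executor]) -> list[tuple[bytes, str]]:
    """Fan serialization out to the pool while the event loop keeps serving streams.

    Passing pool=None uses the loop's default thread pool, which is handy in tests
    but does not escape the GIL.
    """
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(pool, serialize_and_sign, task) for task in tasks]
    return await asyncio.gather(*futures)


def make_serialization_pool(workers: int = 4) -> ProcessPoolExecutor:
    # Create once at Hub startup and reuse: starting processes per dispatch would cost
    # more than the GIL contention it removes.
    return ProcessPoolExecutor(max_workers=workers)
```

In the Hub this would mean one `make_serialization_pool()` at startup and `await dispatch_swarm(batch, pool)` per dispatch. The pool only pays off when per-task CPU cost exceeds the cost of pickling arguments and results across the process boundary.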
2. 📦 Binary Artifact & Large Data Chunking
Goal: Transmit massive payloads smoothly.
- [ ] gRPC Chunking for Large Files: Enable streaming of high-def browser session videos or database dumps over the task channel without hitting message size limits.
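Below is one way to frame the chunking. It assumes a streaming RPC whose messages carry a transfer ID, a sequence number and, on the last message, a whole-file digest. The `Chunk` fields and the 1 MiB chunk size are assumptions, not the project's message schema. gRPC enforces a per-message size limit (4 MB by default), so splitting keeps each message well under it.

```python
import hashlib
from dataclasses import dataclass
from typing import BinaryIO, Iterable, Iterator

# Comfortably below gRPC's default 4 MB message limit; an assumption, tune as needed.
CHUNK_SIZE = 1024 * 1024


@dataclass
class Chunk:
    transfer_id: str
    seq: int
    data: bytes
    last: bool
    sha256: str = ""  # whole-file digest, sent only on the final chunk


def iter_chunks(transfer_id: str, stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield fixed-size chunks; a streaming gRPC handler could yield these as messages."""
    digest = hashlib.sha256()
    seq = 0
    current = stream.read(chunk_size)
    while True:
        nxt = stream.read(chunk_size)  # read ahead one chunk to know whether this is the last
        digest.update(current)
        last = not nxt
        yield Chunk(transfer_id, seq, current, last, digest.hexdigest() if last else "")
        if last:
            return
        current, seq = nxt, seq + 1


def reassemble(chunks: Iterable[Chunk], sink: BinaryIO) -> int:
    """Write chunks in order, verifying sequence numbers and the final digest."""
    digest = hashlib.sha256()
    expected = 0
    total = 0
    for chunk in chunks:
        if chunk.seq != expected:
            raise ValueError(f"out-of-order chunk {chunk.seq}, expected {expected}")
        sink.write(chunk.data)
        digest.update(chunk.data)
        total += len(chunk.data)
        expected += 1
        if chunk.last:
            if chunk.sha256 != digest.hexdigest():
                raise ValueError("checksum mismatch")
            return total
    raise ValueError("stream ended before the final chunk")
```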
3. Multi-Tenancy & Resource Isolation
Goal: Security and fairness.
- [ ] Tenant Segregation: Isolate node groups by tenant boundary, enforce hardware quotas, and allow the Hub to forcefully reap zombie tasks orphaned on nodes.
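The sketch below puts the three rules in one place: node-group segregation, hardware quotas, and reaping orphaned tasks. It assumes:

- GPUs are the quota unit.
- Tasks hold heartbeat-renewed leases, so silence past a TTL marks them as zombies.

Neither choice is specified in the roadmap. `TenantScheduler`, `TaskLease` and the 30-second TTL are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaskLease:
    task_id: str
    tenant: str
    node_id: str
    gpus: int
    last_heartbeat: float


@dataclass
class TenantScheduler:
    gpu_quota: dict[str, int]          # tenant -> max GPUs in use at once (assumed quota unit)
    node_groups: dict[str, set[str]]   # tenant -> nodes that tenant may use
    heartbeat_ttl: float = 30.0        # seconds of silence before a task counts as orphaned
    leases: dict[str, TaskLease] = field(default_factory=dict)

    def gpus_in_use(self, tenant: str) -> int:
        return sum(lease.gpus for lease in self.leases.values() if lease.tenant == tenant)

    def admit(self, task_id: str, tenant: str, node_id: str, gpus: int, now: float) -> bool:
        """Refuse work outside the tenant's node group or beyond its quota."""
        if node_id not in self.node_groups.get(tenant, set()):
            return False
        if self.gpus_in_use(tenant) + gpus > self.gpu_quota.get(tenant, 0):
            return False
        self.leases[task_id] = TaskLease(task_id, tenant, node_id, gpus, now)
        return True

    def heartbeat(self, task_id: str, now: float) -> None:
        if task_id in self.leases:
            self.leases[task_id].last_heartbeat = now

    def reap(self, now: float) -> list[TaskLease]:
        """Remove leases whose heartbeats stopped; the Hub would then kill them on their nodes."""
        stale = [lease for lease in self.leases.values() if now - lease.last_heartbeat > self.heartbeat_ttl]
        for lease in stale:
            del self.leases[lease.task_id]
        return stale
```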
4. 💾 Server-Side Task Persistence
Goal: Hub resilience.
- [ ] DB Backend Integration: Migrate NodeRegistry and WorkPool state from in-memory dicts to Postgres/Redis. Deferred until full system integration.
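Hiding the dicts behind a small storage interface lets the Hub swap in a durable backend without touching callers. The sketch makes these assumptions:

- NodeRegistry state has a simple key/value shape.
- `NodeStore` and both implementations are hypothetical.
- `sqlite3` stands in for Postgres so the sketch runs without a server. The upsert statement used here is valid in both.

```python
import json
import sqlite3
from typing import Optional, Protocol


class NodeStore(Protocol):
    def put(self, node_id: str, state: dict) -> None: ...
    def get(self, node_id: str) -> Optional[dict]: ...


class InMemoryNodeStore:
    """Current behaviour: state vanishes when the Hub restarts."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict] = {}

    def put(self, node_id: str, state: dict) -> None:
        self._nodes[node_id] = dict(state)

    def get(self, node_id: str) -> Optional[dict]:
        state = self._nodes.get(node_id)
        return None if state is None else dict(state)


class SqlNodeStore:
    """Durable variant; sqlite3 stands in for Postgres in this sketch."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS node_registry (node_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def put(self, node_id: str, state: dict) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO node_registry (node_id, state) VALUES (?, ?) "
                "ON CONFLICT(node_id) DO UPDATE SET state = excluded.state",
                (node_id, json.dumps(state)),
            )

    def get(self, node_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT state FROM node_registry WHERE node_id = ?", (node_id,)
        ).fetchone()
        return None if row is None else json.loads(row[0])
```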
Low Priority & Future Strategic Roadmap
[ ] Architectural Refinement: Unified Worker Shim
- Move from a Python "Skill" abstraction to isolated background worker processes per task (e.g., a dedicated shell process or a dedicated Playwright daemon) for better fault isolation.
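The isolation argument is easiest to see with one process per task. The sketch below runs each task's code in a fresh interpreter via `subprocess`, so a crash or hang surfaces as an exit code or timeout instead of taking down the caller. `run_isolated` and `WorkerResult` are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class WorkerResult:
    returncode: int
    stdout: str
    stderr: str
    timed_out: bool = False


def run_isolated(worker_code: str, timeout: float = 10.0) -> WorkerResult:
    """Run one task's code in a fresh interpreter process.

    A crash, segfault, or runaway loop stays inside the child; the caller only
    ever sees an exit code, captured output, or a timeout.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", worker_code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child before re-raising.
        return WorkerResult(returncode=-1, stdout="", stderr="timed out", timed_out=True)
    return WorkerResult(proc.returncode, proc.stdout, proc.stderr)
```

A production shim would more likely keep long-lived workers, such as the Playwright daemon, and talk to them over a pipe or socket. The fault-isolation property is the same.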
[ ] OS-Level Isolation (Firecracker/microVMs)
- Run arbitrary AI tasks in automated, ephemeral microVMs for strict security sandboxing, instead of bare metal or standard containers.
[ ] Advanced Scheduling & Capability Routing
- A sophisticated task scheduler that dynamically matches jobs against GPU limits, regions, or specific OS/hardware constraints.
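A filter-then-rank matcher is one simple way to structure this:

- Hard constraints (GPU count, region, OS) eliminate nodes.
- A soft score picks among the remaining candidates.

`NodeCaps`, `JobSpec` and the least-loaded ranking are illustrative assumptions, not a committed design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeCaps:
    node_id: str
    region: str
    os: str
    free_gpus: int
    running_tasks: int = 0


@dataclass
class JobSpec:
    min_gpus: int = 0
    regions: set[str] = field(default_factory=set)  # empty set means "any region"
    os: Optional[str] = None                        # None means "any OS"


def eligible(node: NodeCaps, job: JobSpec) -> bool:
    """Hard constraints: every one must hold."""
    return (
        node.free_gpus >= job.min_gpus
        and (not job.regions or node.region in job.regions)
        and (job.os is None or node.os == job.os)
    )


def pick_node(nodes: list[NodeCaps], job: JobSpec) -> Optional[NodeCaps]:
    """Filter by hard constraints, then prefer the least-loaded node (ties: most free GPUs)."""
    candidates = [n for n in nodes if eligible(n, job)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (n.running_tasks, -n.free_gpus))
```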
[ ] Infrastructure Automation
- Node Auto-Updates: Secure execution pipeline for deployed agent nodes to pull and self-update their binaries/versions.
- mTLS Lifecycle Management: Automated certificate renewal, revocation, and rotation.
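The core safety property of the auto-update item can be sketched as verify-then-atomically-replace, with a certificate renewal-window check alongside it. The sketch makes these assumptions:

- `install_update`, `needs_renewal` and the 14-day window are hypothetical.
- The expected digest comes from an already-authenticated source, such as a signed manifest.
- Verifying that signature is not shown.

```python
import hashlib
import hmac
import os
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def install_update(new_binary: bytes, expected_sha256: str, target: Path) -> None:
    """Verify a downloaded binary against a trusted digest, then swap it in atomically.

    Assumes expected_sha256 came from an authenticated source (e.g. a signed release
    manifest); checking that signature is not shown here.
    """
    actual = hashlib.sha256(new_binary).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError("digest mismatch: refusing to install update")
    # Temp file in the same directory, so os.replace() is a same-filesystem atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_binary)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp_name, 0o755)
        os.replace(tmp_name, target)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise


def needs_renewal(not_after: datetime, now: datetime, window: timedelta = timedelta(days=14)) -> bool:
    """True once a certificate is within `window` of its notAfter date (or already expired)."""
    return not_after - now <= window
```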
[ ] Immutable Audit & Compliance
- Cryptographically signed forensic logging for every TaskRequest and TaskResponse passing through the Hub.
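A hash-chained log is a common way to make such a trail tamper-evident. Each entry's MAC also covers the previous entry's MAC, so editing, reordering or deleting an earlier record invalidates everything after it.

This sketch uses HMAC-SHA256 with a shared key as a stand-in for real signatures. Non-repudiation would need asymmetric signatures, such as Ed25519. `AuditLog`, `AuditEntry` and the key are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    seq: int
    kind: str        # e.g. "TaskRequest" or "TaskResponse"
    payload: dict
    prev_mac: str
    mac: str


def _mac(key: bytes, seq: int, kind: str, payload: dict, prev_mac: str) -> str:
    # Canonical JSON so the same record always produces the same MAC.
    msg = json.dumps([seq, kind, payload, prev_mac], sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


class AuditLog:
    """Append-only log where each entry's MAC also covers the previous entry's MAC."""

    GENESIS = "0" * 64

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.entries: list[AuditEntry] = []

    def append(self, kind: str, payload: dict) -> AuditEntry:
        seq = len(self.entries)
        prev = self.entries[-1].mac if self.entries else self.GENESIS
        entry = AuditEntry(seq, kind, payload, prev, _mac(self._key, seq, kind, payload, prev))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the whole chain; any edited, reordered, or missing entry breaks it."""
        prev = self.GENESIS
        for i, e in enumerate(self.entries):
            expected = _mac(self._key, i, e.kind, e.payload, prev)
            if e.seq != i or e.prev_mac != prev or not hmac.compare_digest(e.mac, expected):
                return False
            prev = e.mac
        return True
```

Chaining alone does not reveal truncation of the newest entries. Periodically anchoring the latest MAC somewhere outside the Hub closes that gap.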