Cortex Distributed Agent: Master Project Roadmap

This document consolidates all prioritized tasks, technical debt, swarm optimizations, and future strategic implementations for the Cortex project.


✅ Completed Milestones (Core & Swarm Optimization)

1. 🕒 LLM Monitoring Latency (Resolved)

  • Fast-Path Pattern Heuristics: Regex detection for instant completions.
  • Adaptive AI-Driven Polling: AI specifies next_check_seconds instead of static backoff.
  • Edge Intelligence (Shift Left): Agent Node PTY reader actively scans output and fires prompt detection events to wake up sleeping SubAgents.
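
The fast-path and adaptive-polling items above could be combined roughly as follows. This is a minimal sketch, not the project's actual code: the pattern list, the clamp bounds, and the helper names `looks_like_prompt` and `next_wait` are assumptions made for illustration. Only the `next_check_seconds` idea comes from the roadmap.

```python
import re
from typing import Optional

# Hypothetical prompt patterns; a real deployment would tune these per shell.
PROMPT_PATTERNS = [
    re.compile(r"[$#>] ?$"),                          # common shell prompts
    re.compile(r"\(y/n\)\s*:?\s*$", re.IGNORECASE),   # yes/no confirmation
    re.compile(r"[Pp]assword:\s*$"),                  # password prompt
]

# Assumed safety bounds for the AI-provided polling hint.
MIN_WAIT_S = 0.5
MAX_WAIT_S = 60.0


def looks_like_prompt(tail: str) -> bool:
    """Return True if the last non-empty output line matches a known prompt."""
    lines = [line for line in tail.splitlines() if line.strip()]
    if not lines:
        return False
    return any(p.search(lines[-1]) for p in PROMPT_PATTERNS)


def next_wait(tail: str, next_check_seconds: Optional[float]) -> float:
    """Decide how long to sleep before looking at the terminal again.

    Fast path: a detected prompt means the command is done, so do not wait.
    Otherwise use the AI's hint, clamped to sane bounds.
    """
    if looks_like_prompt(tail):
        return 0.0
    if next_check_seconds is None:
        return MIN_WAIT_S
    return max(MIN_WAIT_S, min(MAX_WAIT_S, next_check_seconds))
```

Clamping the hint keeps a single bad model answer from stalling a task for minutes or spinning in a tight loop.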

2. 🎭 Swarm Choreography (Resolved)

  • Branching Agency (EXECUTE Action): SubAgent orchestrates cross-node actions dynamically based on terminal state.

3. 💾 PTY Memory Pressure & Logging (Resolved)

  • TaskJournal Head+Tail Bounds: 10KB Head + 30KB Tail max memory footprint for AI context.
  • Persistence Offloading: Nodes stream gigabyte outputs natively to temp disk files instead of expanding the application heap.
  • Client-Side Truncation: 15 KB/s rate limit on PTY output to protect the Orchestrator's gRPC stream from being flooded.
  • Graceful Shutdown: SIGTERM/SIGINT grace periods for local node cleanup.
  • Ghost Mirror: Workspace bidirectional synchronization fully operational.
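
A head+tail bound like the one described for TaskJournal can be illustrated with a small buffer that keeps the first bytes, keeps the most recent bytes, and drops the middle. The class name `HeadTailBuffer`, its methods, and the omission marker are hypothetical. Only the 10 KB and 30 KB limits come from the roadmap.

```python
class HeadTailBuffer:
    """Keeps the first `head_limit` bytes and the last `tail_limit` bytes."""

    def __init__(self, head_limit: int = 10 * 1024, tail_limit: int = 30 * 1024):
        self.head_limit = head_limit
        self.tail_limit = tail_limit
        self._head = bytearray()
        self._tail = bytearray()
        self.dropped = 0  # bytes discarded from the middle of the stream

    def append(self, chunk: bytes) -> None:
        # Fill the head first; it never changes once full.
        room = self.head_limit - len(self._head)
        if room > 0:
            self._head += chunk[:room]
            chunk = chunk[room:]
        if not chunk:
            return
        # Everything else goes to the tail, which is trimmed from the front.
        self._tail += chunk
        overflow = len(self._tail) - self.tail_limit
        if overflow > 0:
            del self._tail[:overflow]
            self.dropped += overflow

    def render(self) -> bytes:
        """Return head and tail, with a marker where bytes were omitted."""
        if self.dropped == 0:
            return bytes(self._head + self._tail)
        marker = f"\n...[{self.dropped} bytes omitted]...\n".encode()
        return bytes(self._head) + marker + bytes(self._tail)
```

Memory stays capped at `head_limit + tail_limit` regardless of how much output a command produces, while the start of the output (the command and its first messages) and the end (the final status or prompt) both remain visible.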

🚀 High Priority (Infrastructure & Scalability)

1. 🕸️ Hub GIL Contention & Scalability

Goal: Scale horizontally to handle 100+ simultaneously connected nodes.

  • [ ] Multiprocessing for Serialization: Refactor dispatch_swarm to use ProcessPoolExecutor to bypass GIL for heavy Protobuf signing/serialization.
  • [ ] Async gRPC Internalization: Migrate Orchestrator fully to grpc.aio to handle concurrent streaming and queue management without thread locks.
  • [ ] Sharded Registry: Distribute connections across multiple Hub instances using Redis as a shared state/journal layer.
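
The multiprocessing item could take a shape like the sketch below. It is only an illustration: `sign_payload` uses an HMAC over raw bytes as a stand-in for the real Protobuf signing and serialization, and `sign_batch` and `sign_batch_in_processes` are hypothetical names, not part of `dispatch_swarm`.

```python
import hashlib
import hmac
from concurrent.futures import Executor, ProcessPoolExecutor
from functools import partial
from typing import List, Optional


def sign_payload(key: bytes, payload: bytes) -> bytes:
    """Stand-in for CPU-bound signing + serialization of one message.

    Must live at module level so worker processes can unpickle it.
    """
    digest = hmac.new(key, payload, hashlib.sha256).digest()
    return digest + payload  # 32-byte MAC followed by the payload


def sign_batch(executor: Executor, key: bytes, payloads: List[bytes]) -> List[bytes]:
    """Sign payloads on any executor, preserving input order."""
    # chunksize batches items per worker round-trip to cut pickling overhead.
    return list(executor.map(partial(sign_payload, key), payloads, chunksize=16))


def sign_batch_in_processes(
    key: bytes, payloads: List[bytes], workers: Optional[int] = None
) -> List[bytes]:
    """Run the batch in separate processes so the work is not GIL-bound."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sign_batch(pool, key, payloads)
```

When calling `sign_batch_in_processes` from a script, do so under an `if __name__ == "__main__":` guard, which spawn-based platforms require. A long-lived Hub would more likely create one pool at startup and reuse it, since starting worker processes per batch is costly.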

2. 🌐 Comprehensive Browser Skill (Antigravity CDP)

Goal: Support a professional, high-fidelity browser interaction layer natively.

  • [ ] JS Console & Network Tunnels: Pipe console.log/error and XHR/Fetch traffic (HAR) back to the AI.
  • [ ] A11y Tree Perception: Provide the Accessibility Tree (JSON) to the AI instead of raw DOM for semantic control.
  • [ ] Advanced Interactions: Add Hover, Scroll, Drag & Drop, multi-key injection, and JavaScript evaluation (EVAL) for data extraction.
  • [ ] Smart Wait Logic: wait_for_network_idle and custom predicates to eliminate cross-node browser task flakiness.
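
One possible shape for the smart wait logic is a tracker that counts in-flight requests plus a generic predicate poller. Everything here is a sketch under assumptions: `NetworkIdleTracker` and `wait_for` are hypothetical names, and in a CDP-based skill the `request_started` / `request_finished` calls would presumably be driven by events such as `Network.requestWillBeSent` and `Network.loadingFinished` / `Network.loadingFailed`.

```python
import time
from typing import Callable


class NetworkIdleTracker:
    """Reports idle once no request is in flight for `idle_seconds`."""

    def __init__(self, idle_seconds: float = 0.5,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_seconds = idle_seconds
        self._clock = clock
        self._in_flight = 0
        self._last_activity = clock()

    def request_started(self) -> None:
        self._in_flight += 1
        self._last_activity = self._clock()

    def request_finished(self) -> None:
        self._in_flight = max(0, self._in_flight - 1)
        self._last_activity = self._clock()

    def is_idle(self) -> bool:
        quiet_for = self._clock() - self._last_activity
        return self._in_flight == 0 and quiet_for >= self.idle_seconds


def wait_for(predicate: Callable[[], bool], timeout: float = 10.0,
             interval: float = 0.05,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `predicate` until it is true or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With these pieces, a network-idle wait reduces to `wait_for(tracker.is_idle, timeout=...)`, and custom predicates (an element exists, a URL matches) reuse the same poller. The injectable `clock` and `sleep` make the timing logic testable without real delays.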

3. 📦 Binary Artifact & Large Data Chunking

Goal: Transmit very large payloads without exceeding gRPC message size limits.

  • [ ] gRPC Chunking for Large Files: Enable streaming of high-def browser session videos or database dumps over the task channel without hitting message size limits.
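
The chunking idea can be sketched independently of the gRPC stubs: split a byte stream into sequenced chunks, attach a checksum to the final one, and verify both order and checksum on reassembly. The `FileChunk` type and both function names are hypothetical; in practice the chunk would be a Protobuf message sent over a streaming RPC. The 1 MiB chunk size is an assumption chosen to stay under gRPC's default 4 MiB receive limit.

```python
import hashlib
from dataclasses import dataclass
from typing import BinaryIO, Iterable, Iterator

CHUNK_SIZE = 1024 * 1024  # 1 MiB (assumed), below gRPC's default 4 MiB limit


@dataclass
class FileChunk:
    seq: int          # position of this chunk in the stream, starting at 0
    data: bytes
    is_last: bool
    sha256: str = ""  # hex digest of the whole payload, set on the last chunk


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[FileChunk]:
    """Yield sequenced chunks, reading one chunk ahead to detect the end."""
    hasher = hashlib.sha256()
    seq = 0
    current = stream.read(chunk_size)
    while True:
        following = stream.read(chunk_size) if current else b""
        hasher.update(current)
        if not following:
            yield FileChunk(seq, current, True, hasher.hexdigest())
            return
        yield FileChunk(seq, current, False)
        seq += 1
        current = following


def reassemble(chunks: Iterable[FileChunk], sink: BinaryIO) -> None:
    """Write chunks to `sink`, rejecting gaps, reordering, and corruption."""
    hasher = hashlib.sha256()
    expected_seq = 0
    for chunk in chunks:
        if chunk.seq != expected_seq:
            raise ValueError(f"expected chunk {expected_seq}, got {chunk.seq}")
        hasher.update(chunk.data)
        sink.write(chunk.data)
        expected_seq += 1
        if chunk.is_last:
            if hasher.hexdigest() != chunk.sha256:
                raise ValueError("checksum mismatch")
            return
    raise ValueError("stream ended before the final chunk")
```

Because `iter_chunks` is a generator, it never holds more than two chunks in memory, so a multi-gigabyte dump can be streamed from disk without loading it fully.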

4. 🏒 Multi-Tenancy & Resource Isolation

Goal: Security and fairness.

  • [ ] Tenant Segregation: Isolate node groups by tenant boundary, enforce hardware quotas, and allow the Hub to forcefully reap zombie tasks orphaned on nodes.

5. 💾 Server-Side Task Persistence

Goal: Hub resilience.

  • [ ] DB Backend Integration: Migrate NodeRegistry and WorkPool states from in-memory dicts to Postgres/Redis. Deferred to full system integration.

🐒 Low Priority & Future Strategic Roadmap

[ ] Architectural Refinement: Unified Worker Shim

  • Move from a Python "Skill" abstraction to isolated background worker processes per task (a dedicated Shell process, a dedicated Playwright daemon) for better fault isolation.

[ ] OS-Level Isolation (Firecracker/microVMs)

  • Run arbitrary AI tasks in automated, ephemeral microVMs for strict security sandboxing, instead of bare metal or standard containers.

[ ] Advanced Scheduling & Capability Routing

  • Task scheduler that dynamically matches jobs against GPU limits, regions, and OS or hardware constraints.
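
As a rough illustration of capability routing, the sketch below filters nodes by hard constraints and then picks the least-loaded match. The `NodeInfo` and `JobRequirements` fields and the tie-breaking rule are assumptions made for the example, not a description of the existing registry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInfo:
    node_id: str
    os: str
    region: str
    free_gpus: int
    active_tasks: int = 0


@dataclass
class JobRequirements:
    os: Optional[str] = None      # None means "any OS"
    region: Optional[str] = None  # None means "any region"
    gpus: int = 0


def is_eligible(node: NodeInfo, req: JobRequirements) -> bool:
    """Check the hard constraints a node must satisfy to run the job."""
    if req.os is not None and node.os != req.os:
        return False
    if req.region is not None and node.region != req.region:
        return False
    return node.free_gpus >= req.gpus


def pick_node(nodes: List[NodeInfo], req: JobRequirements) -> Optional[NodeInfo]:
    """Return the least-loaded eligible node, or None if nothing matches."""
    candidates = [n for n in nodes if is_eligible(n, req)]
    if not candidates:
        return None
    # Break load ties by node_id so the choice is deterministic.
    return min(candidates, key=lambda n: (n.active_tasks, n.node_id))
```

Separating the eligibility filter from the ranking step makes it straightforward to add new constraints without touching the load-balancing policy.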

[ ] Infrastructure Automation

  • Node Auto-Updates: Secure pipeline for deployed agent nodes to pull new releases and update their own binaries.
  • mTLS Lifecycle Management: Automated certificate renewal, revocation, and rotation.

[ ] Immutable Audit & Compliance

  • Cryptographically signed forensic logging for every TaskRequest and TaskResponse passing through the Hub.
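
A tamper-evident log of this kind is often built as a hash chain, where each entry's MAC covers the previous entry's MAC. The sketch below uses HMAC-SHA256 with a shared key as a simple stand-in; a real compliance log would more likely use asymmetric signatures so that verifiers cannot forge entries. The entry layout and function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def _entry_mac(key: bytes, prev_mac: str, record: Dict) -> str:
    """MAC over the previous MAC plus a canonical JSON form of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev_mac.encode() + body, hashlib.sha256).hexdigest()


def append_entry(log: List[Dict], key: bytes, record: Dict) -> Dict:
    """Append a record, chaining it to the last entry in the log."""
    prev = log[-1]["mac"] if log else GENESIS
    entry = {"record": record, "prev": prev, "mac": _entry_mac(key, prev, record)}
    log.append(entry)
    return entry


def verify_log(log: List[Dict], key: bytes) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        expected = _entry_mac(key, prev, entry["record"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Because every MAC depends on its predecessor, altering one TaskRequest or TaskResponse record invalidates that entry and every entry after it. Truncation of the newest entries is not detectable by the chain alone and would need an externally stored head MAC.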