
📝 Cortex Distributed Agent: Master Project Roadmap

This document consolidates all prioritized tasks, technical debt, swarm optimizations, and future strategic implementations for the Cortex project.


✅ Completed Milestones (Core & Swarm Optimization)

1. 🕒 LLM Monitoring Latency (Resolved)

  • Fast-Path Pattern Heuristics: Regex detection for instant completions.
  • Adaptive AI-Driven Polling: AI specifies next_check_seconds instead of static backoff.
  • Edge Intelligence (Shift Left): Agent Node PTY reader actively scans output and fires prompt detection events to wake up sleeping SubAgents.
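
A minimal sketch of how the fast-path check and the adaptive polling interval could fit together. The patterns, the function names, and the clamping bounds are illustrative assumptions; only `next_check_seconds` comes from this roadmap.

```python
import re

# Hypothetical prompt patterns; the project's real heuristics are not listed here.
PROMPT_PATTERNS = [
    re.compile(r"[$#>] ?$"),                       # shell prompt at end of output
    re.compile(r"\[y/n\]\s*$", re.IGNORECASE),     # interactive confirmation
    re.compile(r"password:\s*$", re.IGNORECASE),   # credential prompt
]


def fast_path_done(output: str) -> bool:
    """Return True when the last line of PTY output already looks like a prompt."""
    last_line = output.rstrip("\n").rsplit("\n", 1)[-1]
    return any(p.search(last_line) for p in PROMPT_PATTERNS)


def next_delay(next_check_seconds, lo=0.5, hi=60.0):
    """Clamp the AI-suggested polling interval into an assumed safe range."""
    if next_check_seconds is None:
        return lo  # no hint from the model: poll again soon
    return max(lo, min(hi, float(next_check_seconds)))
```

When `fast_path_done` returns True, the monitor can skip the LLM call entirely; otherwise it sleeps for `next_delay(...)` before asking again.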

2. 🎭 Swarm Choreography (Resolved)

  • Branching Agency (EXECUTE Action): The SubAgent orchestrates across nodes directly, branching on terminal state.

3. 💾 PTY Memory Pressure & Logging (Resolved)

  • TaskJournal Head+Tail Bounds: 10KB Head + 30KB Tail max memory footprint for AI context.
  • Persistence Offloading: Nodes stream gigabyte outputs natively to temp disk files instead of expanding the application heap.
  • Client-Side Truncation: 15KB/sec rate limit on PTY output to keep a noisy node from flooding the Orchestrator's gRPC stream.
  • Graceful Shutdown: SIGTERM/SIGINT grace periods for local node cleanup.
  • Ghost Mirror: Workspace bidirectional synchronization fully operational.
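
The head+tail bound above can be sketched as a small buffer that keeps the first and last bytes of a stream and drops the middle. This is not the actual TaskJournal code; the class name, the omission marker, and reading "KB" as 1024 bytes are assumptions.

```python
class HeadTailBuffer:
    """Keep the first head_max and the last tail_max bytes of a stream."""

    def __init__(self, head_max=10 * 1024, tail_max=30 * 1024):
        self.head_max = head_max
        self.tail_max = tail_max
        self.head = bytearray()
        self.tail = bytearray()
        self.dropped = 0  # bytes discarded from the middle

    def write(self, data: bytes) -> None:
        # Fill the head first; it never changes once full.
        room = self.head_max - len(self.head)
        if room > 0:
            self.head += data[:room]
            data = data[room:]
        if not data:
            return
        # Everything else goes to the tail, trimmed from the front.
        self.tail += data
        overflow = len(self.tail) - self.tail_max
        if overflow > 0:
            del self.tail[:overflow]
            self.dropped += overflow

    def render(self) -> bytes:
        """Return head + tail, with a marker where bytes were omitted."""
        if self.dropped == 0:
            return bytes(self.head + self.tail)
        marker = b"\n...[%d bytes omitted]...\n" % self.dropped
        return bytes(self.head) + marker + bytes(self.tail)
```

Memory stays bounded at `head_max + tail_max` regardless of how much output the task produces, which is the property the AI context needs.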

4. 🌐 Dedicated Browser Service (Resolved)

  • Service Decoupling: Browser automation moved to a standalone high-performance service.
  • Sidecar RAM Handoff: /dev/shm based zero-copy transfer for DOM and Screenshots.
  • Full Perception: Integrated A11y tree, JS evaluation, and persistent session management.

🚀 High Priority (Infrastructure & Scalability)

1. 🕸️ Hub GIL Contention & Scalability

Goal: Scale horizontally to handle 100+ simultaneously connected nodes.

  • [ ] Multiprocessing for Serialization: Refactor dispatch_swarm to use ProcessPoolExecutor to bypass GIL for heavy Protobuf signing/serialization.
  • [ ] Async gRPC Internalization: Migrate Orchestrator fully to grpc.aio to handle concurrent streaming and queue management without thread locks.
  • [ ] Sharded Registry: Distribute connections across multiple Hub instances using Redis as a shared state/journal layer.
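
One possible way to decide which Hub instance owns a node in a sharded registry is rendezvous (highest-random-weight) hashing. The roadmap does not specify an assignment scheme, so the technique, the function name `pick_hub`, and the hub identifiers below are all assumptions for illustration.

```python
import hashlib
from typing import Sequence


def pick_hub(node_id: str, hubs: Sequence[str]) -> str:
    """Map a node to the hub with the highest hash score for that node.

    Every Hub instance can compute this independently, so no coordinator
    is needed to agree on ownership.
    """
    def score(hub: str) -> int:
        digest = hashlib.sha256(f"{hub}|{node_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(hubs, key=score)
```

A useful property of this scheme is that removing a hub only moves the nodes that hub owned; every other node keeps its assignment.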

3. πŸ“¦ Binary Artifact & Large Data Chunking

Goal: Transmit payloads larger than a single gRPC message without failures.

  • [ ] gRPC Chunking for Large Files: Enable streaming of high-def browser session videos or database dumps over the task channel without hitting message size limits.
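
A rough sketch of the chunking logic, independent of any generated gRPC stubs. The `Chunk` type, its fields, and the 1 MiB chunk size are assumptions; in practice each `Chunk` would be a Protobuf message sent over a streaming RPC. gRPC's default maximum receive message size is 4 MiB, so chunks need to stay below that unless the limit is raised.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

CHUNK_SIZE = 1024 * 1024  # assumed 1 MiB per chunk


@dataclass
class Chunk:
    seq: int     # position in the stream, starting at 0
    data: bytes
    last: bool   # True on the final chunk


def split_chunks(payload: bytes, size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield the payload as ordered chunks of at most `size` bytes."""
    total = max(1, -(-len(payload) // size))  # ceiling division, at least one chunk
    for seq in range(total):
        yield Chunk(seq, payload[seq * size:(seq + 1) * size], seq == total - 1)


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the payload, rejecting gaps, reordering, and truncated streams."""
    parts = []
    for expected, chunk in enumerate(chunks):
        if chunk.seq != expected:
            raise ValueError(f"expected chunk {expected}, got {chunk.seq}")
        parts.append(chunk.data)
        if chunk.last:
            break
    else:
        raise ValueError("stream ended before the final chunk")
    return b"".join(parts)
```

For payloads that do not fit in memory, the sender would read from a file in `size`-byte blocks instead of slicing a `bytes` object, but the sequencing stays the same.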

4. 🏢 Multi-Tenancy & Resource Isolation

Goal: Keep tenants isolated from one another and share node resources fairly.

  • [ ] Tenant Segregation: Isolate node groups by tenant boundary, enforce hardware quotas, and allow the Hub to forcefully reap zombie tasks orphaned on nodes.
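
A minimal sketch of the two checks a Hub-side reaper might run, assuming zombie tasks are detected by a missed heartbeat and quotas are expressed as a maximum task count per tenant. Neither assumption is stated in this roadmap, and `TaskRecord`, `find_zombies`, and `over_quota` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class TaskRecord:
    task_id: str
    tenant: str
    last_heartbeat: float  # seconds, on the same clock as `now`


def find_zombies(tasks: Iterable[TaskRecord], now: float, timeout: float = 60.0) -> List[str]:
    """Return IDs of tasks whose last heartbeat is older than `timeout` seconds."""
    return [t.task_id for t in tasks if now - t.last_heartbeat > timeout]


def over_quota(tasks: Iterable[TaskRecord], limits: Dict[str, int]) -> List[str]:
    """Return tenants running more tasks than their limit.

    A tenant with no configured limit is treated as having a limit of 0.
    """
    counts: Dict[str, int] = {}
    for t in tasks:
        counts[t.tenant] = counts.get(t.tenant, 0) + 1
    return sorted(tenant for tenant, n in counts.items() if n > limits.get(tenant, 0))
```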

5. 💾 Server-Side Task Persistence

Goal: Preserve Hub state across restarts.

  • [ ] DB Backend Integration: Migrate NodeRegistry and WorkPool state from in-memory dicts to Postgres/Redis. Deferred until full system integration.

🐢 Low Priority & Future Strategic Roadmap

[ ] Architectural Refinement: Unified Worker Shim

  • Move from a Python "Skill" abstraction to isolated background worker processes per task (a dedicated Shell process, a dedicated Playwright daemon) for better fault isolation.

[ ] OS-Level Isolation (Firecracker/microVMs)

  • Run arbitrary AI tasks in automated, ephemeral microVMs for strict security sandboxing, instead of bare metal or standard containers.

[ ] Advanced Scheduling & Capability Routing

  • A task scheduler that dynamically matches jobs to nodes by GPU capacity, region, or OS and hardware constraints.
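
Capability routing could start as a simple filter plus a tie-break rule. The `Node` and `Job` shapes below and the best-fit policy (prefer the eligible node with the fewest GPUs) are assumptions for illustration, not a design the roadmap commits to.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    os: str
    region: str
    gpus: int


@dataclass(frozen=True)
class Job:
    min_gpus: int = 0
    os: Optional[str] = None       # None means "any OS"
    region: Optional[str] = None   # None means "any region"


def eligible(job: Job, node: Node) -> bool:
    """True when the node satisfies every constraint the job sets."""
    return (
        node.gpus >= job.min_gpus
        and job.os in (None, node.os)
        and job.region in (None, node.region)
    )


def route(job: Job, nodes: Iterable[Node]) -> Optional[Node]:
    """Pick the eligible node with the fewest GPUs, or None if none qualify."""
    candidates = [n for n in nodes if eligible(job, n)]
    return min(candidates, key=lambda n: n.gpus, default=None)
```

Choosing the smallest node that fits keeps larger GPU nodes free for jobs that actually need them.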

[ ] Infrastructure Automation

  • Node Auto-Updates: Secure execution pipeline for deployed agent nodes to pull and self-update their binaries/versions.
  • mTLS Lifecycle Management: Automated certificate renewal, revocation, and rotation.

[ ] Immutable Audit & Compliance

  • Cryptographically signed forensic logging for every TaskRequest and TaskResponse passing through the Hub.
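
A tamper-evident log can be sketched as a hash chain in which each entry's MAC covers the previous entry's MAC. To stay within the standard library, this sketch uses HMAC-SHA256 with a shared key, which is a substitute for true digital signatures; a production design would more likely use an asymmetric scheme so verifiers cannot forge entries. The function names and entry layout are assumptions.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def _mac(key: bytes, prev: str, body: str) -> str:
    return hmac.new(key, (prev + body).encode(), hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, record: dict) -> None:
    """Append a record whose MAC also covers the previous entry's MAC."""
    prev = log[-1]["mac"] if log else GENESIS
    body = json.dumps(record, sort_keys=True)  # canonical field ordering
    log.append({"prev": prev, "body": body, "mac": _mac(key, prev, body)})


def verify(log: list, key: bytes) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    prev = GENESIS
    for entry in log:
        expected = _mac(key, prev, entry["body"])
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Because each MAC depends on the one before it, changing or deleting any earlier entry invalidates every entry that follows. Truncating the end of the log is not detected by this sketch and would need a separately stored head MAC.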