
Feature Reference: Agent Node Mesh

This document is the reference for the Agent Node Mesh UI in the Cortex platform. It describes the UI structure, design philosophy, user journeys, and technical implementation details so that future iterations preserve existing behavior.


1. UI Overview & Layout Structure

The Agent Node Mesh page (NodesPage.js) is the central control plane for managing distributed execution environments. It follows a vertical stack layout with collapsible functional modules.

A. Dashboard Header (Top)

  • Position: Fixed at the top of the main content area.
  • Title: Agent Node Mesh with a rocket icon (🚀).
  • Description: Contextual message based on user role (Admin vs. User).
  • Global Actions:
    • Refresh List: Triggers a full REST fetch of node registry and group access.
    • Register Node (Admin Only): Opens a modal to create new node slugs and generate invite tokens.

B. Node Management Card (Repeatable Component)

Each registered node is represented by a high-density management card.

1. Identity & Live Pulse (Top-Left)

  • Pulsing Indicator: A real-time status light.
    • bg-green-500: Node is active and heartbeating via gRPC.
    • bg-gray-400: Node is offline or stale.
  • Status Text: Dynamic label (online, busy, idle, offline) derived from gRPC mesh status.
  • Node Name: Primary display name.
  • Node ID: Secondary monospaced identifier (e.g., test-prod-node).
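
The mapping from mesh status to indicator color can be sketched as follows. This is a minimal illustration, not the code in NodesPage.js: the function name, the `last_heartbeat` argument, and the 30-second staleness window are all assumptions, since this document does not state the real threshold.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical staleness window; the real threshold is not stated in this document.
STALE_AFTER = timedelta(seconds=30)


def indicator_class(status: str, last_heartbeat: datetime, now: datetime) -> str:
    """Return the CSS class for the pulse indicator.

    Gray is used when the node reports 'offline' or its last heartbeat is
    older than STALE_AFTER; every other status (online, busy, idle) is green.
    """
    is_stale = now - last_heartbeat > STALE_AFTER
    if status == "offline" or is_stale:
        return "bg-gray-400"
    return "bg-green-500"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(indicator_class("busy", now - timedelta(seconds=5), now))
```

Passing `now` explicitly keeps the function deterministic, which makes the stale case easy to exercise in a unit test.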

2. Live Health Metrics (Integrated in Top Row)

  • CPU Meter: Gradient bar (Indigo-Blue) showing total host load.
    • Hover Tooltip: Displays Core count, Current Frequency (GHz), Per-core utilization distribution, and Load Averages (1/5/15 min).
  • RAM Meter: Gradient bar (Pink-Rose) showing memory utilization.
    • Hover Tooltip: Displays exact GB Used, Available capacity, and Total system memory.
  • Dynamic Visibility: Meters only appear when the node is online. If offline, a stable space of 140px is reserved to prevent UI shifting.
  • UI Interaction: Tooltips pop downwards (top-full) to ensure they are never clipped by the header navigation bar.
  • Philosophy: Instant "at-a-glance" deep health check without needing to open secondary dashboards.
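
The data behind the two meters and their tooltips can be pictured with a small schema. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic AgentNodeStats model; every field name is an assumption inferred from the tooltip contents listed above, not the actual schema in schemas.py.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeStatsSketch:
    """Hypothetical stand-in for AgentNodeStats; field names are assumed."""

    cpu_percent: float                    # total host load shown by the CPU meter
    cpu_count: int                        # core count (tooltip)
    cpu_freq_ghz: float                   # current frequency (tooltip)
    per_core_percent: List[float]         # per-core utilization (tooltip)
    load_avg: Tuple[float, float, float]  # 1/5/15 minute load averages (tooltip)
    mem_used_gb: float
    mem_available_gb: float
    mem_total_gb: float

    def ram_percent(self) -> float:
        """Memory utilization for the RAM meter, guarded against a zero total."""
        if self.mem_total_gb <= 0:
            return 0.0
        return round(100.0 * self.mem_used_gb / self.mem_total_gb, 1)

    def ram_tooltip(self) -> str:
        """Text for the RAM hover tooltip."""
        return (
            f"{self.mem_used_gb:.1f} GB used / "
            f"{self.mem_available_gb:.1f} GB available / "
            f"{self.mem_total_gb:.1f} GB total"
        )
```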

3. State Control Toggle (Right Middle)

  • Active/Disabled Switch: A tactile, rounded toggle.
    • Active (Indigo): Node is allowed to accept tasks and sync files.
    • Disabled (Red): Node is restricted from synchronization and command execution.

4. Action Utility Bar (Right)

  • Terminal Button (Amber/Indigo): Expands the PTY-based console.
  • File Navigator Button (Amber): Expands the directory explorer.
  • Settings & Details Button (Gray): Expands the administrative configuration panel.
  • Deregister Button (Red Trash): Permanently removes node from registry (Admin only).

2. Expanded Feature Panes

A. Interactive Console (NodeTerminal.js)

  • Philosophy: Persistent PTY (Pseudo-Terminal) session.
  • Features:
    • Real-time character-by-character streaming (ANSI support).
    • Latency Monitor: Displays RTT in milliseconds.
    • Zoom/Fullscreen: Toggle for complex shell tasks.
    • Debug Mode: Toggles visibility of background task events (start/stop/snapshot).
    • Clear: Wipes the xterm grid.
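
One way a latency monitor like this could obtain its RTT figure is to time a single ping/pong exchange. The sketch below is an assumption about the approach, not the logic in NodeTerminal.js; `send_ping_and_wait` is a hypothetical callable that blocks until the node's reply arrives.

```python
import time
from typing import Callable


def measure_rtt_ms(send_ping_and_wait: Callable[[], None]) -> float:
    """Time one ping/pong exchange and return the round trip in milliseconds."""
    started = time.perf_counter()
    send_ping_and_wait()  # hypothetical: returns once the reply has been received
    return (time.perf_counter() - started) * 1000.0
```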

B. File System Navigator (FileSystemNavigator.js)

  • Philosophy: Lazy-loaded directory tree for massive file systems.
  • Features:
    • Directory Expansion: Clicking a folder fetches its immediate children via gRPC LIST.
    • File View: Opens a modal with a monospaced code viewer; binary files are blocked from viewing.
    • Sync Indicators: Amber dots indicate whether a file is currently synced with the server's Ghost Mirror.
    • Operations: Create File/Folder, Delete, and Breadcrumb navigation.
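
The lazy-loading behavior described above can be sketched as a tree whose children are fetched only on first expansion. This is an illustration under stated assumptions: `TreeNode` and the `list_dir` callable are hypothetical names, and `list_dir` stands in for whatever call ultimately issues the gRPC LIST request.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical listing function: takes a path, returns (name, is_dir) pairs.
ListFn = Callable[[str], List[Tuple[str, bool]]]


@dataclass
class TreeNode:
    path: str
    is_dir: bool
    children: Optional[List["TreeNode"]] = None  # None means "not fetched yet"

    def expand(self, list_dir: ListFn) -> List["TreeNode"]:
        """Fetch immediate children on first expansion, then reuse the cache."""
        if not self.is_dir:
            return []
        if self.children is None:
            base = self.path.rstrip("/")
            self.children = [
                TreeNode(f"{base}/{name}", child_is_dir)
                for name, child_is_dir in list_dir(self.path)
            ]
        return self.children
```

Because only one directory level is requested per click, the cost of opening the navigator does not grow with the total size of the remote file system.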

C. Admin Settings Pane (Inline)

  • Identity Details: Displays registered description.
  • Skill Configuration: Toggles for Shell, Browser, and Sync logic.
  • Group Access Management: Map specific user groups to this node with 'use' or 'root' permissions.
  • Download Bundle: Generates the agent_config.yaml and installation package.
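
The group access mapping can be pictured as a lookup from group name to permission level. The sketch below assumes that 'root' is a superset of 'use'; this document does not confirm that ordering, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Assumption: 'root' implies everything 'use' allows.
LEVELS = {"use": 1, "root": 2}


def effective_permission(
    node_access: Dict[str, str], user_groups: Iterable[str]
) -> Optional[str]:
    """Return the highest permission any of the user's groups has on a node."""
    best: Optional[str] = None
    for group in user_groups:
        level = node_access.get(group)
        if level in LEVELS and (best is None or LEVELS[level] > LEVELS[best]):
            best = level
    return best


def can(node_access: Dict[str, str], user_groups: Iterable[str], required: str) -> bool:
    """True if the user's groups grant at least the required permission."""
    granted = effective_permission(node_access, user_groups)
    return granted is not None and LEVELS[granted] >= LEVELS[required]
```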

3. Execution Live Bus (Bottom Overlay)

  • Position: Bottom of the page, visible when events occur.
  • Content: A streaming timeline of all mesh events (task_start, task_complete, sync_progress).
  • Philosophy: Providing a "Global Pulse" of what the AI is doing across the entire fleet of nodes.
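
A streaming timeline of this kind can be modeled as a bounded buffer of events rendered in time order. The sketch below is illustrative only: `MeshEvent`, `LiveBus`, the field names, and the 200-event cap are assumptions; only the event type names come from this document.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class MeshEvent:
    timestamp: float  # seconds since epoch (assumed field)
    node_id: str
    event_type: str   # e.g. task_start, task_complete, sync_progress
    detail: str = ""


class LiveBus:
    """Keeps the most recent events and renders them as timeline lines."""

    def __init__(self, max_events: int = 200) -> None:
        self._events: Deque[MeshEvent] = deque(maxlen=max_events)

    def publish(self, event: MeshEvent) -> None:
        self._events.append(event)  # oldest event is evicted when full

    @property
    def visible(self) -> bool:
        """The overlay is shown only once at least one event has arrived."""
        return len(self._events) > 0

    def timeline(self) -> List[str]:
        ordered = sorted(self._events, key=lambda e: e.timestamp)
        return [f"[{e.node_id}] {e.event_type} {e.detail}".rstrip() for e in ordered]
```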

4. User Journey & Steps

Path 1: Node Registration (Admin)

  1. Click Register Node in header.
  2. Enter Slug (e.g. gpu-worker-1) and Display Name.
  3. Expand Settings on the new node card.
  4. Click Download Configuration.
  5. Run the installation script on the physical machine using the generated YAML.
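
What the registration step produces can be sketched as a validated slug plus a random invite token. This is a guess at the shape of the record, not the Hub's implementation: the slug pattern, the dictionary keys, and the token length are all assumptions.

```python
import re
import secrets
from typing import Dict

# Assumed slug format: lowercase alphanumeric segments joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def new_registration(slug: str, display_name: str) -> Dict[str, str]:
    """Validate the slug and pair it with a freshly generated invite token."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid node slug: {slug!r}")
    return {
        "node_id": slug,
        "display_name": display_name,
        # Hypothetical token format: 32 random bytes, URL-safe encoded.
        "invite_token": secrets.token_urlsafe(32),
    }


if __name__ == "__main__":
    print(new_registration("gpu-worker-1", "GPU Worker 1")["node_id"])
```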

Path 2: Live Debugging (User/Admin)

  1. Locate the target node in the mesh list.
  2. Verify Live Pulse is green.
  3. Click the Terminal icon.
  4. Execute ls -la or top to inspect environment.
  5. Click File Navigator to locate artifacts or logs.

5. Source Code Mapping

| Component | UI Purpose | Source Path |
|---|---|---|
| NodesPage.js | Main Orchestrator View & Mesh Logic | NodesPage.js |
| NodeHealthMetrics | Real-time Rich CPU/RAM Metrics | NodesPage.js (L140) |
| NodeTerminal.js | PTY-bound Xterm.js terminal | NodeTerminal.js |
| FileSystemNavigator.js | File/Folder Explorer | FileSystemNavigator.js |
| NodeRegistryService | Backend Node Tracking | node_registry.py |
| AgentNodeStats | Pydantic Rich Metric Schema | schemas.py |
| TaskAssistant | gRPC Dispatcher Brain | assistant.py |

6. Feature Completion & Verification

To ensure a feature is fully operational and stable, follow this mandatory verification path:

1. Proto Regeneration & Commit

  • CRITICAL: If agent.proto has changed, you MUST regenerate the Python stubs for both the Hub and the Agent BEFORE deploying. Stale stubs can cause communication deadlocks or cause fields such as task_id to go missing.
    # Hub
    cd /app/ai-hub/app/protos && python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. agent.proto
    # Agent
    cd /app/agent-node && python3 -m grpc_tools.protoc -Iprotos --python_out=. --grpc_python_out=. protos/agent.proto
  • Step: Commit the .proto changes AND the newly generated _pb2.py files.
  • Action: Push to production:
    bash /app/deploy_remote.sh
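
A pre-deploy guard against forgotten regeneration could compare file modification times. The sketch below is a hypothetical helper, not part of the repository. It relies on the standard grpc_tools output names (`<name>_pb2.py` and `<name>_pb2_grpc.py`), and modification times are only a heuristic because a fresh git checkout can reset them.

```python
from pathlib import Path
from typing import List


def stale_stubs(proto: Path, out_dir: Path) -> List[str]:
    """Return generated stub filenames that are missing or older than the .proto."""
    expected = [f"{proto.stem}_pb2.py", f"{proto.stem}_pb2_grpc.py"]
    proto_mtime = proto.stat().st_mtime
    stale = []
    for name in expected:
        stub = out_dir / name
        if not stub.exists() or stub.stat().st_mtime < proto_mtime:
            stale.append(name)
    return stale
```

For the Hub layout shown above, this would be called with the path to agent.proto and its containing directory; an empty result means both stubs are at least as new as the proto file.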

2. Production Health Check (The "Python Trick")

Once deployed, perform a basic connectivity check directly inside the production container. This bypasses potential frontend/network issues to isolate the backend logic.

  • SSH into the host and run a targeted Python script inside the ai_hub_service container:
    docker exec ai_hub_service python3 -c "
    import requests
    # Replace with the user_id that owns the node
    headers = {'X-User-ID': '9a333ccd-9c3f-432f-a030-7b1e1284a436'}
    r = requests.get('http://localhost:8000/api/v1/nodes/test-prod-node/fs/ls?path=.', headers=headers)
    print(f'Status: {r.status_code}')
    print(r.text)
    "
  • Expected Outcome: A 200 OK status and a JSON body containing the file list.
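
The expected outcome can also be checked programmatically rather than by eye. The helper below is a hypothetical sketch: it only verifies the status code and that the body parses as JSON, because the exact shape of the file-list payload is not specified in this document.

```python
import json
from typing import Tuple


def check_ls_response(status_code: int, body: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a response from the fs/ls endpoint."""
    if status_code != 200:
        return False, f"unexpected status {status_code}"
    try:
        json.loads(body)  # payload shape is not asserted; see note above
    except json.JSONDecodeError as exc:
        return False, f"body is not valid JSON: {exc.msg}"
    return True, "ok"
```

In the script above, this would be called as `check_ls_response(r.status_code, r.text)`.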

3. Visual Verification (Worst Case)

If the automated checks pass but UI behavior is suspect, use the Browser Subagent workflow (/browser_setup) to capture screenshots of the live environment at https://ai.jerxie.com/nodes.

[!IMPORTANT] Subagent Interaction Efficiency: If you need to test the UI, you MUST map out the exact step-by-step navigation path (which icons to click, which buttons to press) before calling the tool. Since the subagent cannot read the source code or component structure, it relies purely on visual discovery, which is highly inefficient without explicit step-by-step guidance.


7. Guidelines for Future Changes