cortex-hub / docs / refactors / dedicated_browser_service.md

Design Document: Dedicated Browser Service Refactor

1. Rationale

Currently, every agent node in the Cortex Mesh can optionally support a "browser skill" via Playwright. While flexible, this introduces several issues:

  • Latency: High overhead in sending large DOM/A11Y snapshots over the bi-directional gRPC TaskStream.
  • Resource Heavy: Browser instances on edge nodes consume significant RAM/CPU.
  • Dependency Bloat: Every agent node needs Playwright/Chromium dependencies.
  • Complexity: Synchronizing browser state across a distributed mesh is difficult.

By moving browser automation to a dedicated service located alongside the AI Hub, we achieve near-zero latency for DOM extraction and more robust state management.

2. New Architecture

The new architecture introduces a standalone Browser Service container.

Component Diagram

```mermaid
graph TD
    User["USER / UI"] <--> Hub["AI Hub (ai-hub)"]

    subgraph "Internal Processing"
        Hub --> SA["Sub-Agent (Browser Expert)"]
        SA -- "gRPC / REST" --> BSC["Browser Service Client"]
    end

    BSC <--> BS["Dedicated Browser Service (Container)"]

    subgraph "Browser Service"
        BS <--> PW["Playwright / Chromium"]
    end

    Hub <--> Mesh["Agent Mesh (Edge Nodes)"]
    Mesh -- "Local Tasks" --> Shell["Shell / File System"]
```

Key Changes

  1. Hub Integration: The ToolService in the Hub will no longer dispatch browser tasks to the TaskAssistant (which routes them to arbitrary mesh nodes). Instead, it will talk directly to the BrowserService client.
  2. Stateless/Stateful Support: The BrowserService will manage browser contexts, allowing the Hub to reference a session_id to continue navigation on the same page.
  3. Removal from Agents: All agent-node code related to the browser (bridges, dependencies, and proto fields) will be removed.
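The session contract in change 2 can be sketched with an in-memory stand-in for the eventual gRPC client. The names here (`BrowserServiceClient`, `open_session`, `navigate`) are illustrative, not final API:

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class BrowserSession:
    """State kept per session (here just the current URL)."""
    session_id: str
    current_url: Optional[str] = None

class BrowserServiceClient:
    """In-memory stand-in for the gRPC client the Hub's ToolService would use."""

    def __init__(self) -> None:
        self._sessions: Dict[str, BrowserSession] = {}

    def open_session(self) -> str:
        """Create a fresh browser context and return its session_id."""
        sid = uuid.uuid4().hex
        self._sessions[sid] = BrowserSession(session_id=sid)
        return sid

    def navigate(self, session_id: str, url: str) -> str:
        """Reusing the same session_id continues navigation on the same page."""
        session = self._sessions[session_id]
        session.current_url = url
        return session.current_url
```

In the real client these calls would be RPCs against the Browser Service, but the session_id contract (open once, reuse to continue) would stay the same.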

3. Refactoring Plan

Phase 1: Protos & Infrastructure

  1. Update agent.proto:
    • Remove BrowserAction, BrowserResponse, and BrowserEvent messages.
    • Remove browser_action from TaskRequest.
    • Remove browser_result and browser_event from ServerTaskMessage.
  2. Define Browser Service API:
    • Create a new proto (e.g., browser_service.proto) defining actions like Navigate, Click, Extract, EvaluateJS.
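A hypothetical first cut of browser_service.proto; service, message, and field names are placeholders to be refined during Phase 1:

```proto
syntax = "proto3";
package cortex.browser.v1;

service BrowserService {
  rpc Navigate (NavigateRequest) returns (ActionResult);
  rpc Click (ClickRequest) returns (ActionResult);
  rpc Extract (ExtractRequest) returns (ActionResult);
  rpc EvaluateJS (EvaluateJSRequest) returns (ActionResult);
}

message NavigateRequest {
  string session_id = 1;  // reuse to continue on the same page
  string url = 2;
}

message ClickRequest {
  string session_id = 1;
  string selector = 2;
}

message ExtractRequest {
  string session_id = 1;
  string selector = 2;    // empty = full DOM snapshot
}

message EvaluateJSRequest {
  string session_id = 1;
  string script = 2;
}

message ActionResult {
  bool ok = 1;
  string error = 2;
  bytes payload = 3;        // small inline results
  string payload_path = 4;  // shared-memory path reference for large blobs
}
```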

Phase 2: Agent Node Cleanup

  1. Remove Skill Implementation:
    • Delete agent-node/src/agent_node/skills/browser_bridge.py.
  2. Update Manager:
    • Remove browser registration in agent-node/src/agent_node/skills/manager.py.
  3. Core Cleanup:
    • Remove browser capability detection in agent-node/src/agent_node/node.py.
    • Remove inbound browser_action routing in _process_server_message.

Phase 3: AI Hub Refactor

  1. Tool Routing:
    • In ai-hub/app/core/services/tool.py, update browser_automation_agent to use a BrowserServiceClient instead of assistant.dispatch_browser.
  2. Assistant Cleanup:
    • Remove dispatch_browser and all browser-related result handling from ai-hub/app/core/grpc/services/assistant.py.
  3. GRPC Server Cleanup:
    • Remove browser_event and result correlation logic from ai-hub/app/core/grpc/services/grpc_server.py.

Phase 4: New Browser Service Development

  1. Implement a new Python service using FastAPI or gRPC.
  2. Use Playwright with a pool of persistent contexts.
  3. Deploy as a separate container in the docker-compose.
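The persistent-context pool in step 2 could look like the following sketch. The pooling logic is shown Playwright-free so it stays self-contained: the `factory` callable stands in for Playwright's `browser.new_context()`, and all names are illustrative:

```python
from collections import OrderedDict
from typing import Callable, TypeVar

T = TypeVar("T")

class ContextPool:
    """LRU pool of persistent browser contexts keyed by session_id."""

    def __init__(self, factory: Callable[[], T], max_contexts: int = 8) -> None:
        self._factory = factory          # e.g. lambda: browser.new_context()
        self._max = max_contexts
        self._contexts: "OrderedDict[str, T]" = OrderedDict()

    def acquire(self, session_id: str) -> T:
        """Return the context for session_id, creating (and evicting) as needed."""
        if session_id in self._contexts:
            self._contexts.move_to_end(session_id)   # mark most recently used
        else:
            if len(self._contexts) >= self._max:
                self._contexts.popitem(last=False)   # evict least recently used
            self._contexts[session_id] = self._factory()
        return self._contexts[session_id]
```

In the real service, eviction would also close the Playwright context to release its memory.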

4. Performance Analysis & Optimization

To meet the "performance first" goal of near-zero latency, we must choose the communication stack carefully.

Comparison

| Feature | gRPC (HTTP/2) | REST (HTTP/1.1) | gRPC + Unix Sockets | Shared Memory (/dev/shm) |
| --- | --- | --- | --- | --- |
| Serialization | Protobuf (binary) | JSON (text) | Protobuf (binary) | Zero-copy / reference |
| Network overhead | Low (TCP) | High (TCP) | Near zero (IPC) | Zero |
| Speed (small result) | High | Medium | Ultra high | N/A |
| Speed (large DOM/A11Y) | Medium | Low | High | Instant |

The "Performance First" Recommendation

For a local container-to-container deployment, gRPC over Unix Domain Sockets (UDS) is the optimal choice for command/control. However, for large payloads (DOM snapshots, screenshots), we will implement a Sidecar Handoff via Shared Memory.

  1. Control Path: AI Hub --[gRPC over UDS]--> Browser Service.
  2. Data Path (Large Blobs):
    • Browser Service writes the large blob (e.g., a 2 MB DOM snapshot or a 5 MB screenshot) to a shared volume mounted as tmpfs (e.g., /dev/shm/cortex_browser/).
    • Browser Service returns the file path reference via gRPC.
    • AI Hub reads the file directly from RAM.
    • This bypasses the serialization/deserialization and stream-processing overhead of passing MBs of data through the network stack.
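A minimal sketch of that handoff, assuming a tmpfs mount at /dev/shm/cortex_browser (the CORTEX_SHM_DIR override and both function names are illustrative):

```python
import os
import uuid
from pathlib import Path

# RAM-backed directory shared between the two containers (tmpfs mount).
SHM_DIR = Path(os.environ.get("CORTEX_SHM_DIR", "/dev/shm/cortex_browser"))

def write_blob(data: bytes) -> str:
    """Browser Service side: persist a large DOM/screenshot, return its path."""
    SHM_DIR.mkdir(parents=True, exist_ok=True)
    path = SHM_DIR / f"{uuid.uuid4().hex}.bin"
    path.write_bytes(data)          # lands in RAM, not on disk (tmpfs)
    return str(path)

def read_blob(path: str, cleanup: bool = True) -> bytes:
    """AI Hub side: read the blob by reference, then free the RAM."""
    p = Path(path)
    data = p.read_bytes()
    if cleanup:
        p.unlink(missing_ok=True)   # release tmpfs memory once consumed
    return data
```

Only the short path string travels over gRPC; the megabytes of payload never touch the serialization layer.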

5. Implementation Roadmap Update

Phase 1: Shared Infrastructure

  • Configure docker-compose.production.yml to share a high-speed RAM volume (/dev/shm) between the ai-hub and the new browser-service.
  • Implement a gRPC server on the Browser Service that listens on a Unix Socket (/tmp/browser.sock).
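The volume wiring above might look like the following compose excerpt. Service names and mount points are illustrative; note that for the UDS to be visible to both containers, the socket (the plan's /tmp/browser.sock) would need to live on a volume both containers mount, e.g. /var/run/cortex/browser.sock:

```yaml
services:
  ai-hub:
    volumes:
      - /dev/shm/cortex_browser:/dev/shm/cortex_browser  # RAM-backed blob handoff
      - cortex-ipc:/var/run/cortex                       # holds the Unix socket
  browser-service:
    volumes:
      - /dev/shm/cortex_browser:/dev/shm/cortex_browser
      - cortex-ipc:/var/run/cortex

volumes:
  cortex-ipc:
```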

Phase 2: Agent Node Cleanup (Continued)

(Same as previously defined: removing all browser-skill code from nodes to lighten their footprint.)

Phase 3: Hub Logic

  • Implement the BrowserServiceClient to handle the UDS connection and the RAM-disk data retrieval for large snapshots.
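The client's result handling could follow this sketch: `ActionResult` mirrors the hypothetical proto shape (inline `payload` vs. `payload_path`), while the channel itself would be opened with gRPC's standard UDS target form, e.g. `grpc.insecure_channel("unix:///tmp/browser.sock")`. All names are illustrative:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ActionResult:
    """Stand-in for the gRPC response message."""
    payload: bytes = b""       # small results arrive inline
    payload_path: str = ""     # large blobs arrive as a RAM-disk reference

def resolve_payload(result: ActionResult) -> bytes:
    """Return the result bytes, dereferencing the RAM-disk path if present."""
    if result.payload_path:
        p = Path(result.payload_path)
        data = p.read_bytes()
        p.unlink(missing_ok=True)  # free tmpfs memory once consumed
        return data
    return result.payload
```

The Hub thus handles small and large results uniformly, with the branch on `payload_path` hiding the shared-memory detour from callers.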

6. Impact Analysis

  • Latency: An estimated 95% reduction in large-data transfer time.
  • CPU/Memory: Drastically reduced on agent nodes; browser load is concentrated in a single optimized, high-memory container on the Hub host.
  • Architecture: Cleaner separation of concerns: agents handle hardware and local tasks, while specialized containers handle resource-intensive browser automation.

Status: DRAFT
Author: Cortex Architect
Ref: Session Refactor Request (Turn 818)