
---
name: browser_automation_agent
emoji: "🌐"
description: >
  Perform web browsing, real-time data retrieval (stocks, news, weather),
  extraction, and UI automation using a dedicated High-Performance Browser
  Service. Use this tool whenever you need up-to-date information that might
  be beyond your knowledge cutoff. Supports persistent browser sessions with
  stateful element refs (e1, e2, ...) for reliable multi-step interaction.
skill_type: system
is_enabled: true
features:
  - swarm_control
  - agent_harness
config:
  service: BrowserService
  method: Navigate
  parameters:
    type: object
    properties:
      url:
        type: string
        description: The URL to navigate to (required for 'navigate' action).
      action:
        type: string
        enum:
          - navigate
          - click
          - type
          - screenshot
          - snapshot
          - hover
          - scroll
          - eval
          - close
          - research
        description: |
          The browser action to perform:
          - navigate: Go to a URL. Auto-returns an aria snapshot for immediate context.
          - snapshot: Get a semantic `a11y_summary` and structured DOM of the current page.
          - click: Click a selector or ref (e.g. 'e3').
          - type: Type text into a selector or ref.
          - screenshot: Capture a PNG screenshot.
          - eval: Execute JavaScript on the page and return the result.
          - scroll: Scroll vertically by 'y' pixels.
          - hover: Hover over a selector or ref.
          - close: Close the browser session.
          - research: (DEEP BREADTH SEARCH) Dispatch multiple URLs to a worker pool. Returns clean markdown of each page.
      selector:
        type: string
        description: >
          CSS/XPath selector OR a ref from the last snapshot (e.g. 'e3').
          Refs are more reliable than CSS selectors; always prefer refs after a snapshot.
      text:
        type: string
        description: Text to type (for 'type' action) or JavaScript to execute (for 'eval' action).
      y:
        type: integer
        description: Pixels to scroll vertically (for 'scroll' action, default 400).
      session_id:
        type: string
        description: Optional override. Use the same session_id across multiple actions to maintain cookies, login state, and element refs.
      urls:
        type: array
        items:
          type: string
        description: List of URLs for the 'research' action.
      max_concurrent:
        type: integer
        description: Max parallel workers for research (default 5).
    required:
      - action
is_system: true
---

# Browser Automation Agent

This capability enables high-performance web browsing, data extraction, and UI automation. It supports persistent sessions with stable element references (e1, e2, etc.) for reliable multi-step interaction across the web.

> [!TIP]
> **CAPTCHA & Stealth Handling:** If you encounter a CAPTCHA (e.g., reCAPTCHA, Cloudflare), run a `snapshot`. The "I'm not a robot" button or verification checkbox should appear as a ref (e.g., `[ref=e15]`); simply `click` that ref to proceed. The agent uses multi-frame scanning for these scenarios.

## Intelligence Protocol

You are the Lead Browsing & Extraction Specialist. Adhere to these principles for reliable, professional data extraction:

### 1. Reliable Interaction (The Snap-Ref Pattern)

Always follow this three-step workflow to ensure interaction stability:

1. **Navigate:** Go to the target URL.
2. **Snapshot:** Run `snapshot` to retrieve a semantic role tree. This generates stable labels (e.g., `[ref=e1]`) for all interactive elements.
3. **Interact:** Use the refs directly as the selector (e.g., `"selector": "e4"`) for all `click`, `type`, or `hover` actions. Prefer refs over CSS selectors, as they are resilient to page updates within a session.
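The three steps above can be sketched as a sequence of action payloads. This is a minimal illustration, not the service's client API: `make_action` and the concrete session id, URL, and refs are hypothetical; only the payload fields come from the parameter schema.

```python
# Hypothetical helper: in practice each payload would be sent to the
# BrowserService 'Navigate' method defined in the skill config.
def make_action(**fields):
    payload = {"session_id": "job-42", **fields}  # one session for all steps
    assert "action" in payload, "'action' is the only required field"
    return payload

# 1. Navigate: go to the target URL (auto-returns an aria snapshot).
step1 = make_action(action="navigate", url="https://example.com/login")

# 2. Snapshot: retrieve the semantic role tree with stable refs (e1, e2, ...).
step2 = make_action(action="snapshot")

# 3. Interact: use refs from the snapshot as selectors.
step3 = make_action(action="type", selector="e4", text="alice@example.com")
step4 = make_action(action="click", selector="e7")
```

Note that every payload reuses the same `session_id`, which is what keeps the refs from step 2 valid in steps 3 and 4.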

### 2. Extraction Strategy

- **Semantic Summary:** Use the `a11y_summary` from a snapshot to understand the primary content and navigation structure.
- **Deep Extraction:** Use the `eval` action to execute targeted JavaScript for structured data (e.g., `document.title`, lists, or specific element properties).
- **Markdown Conversion:** When possible, use the `research` worker pool for high-volume content extraction.
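As an illustration of the deep-extraction bullet, an `eval` payload can pull a whole list of structured records in one round trip. The CSS selector and session id here are illustrative assumptions, not part of the skill definition:

```python
# 'eval' executes the JavaScript in `text` on the current page and
# returns the result; serializing to JSON keeps the output structured.
extract_headlines = {
    "action": "eval",
    "session_id": "news-run",  # hypothetical session id
    "text": (
        "JSON.stringify(Array.from(document.querySelectorAll('h2 a'))"
        ".map(a => ({title: a.textContent.trim(), href: a.href})))"
    ),
}
```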

### 3. Session & Token Management

> [!IMPORTANT]
> **Session Persistence:** Always use a consistent `session_id` throughout a multi-step workflow. This preserves cookies, login states, and element references across turns.
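A sketch of what session persistence looks like over a multi-step flow — the session id, URL, and ref below are hypothetical placeholders:

```python
SESSION = "checkout-flow-1"  # hypothetical id; any stable string works

# Every step reuses the same session_id, so cookies, login state, and
# element refs from earlier snapshots remain valid across turns.
steps = [
    {"action": "navigate", "url": "https://shop.example.com", "session_id": SESSION},
    {"action": "snapshot", "session_id": SESSION},
    {"action": "click", "selector": "e12", "session_id": SESSION},
    {"action": "close", "session_id": SESSION},  # release the session when done
]
```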

### 4. 🚀 High-Volume Research (Worker Pool)

When analyzing multiple search results or deep-diving into subpages, use the `research` action to process URLs in parallel:

- **Workflow:**
  1. Extract a list of URLs from a search results page using `snapshot` + `eval`.
  2. Invoke `research` with the `urls` array.
  3. The tool returns a list of results, each containing a clean Markdown version of the main content, allowing you to process 5+ pages in a single turn.
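The workflow above can be sketched as two payloads. The link selector, session id, and URLs are hypothetical stand-ins; `max_concurrent: 5` matches the documented default:

```python
# Step 1 (sketch): an 'eval' payload that collects result links from a
# search page. The 'a.result' selector is illustrative, not prescribed.
collect_links = {
    "action": "eval",
    "session_id": "survey-1",
    "text": "JSON.stringify(Array.from(document.querySelectorAll('a.result')).map(a => a.href))",
}

# Step 2: fan the collected URLs out to the worker pool; each result
# comes back as clean Markdown of the page's main content.
research = {
    "action": "research",
    "urls": [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ],
    "max_concurrent": 5,  # the documented default; raise for larger batches
}
```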