cortex-hub / skills / browser-automation-agent / SKILL.md

---
name: browser_automation_agent
emoji: "🌐"
description: >
  Perform web browsing, data extraction, form filling, and UI automation
  using a dedicated High-Performance Browser Service. Supports persistent
  browser sessions with stateful element refs (e1, e2, ...) for reliable
  multi-step interaction.
skill_type: system
is_enabled: true
features:
  - chat
  - swarm_control
config:
  service: BrowserService
  method: Navigate
  parameters:
    type: object
    properties:
      url:
        type: string
        description: The URL to navigate to (required for 'navigate' action).
      action:
        type: string
        enum:
          - navigate
          - click
          - type
          - screenshot
          - snapshot
          - hover
          - scroll
          - eval
          - close
          - research
        description: |
          The browser action to perform:
          - navigate: Go to a URL. Auto-returns an aria snapshot for immediate context.
          - snapshot: Get a semantic `a11y_summary` and structured DOM of the current page.
          - click: Click a selector or ref (e.g. 'e3').
          - type: Type text into a selector or ref.
          - screenshot: Capture a PNG screenshot.
          - eval: Execute JavaScript on the page and return the result.
          - scroll: Scroll vertically by 'y' pixels.
          - hover: Hover over a selector or ref.
          - close: Close the browser session.
          - research: (DEEP BREADTH SEARCH) Dispatch multiple URLs to a worker pool. Returns clean markdown of each page.
      selector:
        type: string
        description: >
          CSS/XPath selector OR a ref from the last snapshot (e.g. 'e3').
          Refs are more reliable than CSS selectors; always prefer refs after a snapshot.
      text:
        type: string
        description: Text to type (for 'type' action) or JavaScript to execute (for 'eval' action).
      y:
        type: integer
        description: Pixels to scroll vertically (for 'scroll' action, default 400).
      session_id:
        type: string
        description: >
          Session identifier. Reuse the same session_id across
          actions to maintain cookies, login state, and element refs.
      urls:
        type: array
        items:
          type: string
        description: List of URLs for the 'research' action.
      max_concurrent:
        type: integer
        description: Max parallel workers for research (default 5).
    required:
      - action
is_system: true
---

Intelligence Protocol

You are an AI browsing and data extraction assistant using a dedicated high-performance Browser Service.

Capability Disclaimer:

  • You CAN and SHOULD use this tool for any task that requires external, up-to-date, or public information (e.g., weather, stocks, news, documentation).
  • Do not apologize for not having access to the internet — use browser_automation_agent to GET the access.
  • Handling CAPTCHAs: The agent now features advanced stealth and multi-frame (iframe) scanning. If you see a CAPTCHA (like reCAPTCHA or Cloudflare), run snapshot. You should see the "I'm not a robot" button or verify checkbox as a ref (e.g. [ref=e15]). Simply click that ref to proceed.

Step 1: Navigate

Use navigate to go to a URL. This automatically returns an accessibility snapshot for you to understand the page structure.

Step 2: Understand the page with snapshot

Run snapshot to get a semantic role tree and DOM structure of the page. Each interactive or content element gets a stable [ref=eN] label:

- heading "Top Stories" [ref=e1]
- link "OpenAI releases new model" [ref=e2]
- searchbox "Search" [ref=e3]
- button "Submit" [ref=e4]
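Assuming the snapshot lines follow the `role "name" [ref=eN]` shape shown above, a short parser can turn them into a lookup table for later interaction steps (the `parse_snapshot` helper is an illustrative sketch, not a service API):

```python
import re

def parse_snapshot(snapshot_text: str) -> dict[str, tuple[str, str]]:
    """Map each [ref=eN] label to its (role, accessible name)."""
    refs = {}
    line_re = re.compile(r'-\s*(\w+)\s+"([^"]*)"\s+\[ref=(e\d+)\]')
    for role, name, ref in line_re.findall(snapshot_text):
        refs[ref] = (role, name)
    return refs

snapshot = """\
- heading "Top Stories" [ref=e1]
- link "OpenAI releases new model" [ref=e2]
- searchbox "Search" [ref=e3]
- button "Submit" [ref=e4]
"""
refs = parse_snapshot(snapshot)
print(refs["e4"])  # → ('button', 'Submit')
```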

Step 3: Interact using refs

Use the refs directly as a selector value for click, type, or hover:

  • To click "Submit": { "action": "click", "selector": "e4", "session_id": "..." }
  • To type a query: { "action": "type", "selector": "e3", "text": "AI news", "session_id": "..." }
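The two payloads above can be wrapped in tiny builders. The helper names are hypothetical; the payload keys mirror the JSON examples:

```python
def click(ref: str, session_id: str) -> dict:
    """Build a click payload targeting a snapshot ref (or CSS selector)."""
    return {"action": "click", "selector": ref, "session_id": session_id}

def type_text(ref: str, text: str, session_id: str) -> dict:
    """Build a type payload; 'selector' accepts a ref or CSS selector."""
    return {"action": "type", "selector": ref, "text": text,
            "session_id": session_id}

print(type_text("e3", "AI news", "sess-1"))
```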

Extracting Information

  • Read the Results Directly: The tool automatically returns dom, a11y_summary, and a11y_raw (if small). DO NOT try to use file explorer or other tools to read paths like /dev/shm/... — they are internal handoffs and you already have the data in the tool output.
  • Use eval with JavaScript for targeted data extraction:
    • { "action": "eval", "text": "document.title", "session_id": "..." }
    • { "action": "eval", "text": "Array.from(document.querySelectorAll('h2')).map(h => h.innerText)", "session_id": "..." }
  • Use snapshot for structured listings of links, headings, and buttons via the a11y_summary.

Session Persistence

Always use the same session_id across steps to preserve cookies, login state, and element refs. The service runs in a persistent container, so multi-step workflows are extremely fast.
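One way to guarantee a consistent session_id is a thin wrapper that stamps it into every payload. This is a sketch of the calling convention only; `BrowserSession` is not a real API of the service, and the dispatch mechanism depends on your runtime:

```python
class BrowserSession:
    """Stamp one session_id into every payload so cookies, login
    state, and element refs persist across steps."""

    def __init__(self, session_id: str):
        self.session_id = session_id

    def payload(self, action: str, **params) -> dict:
        return {"action": action, "session_id": self.session_id, **params}

session = BrowserSession("sess-42")
steps = [
    session.payload("navigate", url="https://example.com/login"),
    session.payload("type", selector="e3", text="alice"),
    session.payload("click", selector="e4"),
]
# Every step carries the same session_id.
assert all(s["session_id"] == "sess-42" for s in steps)
```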

🚀 Deep Breadth Research (The Worker Pool)

If you need to analyze multiple search results or dive deeper into a website's subpages, use the research action.

  1. Use navigate and snapshot to find a list of relevant links/URLs on a search page.
  2. Extract the URLs using eval.
  3. Invoke research with the list of URLs:
    {
      "action": "research",
      "urls": ["https://site-a.com/news1", "https://site-b.com/blog2"],
      "max_concurrent": 5
    }
  4. The tool returns a list of results, each containing the page title and a clean Markdown version of the main content. This allows you to process 5+ pages of data in a single turn without manual navigation.
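The URL-list preparation can be sketched in Python: deduplicate the URLs extracted via eval, then assemble the research payload. `research_payload` is an illustrative helper; the payload keys match the example above:

```python
def research_payload(urls: list[str], max_concurrent: int = 5) -> dict:
    """Deduplicate URLs (preserving order) and build a research payload."""
    seen = set()
    unique = [u for u in urls if not (u in seen or seen.add(u))]
    return {"action": "research", "urls": unique,
            "max_concurrent": max_concurrent}

# URLs as they might come back from an 'eval' extraction step:
raw = [
    "https://site-a.com/news1",
    "https://site-b.com/blog2",
    "https://site-a.com/news1",  # duplicate from the results page
]
print(research_payload(raw)["urls"])
# → ['https://site-a.com/news1', 'https://site-b.com/blog2']
```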