
---
name: browser_automation_agent
emoji: "🌐"
description: >
  Perform web browsing, data extraction, form filling, and UI automation on
  remote agent nodes using Playwright. Supports persistent browser sessions
  with stateful element refs (e1, e2, ...) for reliable multi-step interaction.
skill_type: remote_grpc
is_enabled: true
features:
  - chat
  - swarm_control
config:
  service: BrowserService
  method: Navigate
  capabilities:
    - browser
  parameters:
    type: object
    properties:
      url:
        type: string
        description: The URL to navigate to (required for the 'navigate' action).
      action:
        type: string
        enum:
          - navigate
          - click
          - type
          - screenshot
          - get_dom
          - hover
          - scroll
          - eval
          - get_a11y
          - close
        description: |
          The browser action to perform:
          - navigate: Go to a URL. Auto-returns an aria snapshot for immediate context.
          - get_a11y: Get a semantic role tree of the page with [ref=eN] labels. Use this to understand the page and get selectors for interactive elements.
          - click: Click a selector or ref (e.g. 'e3').
          - type: Type text into a selector or ref.
          - screenshot: Capture a PNG screenshot.
          - eval: Execute JavaScript on the page and return the result.
          - get_dom: Get the full HTML source.
          - scroll: Scroll vertically by 'y' pixels.
          - hover: Hover over a selector or ref.
          - close: Close the browser session.
      selector:
        type: string
        description: >
          CSS/XPath selector OR a ref from the last snapshot (e.g. 'e3').
          Refs are more reliable than CSS selectors; always prefer refs after get_a11y.
      text:
        type: string
        description: Text to type (for the 'type' action) or JavaScript to execute (for the 'eval' action).
      y:
        type: integer
        description: Pixels to scroll vertically (for the 'scroll' action, default 400).
      node_id:
        type: string
        description: The target node ID.
      session_id:
        type: string
        description: >
          Session ID for persistent browser state. Use a consistent ID across multiple
          actions to maintain cookies, login state, and element refs.
    required:
      - action
      - node_id
is_system: true
---

Browser Automation Agent

You are an AI browsing and data extraction assistant using Playwright on a remote agent node.

Step 1: Navigate

Use navigate to go to a URL. This automatically returns an accessibility snapshot.
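It can help to check a call against the parameter schema declared in the frontmatter before it reaches the node. The following is a minimal client-side sketch that assumes calls are plain JSON objects. `buildBrowserCall` is hypothetical, as are the node ID `node-1` and the session ID `research-1`. The per-action field checks are inferred from the schema's descriptions, not taken from BrowserService itself.

```javascript
// Action names copied from the 'action' enum in the frontmatter.
const ACTIONS = new Set([
  "navigate", "click", "type", "screenshot", "get_dom",
  "hover", "scroll", "eval", "get_a11y", "close",
]);

// Hypothetical helper: validates a call against the declared schema and
// returns a copy ready to send. Not part of BrowserService.
function buildBrowserCall(params) {
  const { action, node_id } = params;
  // 'action' and 'node_id' are the schema's required properties.
  if (!ACTIONS.has(action)) {
    throw new Error(`unknown action: ${action}`);
  }
  if (typeof node_id !== "string" || node_id === "") {
    throw new Error("node_id is required");
  }
  // Per-action requirements inferred from the parameter descriptions.
  if (action === "navigate" && !params.url) {
    throw new Error("navigate requires url");
  }
  if (["click", "type", "hover"].includes(action) && !params.selector) {
    throw new Error(`${action} requires selector`);
  }
  if (["type", "eval"].includes(action) && typeof params.text !== "string") {
    throw new Error(`${action} requires text`);
  }
  const call = { ...params };
  // The schema documents a default of 400 px for 'scroll'.
  if (action === "scroll" && call.y === undefined) {
    call.y = 400;
  }
  return call;
}

const nav = buildBrowserCall({
  action: "navigate",
  node_id: "node-1",        // hypothetical node ID
  session_id: "research-1", // hypothetical session ID
  url: "https://example.com",
});
console.log(JSON.stringify(nav));
```

Filling in the scroll default on the client only mirrors the documented default. The node presumably applies it on its own when `y` is omitted.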

Step 2: Understand the page with get_a11y

Run get_a11y to get a semantic role tree of the page. Each interactive or content element gets a stable [ref=eN] label:

- heading "Top Stories" [ref=e1]
- link "OpenAI releases new model" [ref=e2]
- searchbox "Search" [ref=e3]
- button "Submit" [ref=e4]
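A caller that receives this snapshot as text can map roles and names to refs. The sketch below assumes each element appears on its own line shaped like `- role "name" ... [ref=eN]`, as in the listing above. Real snapshots may nest elements or carry extra attributes. `parseRefs` and `findRef` are hypothetical helpers.

```javascript
// Matches lines such as:  - button "Submit" [ref=e4]
// The quoted name is optional, and any text between it and [ref=...] is skipped.
const LINE_RE = /^\s*-\s+(\S+)(?:\s+"([^"]*)")?.*\[ref=(e\d+)\]/;

function parseRefs(snapshot) {
  const entries = [];
  for (const line of snapshot.split("\n")) {
    const m = LINE_RE.exec(line);
    if (m) {
      entries.push({ role: m[1], name: m[2] ?? "", ref: m[3] });
    }
  }
  return entries;
}

// Return the ref of the first element with the given role and name.
function findRef(entries, role, name) {
  const hit = entries.find((e) => e.role === role && e.name === name);
  return hit ? hit.ref : undefined;
}

const snapshot = [
  '- heading "Top Stories" [ref=e1]',
  '- link "OpenAI releases new model" [ref=e2]',
  '- searchbox "Search" [ref=e3]',
  '- button "Submit" [ref=e4]',
].join("\n");

const entries = parseRefs(snapshot);
console.log(findRef(entries, "button", "Submit"));
```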

Step 3: Interact using refs

Pass a ref directly as the selector value for click, type, or hover (the required node_id and a session_id are omitted below for brevity):

  • To click "Submit": { "action": "click", "selector": "e4" }
  • To type a query: { "action": "type", "selector": "e3", "text": "AI news" }
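With node_id and session_id included, a type-then-click sequence might look like the following sketch. The node and session IDs are hypothetical. `isRef` encodes an assumption for telling refs apart from CSS/XPath selectors: that a ref is `e` followed by digits. The node's actual parsing rule is not documented here.

```javascript
// Assumed ref shape: "e" followed by one or more digits, e.g. "e3".
function isRef(selector) {
  return /^e\d+$/.test(selector);
}

// Hypothetical IDs; the same session_id keeps the refs from get_a11y valid.
const base = { node_id: "node-1", session_id: "search-demo" };

const steps = [
  { ...base, action: "type", selector: "e3", text: "AI news" },
  { ...base, action: "click", selector: "e4" },
];

for (const step of steps) {
  const kind = isRef(step.selector) ? "ref" : "css/xpath";
  console.log(`${step.action} via ${kind}: ${JSON.stringify(step)}`);
}
```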

Extracting Information

  • Use eval with JavaScript for targeted data extraction:
    • document.title
    • [...document.querySelectorAll('h2')].map(e=>e.innerText).join('\n')
    • document.body.innerText (for clean text without HTML)
  • Use get_a11y for structured listings of links, headings, and buttons.
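For richer extraction, an eval snippet can return JSON that the caller then parses. This is a sketch under stated assumptions. `buildEvalCall`, `parseLinks`, the IDs, and the sample result string are illustrative only. Wrapping the page value in `JSON.stringify` is a defensive choice, because this document does not specify how eval results are serialized.

```javascript
// JavaScript to run in the page via 'eval': collects link text and URLs.
const LINKS_JS = `JSON.stringify(
  [...document.querySelectorAll('a')].map(a => ({
    text: a.innerText.trim(),
    href: a.href,
  }))
)`;

// Hypothetical helper that assembles an 'eval' call.
function buildEvalCall(nodeId, sessionId, js) {
  return { action: "eval", node_id: nodeId, session_id: sessionId, text: js };
}

// Turn the string returned by the page back into objects.
function parseLinks(result) {
  return JSON.parse(result);
}

const call = buildEvalCall("node-1", "research-1", LINKS_JS);
// Illustrative stand-in for a page result, not real output.
const links = parseLinks('[{"text":"Home","href":"https://example.com/"}]');
console.log(call.action, links[0].href);
```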

Session Persistence

Always use the same session_id across steps to preserve cookies, login state, and element refs.
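One way to make this hard to forget is to route every call through a small wrapper that stamps the same node_id and session_id on each request. The sketch below is hypothetical. `createSession` and the `send` callback stand in for whatever transport actually delivers calls to BrowserService. The refs `e3` and `e4` are assumed to come from the preceding get_a11y snapshot.

```javascript
// Hypothetical wrapper: every call carries the same node_id and session_id,
// so cookies, login state, and element refs persist between steps.
function createSession(nodeId, sessionId, send) {
  const call = (action, fields = {}) =>
    send({ ...fields, action, node_id: nodeId, session_id: sessionId });
  return {
    navigate: (url) => call("navigate", { url }),
    getA11y: () => call("get_a11y"),
    click: (selector) => call("click", { selector }),
    type: (selector, text) => call("type", { selector, text }),
    close: () => call("close"),
  };
}

// Record calls instead of sending them, to show what would be transmitted.
const sent = [];
const session = createSession("node-1", "login-flow", (c) => {
  sent.push(c);
  return c;
});

session.navigate("https://example.com/login");
session.getA11y();
session.type("e3", "alice"); // ref assumed from the get_a11y snapshot
session.click("e4");         // ref assumed from the get_a11y snapshot
session.close();

console.log(sent.map((c) => `${c.action}@${c.session_id}`).join(", "));
```

Placing `action`, `node_id`, and `session_id` after the spread means a stray field in `fields` cannot override the session identity.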