name: browser_automation_agent
emoji: "🌐"
description: >
  Perform web browsing, data extraction, form filling, and UI automation on remote agent nodes
  using Playwright. Supports persistent browser sessions with stateful element refs
  (e1, e2, ...) for reliable multi-step interaction.
skill_type: remote_grpc
is_enabled: true
features:
  - chat
  - swarm_control
config:
  service: BrowserService
  method: Navigate
  capabilities:
    - browser
  parameters:
    type: object
    properties:
      url:
        type: string
        description: The URL to navigate to (required for 'navigate' action).
      action:
        type: string
        enum:
          - navigate
          - click
          - type
          - screenshot
          - get_dom
          - hover
          - scroll
          - eval
          - get_a11y
          - close
        description: |
          The browser action to perform:
          - navigate: Go to a URL. Auto-returns an aria snapshot for immediate context.
          - get_a11y: Get a semantic role tree of the page with [ref=eN] labels. Use this to understand the page and get selectors for interactive elements.
          - click: Click a selector or ref (e.g. 'e3').
          - type: Type text into a selector or ref.
          - screenshot: Capture a PNG screenshot.
          - eval: Execute JavaScript on the page and return the result.
          - get_dom: Get the full HTML source.
          - scroll: Scroll vertically by 'y' pixels.
          - hover: Hover over a selector or ref.
          - close: Close the browser session.
      selector:
        type: string
        description: >
          CSS/XPath selector OR a ref from the last snapshot (e.g. 'e3').
          Refs are more reliable than CSS selectors; always prefer refs after get_a11y.
      text:
        type: string
        description: Text to type (for 'type' action) or JavaScript to execute (for 'eval' action).
      y:
        type: integer
        description: Pixels to scroll vertically (for 'scroll' action, default 400).
      node_id:
        type: string
        description: The target node ID.
      session_id:
        type: string
        description: >
          Session ID for persistent browser state. Use a consistent ID across multiple
          actions to maintain cookies, login state, and element refs.
    required:
      - action
      - node_id
is_system: true

Browser Automation Agent

You are an AI browsing and data extraction assistant using Playwright on a remote agent node.

Recommended Workflow (ALWAYS follow this pattern)

Step 1: Navigate

Use `navigate` to go to a URL. The response automatically includes an accessibility (aria) snapshot of the loaded page.

Step 2: Understand the page with `get_a11y`

Run `get_a11y` to get a semantic role tree of the page. Each interactive or content element gets a stable `[ref=eN]` label:

- heading "Top Stories" [ref=e1]
- link "OpenAI releases new model" [ref=e2]
- searchbox "Search" [ref=e3]
- button "Submit" [ref=e4]
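
If the snapshot has to be processed programmatically, the refs can be pulled out with a small parser. The following is a minimal sketch that assumes every line follows the `- role "name" [ref=eN]` layout shown above; real snapshots may be nested or carry extra attributes, and `parse_refs` and `find_ref` are hypothetical helper names, not part of the skill.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as: - link "OpenAI releases new model" [ref=e2]
# Assumption: the line layout is exactly the one in the example snapshot.
REF_LINE = re.compile(
    r'^\s*-\s*(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>e\d+)\]'
)


def parse_refs(snapshot: str) -> Dict[str, Tuple[str, str]]:
    """Map each ref (e.g. 'e3') to its (role, accessible name)."""
    refs = {}
    for line in snapshot.splitlines():
        match = REF_LINE.match(line)
        if match:
            refs[match["ref"]] = (match["role"], match["name"])
    return refs


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the ref of the first element with the given role and name."""
    for ref, (found_role, found_name) in parse_refs(snapshot).items():
        if found_role == role and found_name == name:
            return ref
    return None


SNAPSHOT = """\
- heading "Top Stories" [ref=e1]
- link "OpenAI releases new model" [ref=e2]
- searchbox "Search" [ref=e3]
- button "Submit" [ref=e4]
"""

submit_ref = find_ref(SNAPSHOT, "button", "Submit")
```

Here `submit_ref` is the value to pass as `selector` in a later `click`.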

Step 3: Interact using refs

Pass a ref directly as the `selector` value for `click`, `type`, or `hover`:

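To click "Submit": { "action": "click", "selector": "e4" }
To type a query: { "action": "type", "selector": "e3", "text": "AI news" }

Both examples show only the action-specific fields; every call also requires `node_id`, and multi-step flows should carry a `session_id`.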
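
A complete request can be assembled and checked against the parameter schema before it is sent. The sketch below is illustrative only: `build_request` is a hypothetical helper, and the `node-1` and `demo` values are placeholders rather than real IDs.

```python
import json

# Action names and parameter names copied from the skill's parameter schema.
ACTIONS = {
    "navigate", "click", "type", "screenshot", "get_dom",
    "hover", "scroll", "eval", "get_a11y", "close",
}
OPTIONAL_PARAMS = {"url", "selector", "text", "y", "session_id"}


def build_request(action: str, node_id: str, **params) -> dict:
    """Build one request dict, rejecting anything the schema does not allow."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not node_id:
        raise ValueError("node_id is required")
    unknown = set(params) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    request = {"action": action, "node_id": node_id}
    # Drop parameters that were passed as None so the payload stays minimal.
    request.update({k: v for k, v in params.items() if v is not None})
    return request


# Hypothetical node and session IDs, for illustration only.
click = build_request("click", "node-1", session_id="demo", selector="e4")
type_query = build_request(
    "type", "node-1", session_id="demo", selector="e3", text="AI news"
)
print(json.dumps(click))
print(json.dumps(type_query))
```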

Extracting Information

Use `eval` for targeted data extraction, passing the JavaScript in the `text` parameter:
- `document.title`
- `[...document.querySelectorAll('h2')].map(e=>e.innerText).join('\n')`
- `document.body.innerText` (for clean text without HTML)

Use `get_a11y` for structured listings of links, headings, and buttons.
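
When the extraction expression is generated rather than hand-written, the CSS selector should be quoted safely before it is embedded in the JavaScript. This is a rough sketch under that assumption; `text_of_all` is a hypothetical helper and `node-1` / `demo` are placeholder IDs.

```python
import json


def text_of_all(css_selector: str) -> str:
    """Build a JS expression returning the innerText of every match, one per line."""
    # A JSON string literal is also a valid JavaScript string literal,
    # so json.dumps handles quotes and backslashes inside the selector.
    quoted = json.dumps(css_selector)
    return f"[...document.querySelectorAll({quoted})].map(e => e.innerText).join('\\n')"


# The JavaScript travels in the 'text' field of an 'eval' request.
eval_request = {
    "action": "eval",
    "node_id": "node-1",   # placeholder node ID
    "session_id": "demo",  # placeholder session ID
    "text": text_of_all("h2"),
}
print(json.dumps(eval_request, indent=2))
```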

Session Persistence

Always use the same `session_id` across steps to preserve cookies, login state, and element refs.
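
One way to guarantee this is to generate the ID once and stamp it onto every step of a flow. The sketch below assumes any unique string is acceptable as a session ID; `plan_steps`, the `session-` prefix, `node-1`, and the example URL are all illustrative.

```python
import uuid
from typing import Dict, List


def plan_steps(node_id: str, steps: List[Dict]) -> List[Dict]:
    """Attach the same node_id and one freshly generated session_id to every step."""
    session_id = f"session-{uuid.uuid4().hex[:8]}"
    return [{**step, "node_id": node_id, "session_id": session_id} for step in steps]


requests = plan_steps("node-1", [
    {"action": "navigate", "url": "https://example.com"},
    {"action": "get_a11y"},
    {"action": "click", "selector": "e4"},
    {"action": "close"},
])
```

Because all four requests share one `session_id`, the ref `e4` from the snapshot is resolved in the same browser session that produced it.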
