name: browser_automation_agent
emoji: "🌐"
description: >
  Perform web browsing, data extraction, form filling, and UI automation on remote agent nodes
  using Playwright. Supports persistent browser sessions with stateful element refs (e1, e2, ...)
  for reliable multi-step interaction.
skill_type: remote_grpc
is_enabled: true
features:
  type: string
  description: The URL to navigate to (required for 'navigate' action).
action:
  type: string
  enum:
    - navigate
    - click
    - type
    - screenshot
    - get_dom
    - hover
    - scroll
    - eval
    - get_a11y
    - close
  description: |
    The browser action to perform:
    - navigate: Go to a URL. Auto-returns an aria snapshot for immediate context.
    - get_a11y: Get a semantic role tree of the page with [ref=eN] labels. Use this to understand the page and get selectors for interactive elements.
    - click: Click a selector or ref (e.g. 'e3').
    - type: Type text into a selector or ref.
    - screenshot: Capture a PNG screenshot.
    - eval: Execute JavaScript on the page and return the result.
    - get_dom: Get the full HTML source.
    - scroll: Scroll vertically by 'y' pixels.
    - hover: Hover over a selector or ref.
    - close: Close the browser session.
selector:
  type: string
  description: >
    CSS/XPath selector OR a ref from the last snapshot (e.g. 'e3').
    Refs are more reliable than CSS selectors; always prefer refs after get_a11y.
text:
  type: string
  description: Text to type (for 'type' action) or JavaScript to execute (for 'eval' action).
y:
  type: integer
  description: Pixels to scroll vertically (for 'scroll' action, default 400).
node_id:
  type: string
  description: The target node ID.
session_id:
  type: string
  description: >
    Session ID for persistent browser state. Use a consistent ID across multiple actions
    to maintain cookies, login state, and element refs.
required:
You are an AI browsing and data extraction assistant using Playwright on a remote agent node.
Use navigate to go to a URL. This automatically returns an accessibility snapshot.
Run get_a11y to get a semantic role tree of the page. Each interactive or content element gets a stable [ref=eN] label:
- heading "Top Stories" [ref=e1]
- link "OpenAI releases new model" [ref=e2]
- searchbox "Search" [ref=e3]
- button "Submit" [ref=e4]
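
To choose a ref programmatically, the snapshot text can be parsed line by line. The following is a minimal sketch in Python. It assumes each line has exactly the shape of the example above (`- role "name" [ref=eN]`); real snapshots may be nested or carry extra attributes, and the helper names `parse_snapshot` and `find_ref` are hypothetical.

```python
import re

# Assumed line shape, taken from the example snapshot: - role "name" [ref=eN]
SNAPSHOT_LINE = re.compile(
    r'^\s*-\s*(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>e\d+)\]\s*$'
)


def parse_snapshot(snapshot: str) -> list[dict[str, str]]:
    """Return one {'role', 'name', 'ref'} dict per recognised snapshot line."""
    elements = []
    for line in snapshot.splitlines():
        match = SNAPSHOT_LINE.match(line)
        if match:  # lines that do not fit the assumed shape are skipped
            elements.append(match.groupdict())
    return elements


def find_ref(elements, role, name_contains):
    """Hypothetical helper: first ref with this role whose name contains the text."""
    for element in elements:
        if element["role"] == role and name_contains.lower() in element["name"].lower():
            return element["ref"]
    return None


SAMPLE = """- heading "Top Stories" [ref=e1]
- link "OpenAI releases new model" [ref=e2]
- searchbox "Search" [ref=e3]
- button "Submit" [ref=e4]"""

elements = parse_snapshot(SAMPLE)
submit_ref = find_ref(elements, "button", "submit")
```

Here `submit_ref` holds the ref of the Submit button, ready to be passed as a selector value. Refs come from the last snapshot, so re-run get_a11y and re-parse after the page changes.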
Use the refs directly as a selector value for click, type, or hover:
{ "action": "click", "selector": "e4" }
{ "action": "type", "selector": "e3", "text": "AI news" }

- eval with JavaScript for targeted data extraction:
  - document.title
  - [...document.querySelectorAll('h2')].map(e=>e.innerText).join('\n')
  - document.body.innerText (for clean text without HTML)
- get_a11y for structured listings of links, headings, buttons.

Always use the same session_id across steps to preserve cookies, login state, and element refs.
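
One way to keep session_id consistent is to stamp it onto every step before sending. The sketch below only builds the request payloads; how they are dispatched to the node is outside its scope. The helper `make_session_steps`, the node ID `node-1`, and the `url` key name are assumptions for illustration, not names confirmed by this skill definition.

```python
import uuid


def make_session_steps(node_id, steps, session_id=None):
    """Attach the same node_id and session_id to every step in a sequence."""
    # Generate one ID for the whole sequence if the caller did not supply one.
    session_id = session_id or f"session-{uuid.uuid4().hex[:8]}"
    return [{**step, "node_id": node_id, "session_id": session_id} for step in steps]


steps = make_session_steps(
    node_id="node-1",  # hypothetical node ID
    steps=[
        {"action": "navigate", "url": "https://example.com"},  # 'url' key name is assumed
        {"action": "get_a11y"},
        {"action": "type", "selector": "e3", "text": "AI news"},
        {"action": "click", "selector": "e4"},
        {"action": "eval", "text": "document.title"},
        {"action": "close"},
    ],
)
```

Every payload in `steps` carries the same session_id, so the refs obtained from get_a11y stay valid for the later type and click steps, and close ends that same session.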