Skip to content

WAAA! Web Adversaries Against Agentic Browsers

Source: arXiv:2605.05509 · Published 2026-05-06 · By Sohom Datta, Alex Nahapetyan, William Enck, Alexandros Kapravelos

TL;DR

This paper addresses a critical blind spot in agentic browser security research: while prior work focused exclusively on indirect prompt injection (explicit adversarial instructions embedded in page content), no one had systematically examined how traditional web attacks — phishing, UI deception, cross-site data leaks, social engineering — manifest when the user is replaced by an LLM agent. The authors argue that the LLM agent is a 'confused deputy': it holds user-level privileges but cannot distinguish legitimate page components from adversarially crafted ones, and critically, it has no model of attacker intent or origin-awareness. This makes the entire legacy playbook of web social engineering newly exploitable, often in easier or amplified forms.

The paper's core contribution is a formal threat model extending the See→Act agentic browser model with web security primitives (origins, gadget vs. script attacker capability, browser capability tiers) and a derived taxonomy of 20 attacks across same-site exfiltration, cross-site leaks, slop-squatting hallucinations, LLM-specific attacks, and integration/system-level abuse. Of these, 18 have proof-of-concept implementations tested against Playwright-MCP and BrowserOS. A generalizability study spans 14 attacks across 4 major LLM backends (models from Anthropic, OpenAI, Alibaba, and at least one other vendor), finding that existing alignment and safety measures — including Perplexity's BrowseSafe prompt-injection detection model — are insufficient to block the breadth of attacks in the taxonomy.

The results are organized into 5 failure modes: agents bridging cross-site data, agents bridging same-site data, agents hallucinating URLs (slop-squatting), websites directly attacking the LLM, and agents misusing integrated tools. The authors argue that these failure modes are architectural rather than model-level, meaning safety fine-tuning cannot fix them without rearchitecting agentic browsers around an origin-aware, capability-constrained security model. Notably, the paper demonstrates that markup-only (gadget-level) attackers — who cannot run JavaScript — can induce behaviors previously requiring script-level access, effectively collapsing the classical gadget/script attacker distinction in the agentic setting.

Key findings

  • 18 of 20 taxonomy attacks were successfully implemented as proof-of-concept exploits against at least one of two tested agentic browser systems: Playwright-MCP and unmodified BrowserOS.
  • 14 of 20 attacks were tested in a generalizability study across 4 major LLM models from multiple vendors (including Anthropic and OpenAI), and all 14 reproduced successfully, demonstrating that vulnerabilities are systemic rather than model-specific.
  • 10 of 20 attacks directly re-emerge from classical, heavily-mitigated web threats (e.g., XSS, CSRF token leakage, cross-origin data exfiltration), and appear in amplified or easier-to-exploit forms in the agentic setting because no JavaScript is needed — markup alone suffices.
  • Real-world attack variants defeated Perplexity AI's BrowseSafe, a state-of-the-art prompt-injection detection model, demonstrating that confusion attacks evade defenses designed for indirect prompt injection.
  • Agentic browsers collapse the classical distinction between gadget attackers (markup-only) and script attackers (JavaScript execution): a markup-level adversary can induce behaviors equivalent to script-level capabilities by exploiting the agent's action repertoire.
  • The paper identifies 5 broad failure modes: (1) agents bridge cross-site data, (2) agents bridge same-site data, (3) agents hallucinate URLs (slop-squatting, attacks SQ-1 and SQ-2), (4) websites attack the LLM itself (LLM-1 prompt leakage, LLM-2 fingerprinting), and (5) agents misuse integrated tools (I-3 through I-5 covering service jacking, file read/write, and remote code execution).
  • A capability analysis of 11 existing agentic browsers informed the four-tier browser capability model (Page Interaction → Navigation → Browser UI → Integration), showing that attack surface scales with capability tier and that removing capabilities eliminates entire attack classes (e.g., no Navigation capability precludes all slop-squatting and UXSS attacks).
  • Two attacks (I-1 Permission abuse and I-2 Drive-by extension install) could not be reduced to proof-of-concept, indicating these represent the outer boundary of current exploitability but are included in the taxonomy due to theoretical plausibility under the threat model.

Threat model

The adversary is a traditional web attacker who operates within the conventions of the web ecosystem. Two capability tiers are modeled: a gadget attacker who controls only limited HTML/CSS markup regions (e.g., user-generated content in comment fields, forum posts, email bodies) without JavaScript execution, and a web attacker who controls a full origin including JavaScript execution (e.g., a malicious website the agent visits, a compromised third-party script). The attacker has three possible target classes: same-site data (sensitive content co-located on the page), cross-site data (data from other origins the agent may visit), and the user's system (filesystem, permissions, installed extensions via integration-tier capabilities). The attacker cannot: poison the LLM's training data, compromise the browser's certificate trust store, exploit memory-safety bugs in the browser or server, or issue instructions directly to the user. The model explicitly excludes MITM/network attackers as out of scope given TLS prevalence. The key adversarial advantage is full control over page state at the attacker's origin, enabling iterative reshaping of the browser's inputs across the agent's feedback loop — the attacker wins by exploiting the agent's inability to distinguish legitimate page workflow steps from adversarially crafted ones.

Methodology — deep read

The threat model construction begins from two existing models: the See→Act agentic browser model (Zhang et al.) and the web attacker model (Akhawe et al.). The authors extend the See→Act formalism by adding origin tracking (o_t), an explicit attacker entity (M) with controlled elements (E_t), and a four-tier action taxonomy (page interaction a_p, navigation a_n, browser UI a_b, external integration a_x). The browser state is redefined as s_t = {h_t, i_t, o_t, E_t} where h_t is markup, i_t is screenshot data, o_t is current origin, and E_t is the set of elements under attacker control. Attacker capabilities are bifurcated into markup injection (gadget attacker, analogous to untrusted user-generated content like comments) and script-level injection (web attacker, JavaScript execution). Network attackers (MITM) are explicitly scoped out because TLS adoption renders them impractical without pre-existing host compromise. Trust assumptions include: the LLM is not training-data-poisoned, the user is non-malicious, the underlying browser has no memory-safety bugs, and the attacker has only capabilities available to normal JavaScript execution and rendered HTML.

Taxonomy derivation followed a structured enumeration process. The authors enumerated all 60 possible combinations of attacker capability (2 levels) × browser capability (4 tiers) × attacker target (3 types: same-site data, cross-site data, system). Two authors with web security expertise independently reviewed each combination, excluded 20 as physically impossible or requiring overprivileged attacker assumptions, and mapped remaining combinations to prior web security literature (drawing on NDSS, CCS, USENIX Security, IEEE S&P 2021–2025). This yielded 38 feasible combinations that collapsed via deduplication into 20 distinct attacks, 10 of which have direct analogs in the web security literature and 10 of which are novel to the agentic setting.

Proof-of-concept implementation covered 18 of 20 attacks (I-1 and I-2 were not implemented). The primary test targets were: (1) a Playwright-MCP-based agentic system — an open-source, extensible agentic browser using the Model Context Protocol — and (2) BrowserOS, a commercial agentic browser product. Attack success was evaluated qualitatively (binary: did the agent perform the attacker-intended action?). The paper does not report quantitative success rates or confidence intervals for individual PoC tests; it is unclear whether multiple independent runs were conducted per attack or whether a single successful demonstration was counted as a pass.

The generalizability study tested 14 of the 20 attacks across 4 LLM backends from multiple vendors, including models from Anthropic (Claude series), OpenAI (GPT series), and Alibaba (likely Qwen), plus at least one additional vendor. The goal was to determine whether vulnerabilities were model-specific artifacts or systemic. The paper reports that all 14 attacks reproduced across all tested models, with existing alignment from providers being insufficient to block the confusion attack class. The specific LLM versions, system prompts, and whether each model was evaluated identically or with model-specific configurations are not detailed in the truncated text provided — this is a reproducibility gap.

A concrete end-to-end example: for SS-1 (Same-site data exfiltration), a gadget attacker (e.g., controlling a comment field on a forum) embeds content that visually or textually mimics a legitimate page instruction. When the agentic browser processes the full page to fulfill a user task (e.g., 'summarize this thread'), the agent, lacking origin-awareness of the comment's untrusted provenance, reads and relays sensitive co-located content (e.g., a CSRF token or private user data visible on the same page) to the attacker's endpoint or includes it in its response. No JavaScript is needed — the confusion arises from the agent's inability to distinguish trusted developer content from attacker-injected gadgets in the same DOM.

Reproducibility: The paper provides a GitHub-style arXiv submission for inspection, but the truncated text does not confirm a public code release. PoC details appear sufficient for expert reproduction but the BrowserOS evaluation depends on a commercial product, and the specific LLM API versions tested are not pinned in the excerpt. Statistical testing (e.g., significance tests on generalizability results) is not mentioned.

Technical innovations

  • First formal threat model for agentic browsers grounded in web security primitives, extending See→Act with explicit origin tracking, a four-tier browser capability model, and a confused-deputy framing — prior work (WASP, DoomArena) modeled security only as input sanitization without origin or capability dimensions.
  • Introduction of the 'confusion attack' category as distinct from indirect prompt injection: confusion attacks exploit standard web interaction conventions (phishing, UI mimicry, social engineering) rather than explicit adversarial instructions, making input sanitization provably insufficient as a general defense.
  • Demonstration that markup-only (gadget-level) attackers can achieve behaviors previously requiring script-level capabilities in traditional web security, because the agent's action repertoire serves as an amplifier — collapsing the classical gadget/script attacker distinction.
  • A two-axis taxonomy derivation methodology (attacker capability × browser capability × attacker target) that systematically enumerates the attack surface and provides a reusable framework for evaluating future agentic browser designs, rather than an ad hoc list of attacks.
  • Empirical demonstration that confusion attacks evade Perplexity AI's BrowseSafe, a deployed prompt-injection detection system, showing that defenses designed for the indirect-prompt-injection threat model do not generalize to the confusion attack class.

Datasets

  • No standard ML datasets used — evaluation is based on manually constructed proof-of-concept attack pages tested against live agentic browser systems (Playwright-MCP and BrowserOS). No public dataset released (based on available text).

Baselines vs proposed

  • Perplexity BrowseSafe (prompt-injection detection model): blocked indirect prompt injection attacks (implied) vs. proposed confusion attacks: bypassed by real-world attack variants — exact bypass rate not quantified in available text.
  • WASP [prior work]: attack taxonomy covers only same-site interactions, no cross-site or system-level attacks vs. WAAA taxonomy: 20 attacks including cross-site (XS-1 through XS-6), system-level (I-1 through I-5), and slop-squatting (SQ-1, SQ-2).
  • DoomArena [prior work]: uses single indirect prompt-injection attack class (hidden text, delimiter tokens, instruction sequences) vs. WAAA: 18 PoC attacks across 5 failure modes, 14 generalized across 4 LLM vendor models.
  • Anthropic/OpenAI/Alibaba out-of-the-box model alignment: insufficient to block 14/14 attacks tested in generalizability study — specific per-model pass/fail breakdown not provided in truncated text.

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.05509.

Fig 1

Fig 1: A demonstration of indirect prompt injection vs traditional web attacks

Fig 2

Fig 2: Agentic browsers under the See→Act model

Fig 3

Fig 3: Agentic browsers in our threat model within the

Fig 4

Fig 4 (page 4).

Limitations

  • Attack success is evaluated as a binary qualitative outcome (did the attack work?) with no quantitative success rates, confidence intervals, or multiple-trial statistics reported — a single successful PoC demonstration may not reflect reliable exploit probability in practice.
  • The generalizability study tests 14 of 20 attacks, leaving 6 attacks (including the two unimplemented ones) without cross-model validation; the criteria for selecting which 14 to test are not fully explained in the available text.
  • Specific LLM model versions (e.g., Claude 3.5 Sonnet vs. Claude 3 Opus), API configurations, and system prompts used in the generalizability study are not pinned in the excerpt, limiting exact reproducibility.
  • BrowserOS is a commercial product whose internal architecture is not fully disclosed; results against it may not generalize to other commercial agentic browsers with different capability sets or sandboxing approaches.
  • The paper explicitly scopes out network (MITM) attackers and assumes no training-data poisoning, browser memory-safety bugs, or malicious users — these exclusions are reasonable but leave open the question of combined attack chains involving these vectors.
  • No longitudinal or deployment-scale evaluation: all attacks are tested in controlled lab settings against specific system configurations; real-world exploit rates, detection evasion durability, and attacker cost are not assessed.
  • Two attacks (I-1 Permission abuse, I-2 Drive-by extension install) have no PoC and no generalizability data, yet appear in the taxonomy — their practical exploitability remains unvalidated.

Open questions / follow-ons

  • Can origin-aware capability sandboxing (e.g., an LLM that tracks which page elements are trusted vs. untrusted based on origin and content provenance) practically defend against confusion attacks without unacceptably degrading task completion rates on legitimate workflows?
  • What is the interaction between agentic browser attacks and multi-step task planning in reasoning models (e.g., o1/o3-class)? Do extended reasoning chains increase or decrease susceptibility to confusion attacks by allowing the model more 'time' to detect inconsistencies?
  • How do confusion attacks scale against agentic browsers with memory or persistent state (e.g., agents that maintain session context across multiple browsing sessions)? Can an attacker prime the agent's context in one session to influence behavior in a later, higher-privilege session?
  • The paper notes that removing capability tiers eliminates attack classes — but what is the usability cost of capability restriction? A formal usability-security tradeoff analysis comparing capability-constrained vs. full-capability agentic browsers on standard benchmarks (WebArena, Mind2Web) would be needed to make principled architectural recommendations.

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this paper reframes a longstanding assumption: that the entity solving CAPTCHAs or interacting with protected web surfaces is either a human or a traditional scripted bot. Agentic browsers introduce a third class — an LLM-driven agent that perceives pages visually or via DOM, reasons about context, and acts with human-like flexibility. The paper's finding that confusion attacks (UI mimicry, social engineering patterns, contextual misdirection) are more effective than explicit prompt injection against these agents is directly relevant: CAPTCHA and anti-bot systems that rely on behavioral signals derived from human interaction patterns (mouse movement, timing, click paths) may need to be re-evaluated against agents that can be steered by page content itself. An agentic browser completing a task on a protected page may be vulnerable to the same deceptive UI patterns that bot-defense systems use to catch scripted bots — a potential inversion of the cat-and-mouse dynamic.

More concretely, the taxonomy's cross-site and same-site data exfiltration attacks (XS-1 through XS-5, SS-1 through SS-5) describe attack patterns where an agent is induced to relay sensitive tokens or session data to an attacker — patterns that are structurally similar to how credential-stuffing and session-hijacking bots operate, but now achievable by a markup-only attacker without JavaScript. Bot-defense systems that instrument JavaScript to detect headless browsers or automation frameworks may not detect markup-only confusion attacks at all. The paper's conclusion that existing alignment is insufficient and that rearchitecting is needed suggests that bot-defense infrastructure will need to treat agentic browsers as a distinct principal class requiring new detection signals — potentially including origin-consistency checks, capability-tier monitoring, and behavioral anomaly detection that accounts for LLM-specific action patterns rather than purely scripted-bot heuristics.

Cite

bibtex
@article{arxiv2605_05509,
  title={ WAAA! Web Adversaries Against Agentic Browsers },
  author={ Sohom Datta and Alex Nahapetyan and William Enck and Alexandros Kapravelos },
  journal={arXiv preprint arXiv:2605.05509},
  year={ 2026 },
  url={https://arxiv.org/abs/2605.05509}
}

Read the full paper

Articles are CC BY 4.0 — feel free to quote with attribution