FP-Agent: Fingerprinting AI Browsing Agents

Source: arXiv:2605.01247 · Published 2026-05-02 · By Ethan Wang, Zubair Shafiq, Yash Vekaria

TL;DR

This paper tackles the emerging problem of detecting AI browsing agents—autonomous bots that use real browsers to perform human-like web tasks. Unlike traditional bots, browsing agents operate in full browsers and engage in diverse activities, complicating detection. The authors conduct the first controlled measurement study comparing seven popular AI browsing agents against humans on an instrumented honey website across three tasks (flight booking, shopping, forum interaction). They extract browser fingerprints and behavioral fingerprints capturing typing, scrolling, and mouse activity, then train FP-Agent, a multi-class classifier to distinguish agents from humans and between agents.

Key findings show that browser fingerprints alone offer limited discriminative power due to fingerprint sharing and environmental overlap; behavioral fingerprints are much more distinctive. Typing styles (e.g., paste vs keystroke), mouse movement patterns, and scrolling behavior provide strong signals to reliably fingerprint each browsing agent and differentiate it from humans. FP-Agent achieves near-perfect multi-class classification, detecting all seven agents, whereas Cloudflare's bot detection only detects one. The study exposes significant blind spots in current deployed defenses and highlights behavioral fingerprinting as critical for detecting AI browsing agents today.

Key findings

A classifier using only browser fingerprints reaches an F1 score of about 0.80 across the 7 agents, indicating limited discriminative power.
A classifier using only behavioral fingerprints reaches an F1 score of about 0.999, showing near-perfect multi-class discrimination.
Combining browser and behavioral features attains a perfect F1 score of 1.0 across agents and humans.
Three agents (Atlas Agent, Browser Use, Claude) share identical browser fingerprints on macOS, which causes confusion in classification.
Key behavioral distinctions include paste-based vs. keystroke-based typing, repeated delete-and-retype, mouse movement styles such as direct jumps, and multi-burst scrolling.
FP-Agent detects all 7 browsing agents in a case study on Cloudflare traffic, while Cloudflare’s native detection flags only 1 agent.
Behavioral fingerprints remain stable and distinctive across three representative web tasks — flight booking, online shopping, and forum interaction.
Each browsing agent's browser fingerprint varies little across trials, which correlates strongly with its fixed remote or local execution environment.
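The fingerprint-sharing finding can be illustrated with a small check that groups trials by a hash of their browser attributes and reports any hash seen under more than one label. This is a minimal sketch, not the authors' code: the attribute names and values are hypothetical, and hashing sorted JSON is an assumed way to build a comparable key.

```python
from collections import defaultdict
import hashlib
import json

def fingerprint_key(attrs: dict) -> str:
    """Hash a browser-attribute dict into a stable key (sorted JSON + SHA-256)."""
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def shared_fingerprints(trials):
    """Return {key: set(labels)} for fingerprint keys seen under more than one label."""
    labels_by_key = defaultdict(set)
    for label, attrs in trials:
        labels_by_key[fingerprint_key(attrs)].add(label)
    return {k: v for k, v in labels_by_key.items() if len(v) > 1}

# Hypothetical attribute values, for illustration only.
mac = {"platform": "MacIntel", "screen": [1512, 982], "cores": 8}
trials = [
    ("Atlas Agent", mac),
    ("Browser Use", dict(mac)),
    ("Claude", dict(mac)),
    ("Skyvern", {"platform": "Linux x86_64", "screen": [1280, 720], "cores": 4}),
]
collisions = shared_fingerprints(trials)
for key, labels in collisions.items():
    print(key[:12], sorted(labels))
```

When several labels map to one key, no classifier restricted to those attributes can separate them, which is consistent with the lower browser-only F1 score.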

Threat model

Adversaries are AI browsing agents that autonomously navigate websites using real browser instances, aiming to mimic human behavior. They do not necessarily self-identify or cooperate with detection systems. The defender seeks to distinguish these browsing agents from human users and from one another based solely on observable browser and behavioral fingerprint features collected during typical web tasks. The adversary is assumed to be unable to easily vary underlying system attributes or convincingly replicate human behavioral characteristics without degrading its task performance.

Methodology — deep read

The authors begin by defining their threat model: website publishers seek to distinguish AI browsing agents from humans and from one another based on browser and behavioral fingerprints. The adversary is an AI browsing agent operating a real browser, possibly on a known or unknown underlying system, with no assumption of self-identification.

Data is collected from 7 browsing agents: OpenAI Atlas, ChatGPT Agent, Anthropic Claude for Chrome, Perplexity Comet, Meta Manus, and two open-source agents, Browser Use and Skyvern. Agents run locally on Windows, macOS, or Linux, or remotely in the cloud. Each agent executed 1000 trials, split equally across the three tasks (flight booking, shopping, forums), with each trial on a unique agent- and task-specific subpage of an instrumented honey website. Additionally, 56 human participants completed 3 repetitions of each task on their own systems, yielding 546 human trials.

The honey website collects browser fingerprints (using FingerprintJS) capturing system attributes like screen resolution, fonts, CPU cores, plugins, etc. Behavioral fingerprinting utilizes custom JavaScript listeners recording typing events (key hold and inter-key latencies, paste events, delete usage), mouse movements (velocity, direction, movement style), and scrolling behavior.
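To make the typing signals concrete, the sketch below derives key hold time, inter-key latency, paste count, and delete count from a recorded event stream. It is an offline illustration under assumptions: the `(kind, key, t_ms)` tuple format and the feature names are hypothetical, not the schema used by the paper's JavaScript listeners.

```python
from statistics import mean

def typing_features(events):
    """Summarise a time-ordered list of (kind, key, t_ms) tuples."""
    down_at = {}   # key -> timestamp of its pending keydown
    holds = []     # keyup minus keydown for each completed press
    downs = []     # timestamps of all keydowns, used for inter-key latency
    pastes = deletes = 0
    for kind, key, t in events:
        if kind == "keydown":
            down_at[key] = t
            downs.append(t)
            if key in ("Backspace", "Delete"):
                deletes += 1
        elif kind == "keyup" and key in down_at:
            holds.append(t - down_at.pop(key))
        elif kind == "paste":
            pastes += 1
    gaps = [b - a for a, b in zip(downs, downs[1:])]
    return {
        "mean_hold_ms": mean(holds) if holds else -1,       # -1 marks "no data"
        "mean_inter_key_ms": mean(gaps) if gaps else -1,
        "paste_count": pastes,
        "delete_count": deletes,
    }

# Hypothetical traces: a person typing "hi" versus an agent pasting text.
typed = [("keydown", "h", 0), ("keyup", "h", 90),
         ("keydown", "i", 150), ("keyup", "i", 230)]
pasted = [("paste", None, 0)]
typed_features = typing_features(typed)
pasted_features = typing_features(pasted)
print(typed_features)
print(pasted_features)
```

A paste-only trace produces no keystroke timings at all, which is the kind of gap the paste vs. keystroke distinction relies on.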

The raw signals were encoded as 418 browser fingerprint features and 50 behavioral fingerprint features. Categorical browser attributes were one-hot encoded; numeric features were kept as floating-point values. Behavioral features missing for some tasks were represented with the sentinel value -1.
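The encoding described above can be sketched in a few lines: one-hot columns for categorical attributes, floats for numeric ones, and -1 where a value is absent. The column names and rows are hypothetical, and the helper functions are illustrative rather than the authors' pipeline.

```python
SENTINEL = -1.0  # value used when a feature is missing

def fit_categories(rows, categorical_cols):
    """Collect the sorted set of observed values for each categorical column."""
    return {c: sorted({str(r[c]) for r in rows if c in r}) for c in categorical_cols}

def featurize(row, categories, numeric_cols):
    """One-hot encode categoricals, pass numerics as float, fill gaps with SENTINEL."""
    vec = []
    for col, values in categories.items():
        vec.extend(1.0 if str(row.get(col)) == v else 0.0 for v in values)
    for col in numeric_cols:
        value = row.get(col)
        vec.append(SENTINEL if value is None else float(value))
    return vec

# Hypothetical rows; the second one has no typing data.
rows = [
    {"platform": "MacIntel", "cores": 8, "mean_hold_ms": 85.0},
    {"platform": "Linux x86_64", "cores": 4},
]
categories = fit_categories(rows, ["platform"])
vectors = [featurize(r, categories, ["cores", "mean_hold_ms"]) for r in rows]
print(categories)
print(vectors)
```

Tree-based models can split on a sentinel such as -1 directly, so "feature absent" becomes a usable signal rather than a dropped row.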

The FP-Agent classifier is a multi-class XGBoost model trained to predict among the 7 agents and humans. Separate models were trained on browser features only, behavioral features only, and the combined features, and each was evaluated on an 80/20 train/test split. Class imbalance (fewer humans) was addressed by XGBoost’s native weighting. Feature importance was analyzed using XGBoost gain metrics and SHAP values for interpretability.
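The split and the imbalance handling can be sketched without any ML library. Two assumptions are made here that the summary does not confirm: that the 80/20 split is stratified by class, and that weighting follows the common "balanced" formula n / (k * count). The class labels and sizes are toy values.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), keeping each class's share in both parts."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def balanced_weights(labels):
    """Per-sample weight n / (k * count[class]); every class then sums to n / k."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]

# Toy labels: one over-represented agent class and a smaller human class.
labels = ["agent_a"] * 100 + ["human"] * 50
train_idx, test_idx = stratified_split(labels)
weights = balanced_weights(labels)
print(len(train_idx), len(test_idx))   # sizes of the two partitions
print(weights[0], weights[-1])         # weight of an agent sample and a human sample
```

Weights of this kind could be passed as per-sample weights when fitting a gradient-boosted model, for example through the `sample_weight` argument of `XGBClassifier.fit`.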

Statistical significance of feature differences between classes was assessed using Mann-Whitney U tests, with the Brown-Forsythe test used for differences in variance. Effect sizes were reported as the rank-biserial correlation r, with a significance threshold of p < 0.01.
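As a worked illustration of the effect size, the sketch below computes the Mann-Whitney U statistic by direct pair counting and converts it to the rank-biserial correlation r = 2U / (n1 * n2) - 1. The sample values are hypothetical inter-key latencies, and the p-value is deliberately left out because it needs a distribution approximation.

```python
def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def rank_biserial(x, y):
    """Effect size r = 2U / (n1 * n2) - 1, in [-1, 1]; +1 means every x exceeds every y."""
    return 2.0 * mann_whitney_u(x, y) / (len(x) * len(y)) - 1.0

# Hypothetical inter-key latencies in milliseconds.
human_gaps = [140, 160, 180, 220]
agent_gaps = [10, 12, 11, 150]
u = mann_whitney_u(human_gaps, agent_gaps)
r = rank_biserial(human_gaps, agent_gaps)
print(u, r)  # 15.0 0.875
```

Here 15 of the 16 human/agent pairs have the larger human value, so r is 0.875. For p-values, `scipy.stats.mannwhitneyu` covers the U test, and `scipy.stats.levene` with `center="median"` corresponds to the Brown-Forsythe test.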

Reproducibility is supported through published code and data artifacts on GitHub. System details and agent versions were fixed for experiments. A concrete example: in flight booking, agents and humans filled multi-step forms; behavioral features revealed agents often used paste or exhibited direct mouse jumps unlike humans.

Evaluation includes both per-class precision/recall/F1 metrics and confusion matrices. Results show that browser fingerprints alone have substantial class overlap because multiple agents running in the same OS environment share identical fingerprints. Behavioral fingerprints drive near-perfect agent-vs-human and inter-agent classification. The authors also conduct a case study comparing FP-Agent against Cloudflare’s deployed bot detection, highlighting the advantage of behavioral features.
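The per-class metrics mentioned above follow directly from confusion counts. The sketch below is a minimal pure-Python version with toy labels chosen to mimic two agents being confused with each other; the predictions are invented for illustration and are not results from the paper.

```python
from collections import Counter

def confusion(y_true, y_pred):
    """Counter keyed by (true_label, predicted_label)."""
    return Counter(zip(y_true, y_pred))

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from the confusion counts."""
    cm = confusion(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = cm[(c, c)]
        fp = sum(cm[(t, c)] for t in labels if t != c)
        fn = sum(cm[(c, p)] for p in labels if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores

# Toy predictions: one Atlas trial is mislabelled as Claude.
y_true = ["Atlas", "Atlas", "Claude", "Claude", "human", "human"]
y_pred = ["Atlas", "Claude", "Claude", "Claude", "human", "human"]
scores = per_class_scores(y_true, y_pred)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1:.3f}")
```

A single confusion between two classes lowers the recall of one and the precision of the other, which is how shared fingerprints pull down a multi-class F1 even when humans are classified perfectly.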

Technical innovations

First controlled measurement study characterizing browser and behavioral fingerprints of 7 AI browsing agents performing realistic web tasks.
Development of FP-Agent, a multi-class XGBoost classifier combining browser and behavioral features to distinguish AI browsing agents and humans with near-perfect accuracy.
Novel behavioral fingerprinting involving fine-grained typing, scrolling, and mouse movement features to capture distinctive agent interaction patterns.
Demonstration that browser fingerprints alone offer limited discrimination owing to shared fingerprints and environment correlation, emphasizing the importance of behavioral signals.

Datasets

FP-Agent controlled dataset — 7 browsing agents and 56 humans — instrumented honey website over three web interaction tasks — publicly released at https://github.com/ethanbwang/fp-agent

Baselines vs proposed

Browser fingerprint classifier (agents only): F1 = 0.797 vs Behavioral fingerprint classifier: F1 = 0.999
Browser fingerprint + behavioral classifier (agents + humans): F1 = 1.0
Cloudflare bot detection (case study): detects 1 of 7 browsing agents vs FP-Agent detects all 7

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.01247.

Fig 1: An overview of our FP-Agent framework.

Fig 2 (page 4): Time series of events representing each brows-

Figs 3-7 (page 4).

Limitations

Study covers only 7 selected browsing agents and 56 human participants from a single university environment, limiting population diversity.
Tasks focus on three specific web activities and may not capture full browsing behavior variability.
Behavioral fingerprint features and agent interaction patterns may evolve as browsing agents improve mimicry of humans.
Browser fingerprint overlap observed may differ in real-world environments with more diverse OS and devices.
No adversarial adaptation or long-term testing against agent obfuscation strategies was performed.
The Cloudflare case study is limited in scope and does not generalize to all deployed defenses.

Open questions / follow-ons

How will future browsing agents adapt behavioral patterns to evade detection based on typing, scrolling, and mouse features?
Can these fingerprinting methods be generalized to a wider variety of web tasks and broader user populations?
What are the tradeoffs between detection accuracy and user privacy when collecting fine-grained behavioral biometrics?
How can emerging bot authentication standards like Web Bot Auth complement behavioral fingerprinting in robust AI bot control?

Why it matters for bot defense

This paper provides bot-defense practitioners with an empirical foundation to better detect AI browsing agents, an emerging threat that uses real browsers and exhibits sophisticated behaviors making traditional detection ineffective. The results highlight that relying solely on browser fingerprinting or standard client signals is insufficient. Behavioral fingerprinting leveraging keystroke dynamics, mouse trajectories, and scrolling patterns is crucial for distinguishing advanced AI agents from humans and each other. For CAPTCHA and bot mitigation systems, integrating these behavioral signals can enhance detection and classification accuracy, enabling short-window real-time identification of automated agents.

While existing deployed solutions like Cloudflare’s bot detection currently miss most browsing agents, frameworks like FP-Agent demonstrate that multi-class behavioral classification can identify all tested agents. Practitioners designing CAPTCHA and behavioral challenges should focus on capturing nuanced user interaction patterns rather than only static client attributes. The measurement framework and open dataset provided enable ongoing research to keep pace with evolving AI bots. However, defenses must also prepare for future bots that might better mimic human behaviors or adopt more complex obfuscation.

Cite

@article{arxiv2605_01247,
  title={FP-Agent: Fingerprinting AI Browsing Agents},
  author={Ethan Wang and Zubair Shafiq and Yash Vekaria},
  journal={arXiv preprint arXiv:2605.01247},
  year={2026},
  url={https://arxiv.org/abs/2605.01247}
}
