FP-Inconsistent: Measurement and Analysis of Fingerprint Inconsistencies in Evasive Bot Traffic
Source: arXiv:2406.07647 · Published 2024-06-11 · By Hari Venugopalan, Shaoor Munir, Shuaib Ahmed, Tangbaihe Wang, Samuel T. King, Zubair Shafiq
TL;DR
This paper studies a practical but under-measured problem: bots that actively change browser fingerprints to evade commercial anti-bot systems. The authors argue that prior work either used self-built bots or relied on vendor labels, so it missed the behavior of “evasive bots” sold in underground traffic marketplaces. Their contribution is a honey-site setup that isolates traffic from 20 purchased bot services, then measures how often that traffic evades two commercial defenses, DataDome and BotD, while also collecting browser fingerprint attributes.
The main result is that evasive bots are not only common, they are also sloppy. Across 507,080 requests over three months, the bot services evade DataDome 52.93% of the time on average and BotD 44.56% of the time on average. More importantly, the fingerprint values used for evasion are often internally inconsistent: bots claim device types, screen sizes, plugins, and geographies that do not line up with each other or with the IP/location metadata. The authors turn those inconsistencies into FP-Inconsistent, a data-driven rule-mining approach that reduces evasion by 48.11% against DataDome and 44.95% against BotD while keeping true negative rate at 96.84% on real-user traffic.
Key findings
- Across 507,080 requests from 20 bot services, DataDome detected 47.07% and BotD detected 55.44% of requests; equivalently, average evasion was 52.93% against DataDome and 44.56% against BotD.
- 82.54% of requests originated from IPs in ASN blocklists, yet evasive bots still achieved 52.93% evasion against DataDome and 43.17% evasion against BotD on those flagged ASNs.
- MaxMind IP blocklists had only 15.86% coverage in their sample; within that covered subset, 48.1% of requests evaded DataDome and 68.85% evaded BotD.
- A random-forest classifier trained to separate detected vs evaded requests achieved 82.09% train / 81.66% test accuracy for DataDome and 97.8% train / 97.71% test accuracy for BotD, suggesting strong fingerprint separation between detected and evasive traffic.
- Top SHAP features for evading DataDome were Vendor Flavors, Plugins, Screen Frame, Hardware Concurrency, and Forced Colors; for BotD they were Vendor Flavors, Plugins, Touch Support, Vendor, and Contrast.
- For BotD, 99.91% of requests from the highest-evasion bot services supported the Chrome PDF Viewer plugin, while 100% of requests from the lowest-evasion services reported no plugins; per Fig. 4, the presence of any PDF plugin nearly guaranteed evasion.
- For DataDome, 84.7% of requests from high-evasion bot services reported fewer than 8 CPU cores versus 38.16% for low-evasion services; after filtering out services that were always detected or always evaded, the gap widened to 84.7% vs 19.05%.
- FP-Inconsistent reduced evasion by 48.11% against DataDome and 44.95% against BotD while maintaining a true negative rate of 96.84% on real-user traffic.
Threat model
The adversary is a bot operator selling or deploying impression-fraud traffic to publishers who use anti-bot services such as DataDome or BotD. The adversary can alter browser fingerprint values, choose source IPs and ASNs, and emulate some device properties to look human, but the attacker cannot perfectly maintain cross-attribute consistency across all browser-exposed signals and network/location metadata. The defender is the publisher or anti-bot service trying to detect evasive bots without relying solely on behavioral actions, since impression-fraud bots can simply load pages and record views.
Methodology — deep read
Threat model and setup: the paper targets impression-fraud bots sold by blackhat traffic vendors, not credential-stuffing or account-takeover bots. The adversary is a bot operator trying to look like a legitimate human visitor in order to evade commercial anti-bot services embedded on a publisher’s site. The authors assume the attacker can alter browser-exposed fingerprint values and can choose source IPs/ASNs, but cannot magically make all fingerprint attributes self-consistent if they are being spoofed. The goal is to understand what evasive bots actually do in the wild and then exploit the inconsistencies they leave behind.
Data collection and provenance: the authors deploy a honey site and create multiple versions of the same site under one domain, each version distinguished by a random URL string (e.g., /XXXXX, /YYYYY). They buy traffic from 20 bot services on SEOClerks and only keep requests containing the matching URL string so they can attribute each request to the purchased service and avoid contamination from real users or unrelated bots. The site runs two commercial anti-bot services, DataDome and BotD, and the authors log their decisions per request. They also instrument the page with FingerprintJS to collect over 30 fingerprint attributes (examples mentioned include plugins, screen/frame data, hardware concurrency, touch support, vendor/flavors, forced colors, and contrast). The dataset spans three months (September–November 2023) and contains 507,080 requests. Table 1 reports per-service request counts ranging from 382 to 121,500, with highly variable evasion rates across services and across the two anti-bot products.
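That attribution step can be sketched as a lookup from the random URL token to the purchased service. The tokens and service names below are hypothetical placeholders; the real tokens are random strings minted when each service was purchased:

```python
from urllib.parse import urlparse

# Hypothetical mapping from per-service random URL tokens to bot services.
TOKEN_TO_SERVICE = {
    "a1b2c3": "bot-service-01",
    "d4e5f6": "bot-service-02",
}

def attribute_request(url):
    """Attribute a request to a purchased bot service via its URL token.

    Requests whose path carries no known token (real users, stray
    crawlers) return None and are dropped from the measurement."""
    path = urlparse(url).path.strip("/")
    token = path.split("/")[0] if path else ""
    return TOKEN_TO_SERVICE.get(token)
```

Because only the purchased services are told the tokenized URLs, any request carrying a token can be attributed to exactly one service, which is what gives the dataset its ground truth.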
Architecture / algorithmic pipeline: the measurement system has three parts. First, the honey-site isolation mechanism uses unique URL strings to attribute requests to a purchased bot service. Second, each request is evaluated by both DataDome and BotD, whose JavaScript libraries send browser fingerprints to their servers and return a bot/human decision; the authors also crawl the site with OpenWPM to infer which browser APIs the vendors access. Third, the authors build analyzers over the collected fingerprints. In Section 5, they train two tree-based classifiers (described as random forests; the text cites an XGBoost implementation) to predict whether a request would be detected by or evade each anti-bot service. They then use SHAP to rank feature importance and compare fingerprints between high-evasion and low-evasion bot services to identify candidate evasion attributes. In Section 6, they inspect cross-attribute and attribute-vs-IP inconsistencies, e.g., a request claiming to come from an iPhone while reporting screen resolutions that do not exist on any real iPhone.
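The classifier stage can be sketched with scikit-learn on synthetic data. The feature names, values, and toy label below are illustrative, and impurity-based importances stand in for the SHAP attribution the paper actually uses:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic fingerprint features (illustrative, not the paper's data):
# hardware_concurrency, has_pdf_plugin, touch_support, screen_width.
X = np.column_stack([
    rng.choice([2, 4, 8, 16], size=n),
    rng.integers(0, 2, size=n),
    rng.integers(0, 2, size=n),
    rng.choice([375, 414, 1366, 1920], size=n),
])
# Toy label: pretend requests with a PDF plugin tend to evade,
# echoing the paper's BotD observation.
y = (X[:, 1] == 1).astype(int)

# 90/10 split, matching the split reported in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

names = ["hardware_concurrency", "has_pdf_plugin", "touch_support", "screen_width"]
ranked = sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])
print(f"test acc = {clf.score(X_te, y_te):.2f}")  # high on this toy label
print("top feature:", ranked[0][0])
```

High classifier accuracy here plays the same diagnostic role as in the paper: it shows the fingerprints of detected and evasive traffic are separable, and the importance ranking points at which attributes carry the separation.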
Training regime and concrete example: the paper reports a 90/10 train/test split for the detection-vs-evasion classifiers; epochs, batch size, learning rate, random seeds, and hardware are not reported, which is expected given that the classifiers are tree-based rather than neural. One concrete end-to-end example is the iPhone inconsistency analysis. The authors first observe that iPhone-labeled requests have the highest probability of evading DataDome (around 50%, per Fig. 6). They then check whether those requests are internally plausible by examining screen resolutions. Real iPhones have a fixed set of 12 resolutions, yet the observed iPhone-labeled traffic contains 83 unique screen resolutions, and 9 of the top 10 resolutions associated with evasion do not exist in the real world. That mismatch is taken as evidence that the bots spoof the User-Agent/device type without synchronizing other fingerprint fields. Similar logic is applied to geographic attributes: requests are compared against GeoLite2 IP geolocation and timezone-derived location, using a conservative matching rule based on UTC offsets.
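Both consistency checks reduce to simple set-membership and equality tests. The resolution set below is an illustrative subset, not the paper's actual table of 12 real iPhone resolutions:

```python
# Illustrative subset of real iPhone portrait resolutions (CSS pixels);
# the paper works from a fixed set of 12 real-world iPhone resolutions.
REAL_IPHONE_RESOLUTIONS = {(375, 667), (375, 812), (390, 844), (414, 896), (428, 926)}

def iphone_resolution_consistent(width, height):
    """Flag iPhone-labeled requests whose claimed screen size cannot
    occur on a real iPhone (either orientation)."""
    return (width, height) in REAL_IPHONE_RESOLUTIONS or \
           (height, width) in REAL_IPHONE_RESOLUTIONS

def geo_consistent(ip_utc_offset_hours, tz_utc_offset_hours):
    """Conservative geographic check: the UTC offset implied by IP
    geolocation should match the browser-reported timezone offset."""
    return ip_utc_offset_hours == tz_utc_offset_hours
```

A request claiming an iPhone with a 1366x768 screen, or a US-geolocated IP with an Asian timezone offset, fails these checks even though each attribute looks plausible on its own.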
Evaluation protocol and reproducibility: the main metrics are the evasion rate against each anti-bot service, classifier accuracy for separating detected vs evaded requests, and the true negative rate on real-user traffic for the resulting inconsistency rules. The authors compare high-evasion and low-evasion bot groups and inspect attribute distributions such as plugins, hardwareConcurrency, touch support, and screen resolution. They also evaluate the FP-Inconsistent rules against real-user traffic and note that the rules do not trigger false positives for most privacy-enhancing technologies. The paper states that it open-sources the honey-site architecture and inconsistency rules, but the excerpt does not mention a release of the raw traffic dataset. Some implementation details are left implicit, especially for the rule-generation step: the core idea is to flag attribute pairs whose number of observed configurations exceeds what real devices can exhibit, but the exact thresholding and selection procedure are not fully spelled out in the provided text.
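One plausible reading of that rule-generation idea, under the assumption (not fully specified in the excerpt) that a claimed device value becomes suspicious once its observed configuration count exceeds a per-device budget, with the budget values here being hypothetical:

```python
from collections import defaultdict

def excessive_configurations(requests, attr_a, attr_b, expected_max):
    """Count distinct (attr_a, attr_b) value pairs per claimed value of
    attr_a; flag values whose configuration diversity exceeds what is
    physically possible for real devices. `expected_max` is a hypothetical
    per-value bound (e.g. 12 real resolutions for 'iPhone')."""
    configs = defaultdict(set)
    for req in requests:
        configs[req[attr_a]].add(req[attr_b])
    return {val: len(pairs) for val, pairs in configs.items()
            if len(pairs) > expected_max.get(val, float("inf"))}

# Hypothetical traffic: an 'iPhone' claiming 3 distinct resolutions
# against an allowed budget of 2 gets flagged; the Pixel does not.
reqs = [
    {"device": "iPhone", "resolution": (375, 667)},
    {"device": "iPhone", "resolution": (999, 111)},
    {"device": "iPhone", "resolution": (123, 456)},
    {"device": "Pixel", "resolution": (412, 915)},
]
print(excessive_configurations(reqs, "device", "resolution", {"iPhone": 2}))
# → {'iPhone': 3}
```

The same counting scheme extends to configurations over time, e.g. one persistent identifier reporting too many distinct hardware profiles across a session.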
Technical innovations
- A honey-site design that isolates bot-service-specific traffic by using per-service random URL strings under the same domain, giving much stronger ground truth than generic honeypot discovery.
- A large-scale empirical measurement of evasive bot traffic in the wild against two real commercial anti-bot services, rather than self-built bots or vendor-labeled traffic.
- A data-driven inconsistency-mining method that looks for over-diverse configuration pairs across two attributes or across time, instead of relying on hand-written anecdotal rules.
- A practical rule set derived from those inconsistencies that can be deployed by anti-bot services and measurably lowers evasion without collapsing true negatives on real users.
Datasets
- Honey-site bot traffic: 507,080 requests from 20 bot services, collected September–November 2023 from SEOClerks-purchased traffic on the authors' honey site
- Real-user validation traffic: size and source not specified in the excerpt
- OpenWPM crawler traces of anti-bot script behavior: size not specified in the excerpt; collected by crawling the honey site
Baselines vs proposed
- DataDome alone: 52.93% average evasion (47.07% detection); with FP-Inconsistent rules layered on top, evasion drops by 48.11%
- BotD alone: 44.56% average evasion (55.44% detection); with FP-Inconsistent rules layered on top, evasion drops by 44.95%
- Detected-vs-evaded classifiers (diagnostic tools for feature discovery, not deployed detectors): 81.66% test accuracy for DataDome, 97.71% for BotD
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2406.07647.

Fig 1: Honey-site setup for collecting requests from different bot services (caption truncated in source).
Fig 2: Screenshot from a bot service on SEOClerks.
Fig 3: Overview of the data collection pipeline.
Fig 4: Bar plot of PDF plugin support probability, high- vs low-evasion bot services (caption truncated in source).
Fig 5: Comparison of cumulative probability distributions (caption truncated in source).
Fig 6 (page 4): per-device-type probability of evading DataDome, highest for iPhone-labeled requests.
Fig 7 (page 4).
Fig 11: An example of excessive configurations of fingerprint attributes (caption truncated in source).
Limitations
- The study focuses on one fraud class, impression fraud; conclusions may not transfer to credential stuffing, account takeover, or other bot goals with different behavioral constraints.
- The bot sample comes from 20 purchased services on SEOClerks, which is valuable but still a convenience sample from one marketplace and one time window.
- The anti-bot services are black boxes, so feature attribution is inferred indirectly through crawling, SHAP, and behavioral correlations rather than vendor-confirmed feature lists.
- The rule-generation details for FP-Inconsistent are only partially described in the excerpt; thresholding, sensitivity analysis, and robustness to dataset drift are not fully visible here.
- Reported performance is largely on traffic collected from the authors’ honey site; generalization to different sites, page structures, or future bot adaptations is not established in the excerpt.
- The authors note limited coverage from IP blocklists, but the interaction between fingerprint inconsistency rules and stronger browser spoofing or privacy tools is only partially evaluated.
Open questions / follow-ons
- How stable are the discovered inconsistency rules under adaptive attackers who deliberately optimize for cross-attribute consistency rather than just single-attribute spoofing?
- Would the same rule-mining approach work on other fingerprint sources beyond FingerprintJS attributes, such as WebGPU, audio, or higher-resolution canvas signals?
- How much performance degrades when the bot population changes over time, especially if services rotate device profiles or use real-device residential proxies?
- Can inconsistency rules be composed with behavioral signals to reduce false positives while retaining the gains on evasive traffic?
Why it matters for bot defense
For CAPTCHA and bot-defense practitioners, the practical lesson is that evasive traffic often leaks contradictions even when it passes a single commercial detector. That means defenders should not treat fingerprint attributes as isolated scores; they should model the joint feasibility of the entire fingerprint bundle. In practice, this paper suggests adding consistency checks between device class, screen geometry, plugin stack, touch capability, CPU/memory, and network geography, then using those checks as a cheap pre-filter or a corroborating signal before invoking heavier challenges.
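A sketch of such a joint-feasibility pre-filter follows; every rule and threshold below is an illustrative assumption, not the paper's published rule set:

```python
def fingerprint_feasible(fp):
    """Return human-readable violations of cross-attribute feasibility.
    Each rule checks the joint plausibility of a fingerprint bundle
    rather than scoring attributes in isolation."""
    violations = []
    if fp.get("device_class") == "mobile" and not fp.get("touch_support"):
        violations.append("mobile device without touch support")
    if fp.get("device_class") == "mobile" and fp.get("screen_width", 0) > 1300:
        violations.append("mobile device with desktop-sized screen")
    if fp.get("device_class") == "mobile" and fp.get("plugins"):
        violations.append("mobile browser reporting plugins")
    if fp.get("ip_utc_offset") != fp.get("tz_utc_offset"):
        violations.append("IP geolocation and timezone offset disagree")
    return violations

# A bot that is plausible per-attribute but implausible as a bundle:
bot = {"device_class": "mobile", "touch_support": False,
       "screen_width": 1920, "plugins": ["Chrome PDF Viewer"],
       "ip_utc_offset": -5, "tz_utc_offset": 8}
print(len(fingerprint_feasible(bot)))  # → 4
```

In a deployment, a nonzero violation count would feed a challenge decision or corroborate another detector's verdict rather than block outright, which is how the paper keeps the true negative rate high on real users.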
For a bot-defense engineer, the result is especially relevant because the study targets traffic that is already commercially purchased as “realistic” and “undetectable.” Those services are exactly the kind of adversary that tends to show up in CAPTCHA bypass and ad-fraud pipelines. The key operational takeaway is that a bot can be individually plausible on each attribute and still be globally implausible across attributes. That makes inconsistency mining a useful complement to challenge-based systems, particularly when you want low-friction detection for impression fraud where explicit user actions are sparse or absent.
Cite
@article{arxiv2406_07647,
  title={FP-Inconsistent: Measurement and Analysis of Fingerprint Inconsistencies in Evasive Bot Traffic},
  author={Hari Venugopalan and Shaoor Munir and Shuaib Ahmed and Tangbaihe Wang and Samuel T. King and Zubair Shafiq},
  journal={arXiv preprint arXiv:2406.07647},
  year={2024},
  url={https://arxiv.org/abs/2406.07647}
}