The First Early Evidence of the Use of Browser Fingerprinting for Online Tracking

Source: arXiv:2409.15656 · Published 2024-09-24 · By Zengrui Liu, Jimmy Dani, Yinzhi Cao, Shujiang Wu, Nitesh Saxena

TL;DR

This paper addresses a major gap in browser fingerprinting research by providing the first empirical evidence that browser fingerprinting is actively used for online user tracking and targeted advertising, not merely for defensive purposes such as fraud detection. Prior studies measured the presence of fingerprinting scripts but could not establish whether those scripts played a privacy-invasive tracking role. The authors introduce FPTrace, a framework that combines browser fingerprint spoofing, user-interest persona simulation, and client-side header bidding data collection to measure how changes in a browser's fingerprint alter advertising bids and HTTP tracking activity. Their large-scale study finds significant changes in bid values and a reduction in HTTP tracking records when the fingerprint is spoofed, which strongly indicates that advertisers use fingerprinting for user identification and interest profiling. Fingerprint-based tracking also persists after GDPR/CCPA consent opt-outs, raising substantial privacy concerns. The study additionally observes cookie restoration that may be tied to fingerprinting, although manual inspection did not establish a direct causal link.

Key findings

Median and maximum bid values differ significantly when the browser switches from a true to a fake fingerprint while cookies and IP address are held constant, indicating that fingerprinting influences ad targeting bids.
The number of HTTP tracking records, including HTTP chains and syncing events, decreases sharply after the browser fingerprint is modified, showing that fingerprinting affects tracking activity.
378 cases of cookie restoration were identified across 90 unique cookie key-host pairs, but manual inspection could not conclusively attribute the restoration to fingerprinting.
Fingerprinting-based tracking persists despite GDPR and CCPA opt-outs: sites using the OneTrust, Quantcast, and NAI CMP platforms still show fingerprint-driven data sharing under these regulatory settings.
FPTrace uses spoofing extensions based on Gummy Browser techniques to alter JavaScript fingerprint APIs and HTTP headers, which lets the experiments control fingerprint variance.
Replicated experiments with true fingerprints show stable bidding behavior across runs, supporting the conclusion that the observed differences come from fingerprint changes rather than noise or randomness.
Header bidding data collected via Prebid.js APIs serves as a proxy for advertiser interest and shows, quantitatively, that bids respond to fingerprint changes.
The spoofing covers the Navigator, Screen, Canvas, and Date APIs, plus the corresponding HTTP header fields, so that websites cannot expose the spoof through simple cross-checking.
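The last finding says that HTTP headers are spoofed together with the JavaScript APIs so that a website cannot catch the spoof by cross-checking. A minimal sketch of such a cross-check follows, to show why both layers have to agree. The function name, the property-to-header mapping, and the sample values are illustrative assumptions and are not taken from the paper.

```python
def find_mismatches(js_values: dict[str, str], headers: dict[str, str]) -> list[str]:
    """Return the JS property names whose value disagrees with the matching HTTP header."""
    # Hypothetical mapping: navigator property -> header carrying the same information.
    pairs = {"userAgent": "User-Agent", "language": "Accept-Language"}
    lowered = {name.lower(): value for name, value in headers.items()}

    mismatches = []
    for js_key, header_name in pairs.items():
        js_val = js_values.get(js_key)
        hdr_val = lowered.get(header_name.lower())
        if js_val is None or hdr_val is None:
            continue  # nothing to compare
        if header_name == "Accept-Language":
            # The header is a list with q-values; compare only its first language tag.
            hdr_val = hdr_val.split(",")[0].split(";")[0].strip()
        if js_val != hdr_val:
            mismatches.append(js_key)
    return mismatches


# Spoofing only the JavaScript side leaves a detectable inconsistency.
js_only_spoof = find_mismatches(
    {"userAgent": "FakeBrowser/1.0", "language": "en-US"},
    {"User-Agent": "RealBrowser/2.0", "Accept-Language": "en-US,en;q=0.9"},
)
# Spoofing both layers consistently leaves nothing to flag.
consistent_spoof = find_mismatches(
    {"userAgent": "FakeBrowser/1.0", "language": "en-US"},
    {"User-Agent": "FakeBrowser/1.0", "Accept-Language": "en-US,en;q=0.9"},
)
```

Here `js_only_spoof` flags `userAgent`, while `consistent_spoof` is empty, which is the situation the combined extension and header rewriting aims for.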

Threat model

The adversary is a participant in the online advertising ecosystem (a publisher, bidder, or data broker) that aims to track and profile users via browser fingerprinting in order to improve ad targeting and bypass privacy controls. The adversary can execute JavaScript fingerprinting scripts and collect fingerprints, cookie data, and Prebid.js header bidding information, but cannot modify the user's fingerprint, IP address, or browsing behavior. The adversary also cannot circumvent the experimental fingerprint spoofing or cookie clearing performed inside the FPTrace-controlled browser context.

Methodology — deep read

Threat model & assumptions: The adversary is an online advertiser or tracking entity attempting to identify or track users across websites using browser fingerprinting techniques. The adversary can observe browser fingerprint data, cookies, and header bidding information but cannot manipulate user fingerprints or IP addresses directly. The goal is to detect whether fingerprints are used for tracking beyond defensive uses.
Data: The authors compile interest-specific website lists (e.g., computer-related) through Google search and manual filtering, plus the Alexa top 10,000 for general bid collection. Browser profiles are built by visiting websites sequentially to simulate user-interest personas, collecting cookie and fingerprint data along the way. Multiple profiles are created: a baseline with true fingerprints, and spoofed profiles with fake fingerprints produced using Gummy Browser methods. Experiments run mainly from US IP addresses. Crawl dates and exact profile sizes are not explicitly stated. Cookies and HTTP data from the OpenWPM crawls (Selenium-driven Firefox) are exported after the visits.
Architecture / algorithm: FPTrace extends OpenWPM by integrating spoofing extensions that overwrite JavaScript fingerprint APIs (Navigator, Screen, Canvas, Date) and HTTP header modification (via ModHeader). It automates sequential site visits simulating typical user interactions (mouse movements, scrolling, randomized delays) and captures prebid.js header bidding auction data via injected code, recording all bidders’ bid values and ad content. Cookies are exported and compared between experiments to examine restoration. FPTrace cross-compares bidding and HTTP chain data between true vs spoofed fingerprints while keeping cookies and IP addresses fixed.
Training regime: Not applicable, since this is an empirical measurement framework rather than a machine-learning system. Experiments are run repeatedly to check stability. Hardware and software setups are not detailed beyond the use of OpenWPM with Selenium controlling a Firefox browser with extensions.
Evaluation protocol: Metrics include bid value distributions (mean, median, max), counts of HTTP tracking records and chains, the number of detected cookie restorations (matched keys and values), and the presence or absence of specific CMP platforms interacting with fingerprint-based data sharing. As a control, the true-fingerprint experiment is run twice to confirm that bids are stable. The experiment settings combine cookies enabled or disabled, true or fake fingerprints, and different IP addresses, as summarized in Table 1 of the paper. Statistical tests are not mentioned explicitly. Cookies are inspected manually to evaluate their linkage to fingerprinting.
Reproducibility: Raw data (including bids, cookies, and HTTP captures) are hosted in a public OneDrive repository, and OpenWPM is an open-source platform. The paper describes the FPTrace extensions but does not state whether the source code or the extensions for fingerprint spoofing and bid interception are publicly released. Some datasets (browser fingerprints released on GitHub) are used for spoofing. Overall, the study is partially reproducible: the base tools are publicly available, but release of the framework code is not confirmed.
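The evaluation protocol above names mean, median, and maximum bid values, and the paper's Fig 4 reports a CDF. The following is a minimal sketch of how those summaries could be computed from a list of bid values for one experiment setting. The function names and the sample CPM values are illustrative assumptions, not numbers from the paper.

```python
from statistics import mean, median


def summarize_bids(cpms: list[float]) -> dict[str, float]:
    """Mean, median, and max of the bid values collected under one setting."""
    if not cpms:
        raise ValueError("no bids collected for this setting")
    return {"mean": mean(cpms), "median": median(cpms), "max": max(cpms)}


def empirical_cdf(cpms: list[float]) -> list[tuple[float, float]]:
    """Points (value, fraction of bids <= value), one per observed bid."""
    values = sorted(cpms)
    n = len(values)
    return [(value, (rank + 1) / n) for rank, value in enumerate(values)]


# Made-up bid values for two settings that differ only in the fingerprint.
true_fp_bids = [0.10, 0.50, 0.30]
fake_fp_bids = [0.05, 0.20, 0.10]

true_summary = summarize_bids(true_fp_bids)
fake_summary = summarize_bids(fake_fp_bids)
```

Comparing `true_summary` with `fake_summary`, and plotting the two `empirical_cdf` outputs on the same axes, mirrors the kind of side-by-side comparison the protocol describes. Since the paper does not mention explicit statistical tests, none is assumed here.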

Concrete example: The key measurement cycle first builds a browser profile by visiting computer-topic websites with the true fingerprint (Step 1), then uses this profile to visit the header-bidding-enabled sites in the list W_bids and captures bid values and HTTP data (Steps 2 and 3). The same visits are repeated with spoofed-fingerprint profiles and with cookies removed. The differences in bidding patterns and HTTP records quantify the role of fingerprinting in ad targeting and tracking. To check for cookie restoration linked to fingerprinting, the authors also erase cookies and compare the cookie sets obtained before and after clearing across the fingerprint settings.
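The cookie comparison in this example can be sketched as set operations on (host, key, value) triples, following the "matched keys and values" criterion described in the evaluation protocol. The function names, the two output labels, and the sample cookies are assumptions made for illustration. A cookie labeled fingerprint-dependent here is only a candidate for manual inspection, consistent with the paper's inconclusive causal result.

```python
Cookie = tuple[str, str, str]  # (host, key, value)


def restored(before_clear: set[Cookie], after_clear: set[Cookie]) -> set[Cookie]:
    """Cookies that reappear with the same host, key, and value after clearing."""
    return before_clear & after_clear


def split_by_fingerprint(
    before_clear: set[Cookie],
    after_true_fp: set[Cookie],
    after_fake_fp: set[Cookie],
) -> dict[str, set[Cookie]]:
    """Separate restorations seen only with the true fingerprint from the rest."""
    with_true = restored(before_clear, after_true_fp)
    with_fake = restored(before_clear, after_fake_fp)
    return {
        # Came back only when the fingerprint was unchanged: candidate for inspection.
        "fingerprint_dependent": with_true - with_fake,
        # Came back regardless of fingerprint: some other mechanism is involved.
        "fingerprint_independent": with_true & with_fake,
    }


# Made-up cookies for illustration.
before = {("ads.example", "uid", "A1"), ("cdn.example", "sid", "B2")}
after_true = {("ads.example", "uid", "A1"), ("cdn.example", "sid", "B2")}
after_fake = {("ads.example", "uid", "Z9"), ("cdn.example", "sid", "B2")}

result = split_by_fingerprint(before, after_true, after_fake)
```

In this toy input the `uid` cookie returns with its old value only under the true fingerprint, so it lands in `fingerprint_dependent`, while `sid` returns in both settings.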

Overall, the methodology isolates the browser fingerprint as the only variable and uses header bidding data as a novel, client-visible indicator of how much fingerprinting influences ad targeting. It complements this with cookie restoration checks and with experiments under GDPR/CCPA opt-out settings.
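Isolating the fingerprint as the variable means comparing only those experiment settings that differ in the fingerprint and agree on everything else. A minimal sketch of that pairing logic follows. The `Setting` class and the full 2x2x2 grid are hypothetical; the settings actually run are the ones listed in the paper's Table 1, which is not reproduced here.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Setting:
    true_fingerprint: bool
    true_ip: bool
    cookies_kept: bool


def fingerprint_only_pairs(settings: list[Setting]) -> list[tuple[Setting, Setting]]:
    """Pairs of settings that differ in the fingerprint and in nothing else."""
    pairs = []
    for a, b in combinations(settings, 2):
        same_controls = a.true_ip == b.true_ip and a.cookies_kept == b.cookies_kept
        if same_controls and a.true_fingerprint != b.true_fingerprint:
            pairs.append((a, b))
    return pairs


# Hypothetical full grid of the three binary factors (8 settings).
all_settings = [Setting(*flags) for flags in product([True, False], repeat=3)]
valid_pairs = fingerprint_only_pairs(all_settings)
```

Any difference in bids or HTTP records within one of these pairs can be attributed to the fingerprint change, because the IP address and cookie state are held fixed.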

Technical innovations

FPTrace framework combining automated user interaction emulation, browser fingerprint spoofing, and client-side header bidding data capture to measure fingerprinting-based tracking.
Use of client-side Prebid.js auction data to quantify advertiser interest shifts due to fingerprint changes, a novel indirect signal of tracking.
Integration of Gummy Browser spoofing techniques into a browser extension that modifies the JavaScript fingerprinting APIs and HTTP headers in an OpenWPM-controlled Firefox environment.
Cookie restoration detection protocol comparing cookie sets after clearing across fingerprint spoofing settings, isolating fingerprint contribution to cookie re-instantiation.
Empirical demonstration that fingerprinting enables bypassing of GDPR/CCPA consent opt-out mechanisms, with fingerprint-driven data sharing observed under CMP platforms such as OneTrust, Quantcast, and NAI.
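To make the header bidding signal concrete, the sketch below flattens a JSON dump of bid responses into one row per bid. It assumes the dump has the shape returned by Prebid.js `pbjs.getBidResponses()` (an object keyed by ad unit code, each with a `bids` array whose entries carry `bidder` and `cpm`). The paper states that bids are captured through Prebid.js APIs but the exact call is an assumption here, and the sample payload is invented.

```python
import json


def flatten_bid_responses(raw_json: str) -> list[dict]:
    """Turn a dumped bid-response object into a flat list of bid rows."""
    responses = json.loads(raw_json)
    rows = []
    for ad_unit, payload in responses.items():
        for bid in payload.get("bids", []):
            rows.append(
                {
                    "ad_unit": ad_unit,
                    "bidder": bid.get("bidder"),
                    "cpm": float(bid.get("cpm", 0.0)),
                }
            )
    return rows


# Invented payload in the assumed shape.
sample = json.dumps(
    {
        "div-top-banner": {
            "bids": [
                {"bidder": "bidderA", "cpm": 0.42},
                {"bidder": "bidderB", "cpm": 0.10},
            ]
        },
        "div-sidebar": {"bids": []},
    }
)

rows = flatten_bid_responses(sample)
cpms = [row["cpm"] for row in rows]
```

The resulting `cpms` list is the kind of input from which per-setting bid statistics can then be derived.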

Datasets

Browser fingerprint dataset (GitHub) — size unspecified — public dataset used for JavaScript API spoofing
Alexa top 10,000 websites — used to identify Prebid.js header bidding sites — public
Computer-topic website list (curated) — ~40 sites per persona — constructed via Google search, manual filtering
Raw captured bid values, cookies, HTTP tracking records — OneDrive repository of authors (public)

Baselines vs proposed

True fingerprint, true IP, cookies kept (baseline, run twice): median and maximum bids stay stable across runs.
Fake fingerprint, true IP, cookies kept: median and maximum bids are significantly lower than in the true-fingerprint baseline (exact numbers are in the paper's Table 2 and are not reproduced here).
True fingerprint vs. fake fingerprint, true IP, cookies cleared: bids drop in the same direction when the fingerprint changes.
HTTP tracking records: significantly fewer after fingerprint spoofing, in both the cookies-kept and cookies-cleared settings.
Cookie restoration: 378 instances across 90 cookie key-host pairs; manual inspection was inconclusive on whether fingerprinting caused them.
GDPR/CCPA opt-out settings: fingerprint-dependent data sharing is still present under the CMP platforms, and the bid and HTTP tracking differences remain.
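The HTTP tracking comparison depends on counting tracking-related records such as syncing events. The summary does not spell out how FPTrace counts them, so the sketch below uses a heuristic common in cookie-syncing measurement work: a sufficiently long cookie value set by one host appears in a request URL sent to a different host. The function name, the length threshold, and the sample data are assumptions, and hosts are compared naively without a public suffix list.

```python
from urllib.parse import urlparse


def count_sync_events(
    cookies: list[tuple[str, str, str]],  # (host, key, value)
    request_urls: list[str],
    min_len: int = 8,  # assumed threshold to skip short, non-identifying values
) -> int:
    """Count requests that carry another host's cookie value in their URL."""
    events = 0
    for url in request_urls:
        dest = urlparse(url).hostname or ""
        for host, _key, value in cookies:
            if len(value) < min_len:
                continue
            owner = host.lstrip(".")
            same_host = dest == owner or dest.endswith("." + owner)
            if not same_host and value in url:
                events += 1
    return events


# Made-up cookie and requests for illustration.
cookies = [("tracker-a.example", "uid", "abcdef123456")]
urls = [
    "https://tracker-b.example/sync?uid=abcdef123456",  # value shared cross-host
    "https://tracker-a.example/p?uid=abcdef123456",     # same host, not a sync
    "https://cdn.example/x.js",                         # unrelated request
]

sync_count = count_sync_events(cookies, urls)
```

Running the same count over the HTTP captures of a true-fingerprint crawl and a fake-fingerprint crawl gives two numbers whose difference corresponds to the reduction reported above.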

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2409.15656.

Fig 1: High-level overview of the measurement study methodology.

Fig 2: High-level overview of the advertisement experiment.

Fig 3: High-level overview of the cookie restoration experiment.

Fig 4: Panel (a) is the CDF of the different fingerprint and IP settings.

Figs 5 to 8 (page 4).

Limitations

No explicit adversarial evaluation of sophisticated fingerprint spoofing or detection evasion by trackers.
The methodology measures fingerprinting effects mainly from US IP addresses; the limited geographic diversity may affect generalizability.
Cookie restoration linkage to fingerprinting remains correlational; manual inspections were inconclusive for direct causal proofs.
FPTrace code and fingerprint spoofing extensions are not confirmed as publicly released, potentially limiting reproducibility.
Dataset size for persona simulation is relatively small (~40 sites per persona topic), which may limit scope of interest profiles.
Experiments do not evaluate long-term tracking persistence beyond single-session visits or cross-device linkage.

Open questions / follow-ons

How effective are advanced fingerprint spoofing and evasion techniques against fingerprint-based tracking strategies in the wild?
Can the fingerprinting-based tracking be linked more conclusively with specific cookie restoration mechanisms or supercookie techniques?
What is the longitudinal impact on user privacy when fingerprinting is combined with other tracking signals across sessions and devices?
How do fingerprinting-based tracking and ad targeting vary across geographic regions with differing privacy laws and enforcement?

Why it matters for bot defense

This study provides the first empirical evidence that browser fingerprinting is actively used for user tracking and targeted advertising, beyond defensive uses such as fraud or bot detection. Bot defense engineers and CAPTCHA practitioners should recognize that the advertising ecosystem responds to fingerprint changes: bids shift and tracking activity drops when a fingerprint is spoofed, and cookies may be restored. Because fingerprinting can bypass GDPR and CCPA opt-outs, bot defense strategies should consider detecting fingerprint spoofing or variability alongside fingerprint-based tracking signals. FPTrace’s approach of quantifying tracking through client-visible header bidding data could inspire new measurement tools and detection heuristics in bot mitigation. Understanding the role of fingerprinting in cookie restoration may also help CAPTCHA systems distinguish legitimate users from fingerprint-based cloakers, improving security while respecting privacy boundaries.

Cite

@article{arxiv2409_15656,
  title={The First Early Evidence of the Use of Browser Fingerprinting for Online Tracking},
  author={Zengrui Liu and Jimmy Dani and Yinzhi Cao and Shujiang Wu and Nitesh Saxena},
  journal={arXiv preprint arXiv:2409.15656},
  year={2024},
  url={https://arxiv.org/abs/2409.15656}
}
