How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users

Source: arXiv:2410.06954 · Published 2024-10-09 · By Alex Berke, Enrico Bacis, Badih Ghazi, Pritish Kamath, Ravi Kumar, Robin Lassonde et al.

TL;DR

This study addresses limitations in prior research on browser fingerprinting by collecting and publicly releasing a novel dataset that links detailed browser fingerprint attributes with rich user demographics from 8,400 US participants who provided informed consent. Unlike past datasets, which lacked demographics and were biased by volunteer recruitment, this data enables new analyses of fingerprint uniqueness and privacy risks across demographic groups. The authors also conduct a randomized experiment with 12,461 participants examining factors that influence willingness to share browser data for open research. It shows that female participants, and participants who were shown their own browser data, were less likely to share.

Using their dataset, the paper quantifies how fingerprinting risk varies significantly with demographics: lower income users have more unique fingerprints, and older users are both more concerned about fingerprinting and at elevated actual risk. Critically, the authors demonstrate that browser fingerprint attributes alone can be used to infer user demographics such as gender, age, income, and race, exposing an overlooked privacy threat. The dataset and collection methodology provide a foundation for future research aiming to improve privacy protections and understand fingerprinting risks in the general US population.

Key findings

  • Approximately 60% of users in the dataset have unique overall browser fingerprints based on 13 stable attributes (Table 1).
  • Lower income groups exhibit higher fingerprint uniqueness and smaller anonymity sets, indicating greater fingerprinting risk compared to higher income users.
  • Older users are both significantly more likely to express concern about browser fingerprinting and have more unique fingerprints, increasing their privacy risk.
  • Female participants were significantly less likely to share browser data (OR = 0.909, p < .05), even controlling for understanding and concern about fingerprinting.
  • Showing participants their exact browser fingerprint data prior to asking for consent reduced overall sharing rates from 69.1% to 65.8% (p < .05).
  • Fingerprint attributes carry measurable signal about demographic categories such as gender, age, income, and race, so these categories can be inferred from a fingerprint, posing an overlooked privacy risk.
  • Survey data showed only 43% of participants agreed they understand how browser fingerprinting works, yet over 70% expressed concern about the practice.
  • Participants recruited via Prolific roughly approximated the demographic diversity of the US 18+ population, but people aged 65+, people of Hispanic origin, and the highest income bracket were underrepresented.
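
To make the odds ratio in the list above concrete, the following minimal sketch converts an odds ratio into a change in probability. The baseline sharing rate is an assumption chosen only for illustration (it borrows the 69.1% figure from the no-showdata arm); the summary does not state the reference group's actual rate, and `apply_odds_ratio` is a hypothetical helper name.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# ASSUMPTION: 0.691 is used as an illustrative reference-group sharing rate.
ASSUMED_BASELINE = 0.691
REPORTED_OR = 0.909  # odds ratio for female participants, from the list above

female_share_prob = apply_odds_ratio(ASSUMED_BASELINE, REPORTED_OR)
print(f"Implied sharing probability: {female_share_prob:.3f}")
```

Under that assumed baseline, an odds ratio of 0.909 corresponds to a sharing probability of roughly 0.67, a drop of about two percentage points. An odds ratio is not a ratio of probabilities, so the size of the shift depends on the baseline.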

Threat model

The adversary is a web tracker that remotely collects browser fingerprinting attributes to identify and track users across websites without relying on cookies. The adversary can query device and browser configuration attributes via JavaScript or HTTP headers but does not have direct access to users’ demographics. The study assumes the adversary cannot alter browser behavior or attributes at will but can passively observe browser states.
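
As a rough illustration of this threat model, the sketch below shows how a tracker might collapse a set of observed attributes into one stable identifier that matches across sites. The attribute names and values are hypothetical, and hashing a canonical JSON serialization is only one plausible approach; it is not the method used by the paper or by FingerprintJS.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Hash a dict of browser attributes into a stable hex identifier."""
    # Sorting keys makes the serialization independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values observed on one site.
visit_site_a = {
    "user_agent": "ExampleBrowser/1.0",
    "languages": ["en-US", "en"],
    "timezone": "America/Chicago",
    "screen_resolution": [1920, 1080],
    "platform": "Win32",
}

# Same browser seen on another site, with attributes collected in a different order.
visit_site_b = dict(reversed(list(visit_site_a.items())))

# A different browser that differs in a single attribute.
other_browser = {**visit_site_a, "timezone": "America/New_York"}

same_user = fingerprint_id(visit_site_a) == fingerprint_id(visit_site_b)
different_user = fingerprint_id(visit_site_a) != fingerprint_id(other_browser)
print(same_user, different_user)
```

No cookie is involved: the identifier is recomputed from passively observed attributes on every visit, which is why attribute stability and uniqueness determine how well the tracking works.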

Methodology — deep read

  1. Threat Model and Assumptions: The adversary is a web tracker who collects browser fingerprinting attributes remotely via client-side JavaScript or HTTP headers to identify and track individuals across sites without relying on cookies. The adversary does not have direct access to demographic data; thus the study assesses whether fingerprint signals leak demographic information and how uniqueness varies between demographic groups.

  2. Data Collection: Data were gathered in December 2023 from 12,461 English-speaking US adults recruited via the Prolific platform; the sample was designed to approximate US Census demographics (with some biases). Of these, 8,400 participants consented to share detailed browser fingerprinting data. The dataset includes self-reported demographics (age, gender, income, race, ethnicity), survey responses on fingerprinting understanding and concern, and collected browser attributes. The attributes include User Agent, Languages, Timezone, Screen Resolution, Platform, WebGL parameters, and others from the open-source FingerprintJS v3 library, plus additions. Participants were randomized 50/50 between a no-showdata arm and a "showdata" arm, in which participants saw their extracted browser fingerprint data before being asked for consent.

  3. Architecture / Algorithm: No machine learning model training is described. Instead, uniqueness metrics such as Shannon entropy and anonymity set size were computed over browser attributes to quantify fingerprintability, and logistic regression was used to measure how demographics, survey perceptions, and experiment assignment relate to the likelihood of sharing data and of expressing concern.

  4. Training Regime: No machine learning models were trained. The statistical models were fit on the 12,210 participants who passed attention checks, with results reported as odds ratios. Standard survey data quality controls were applied.

  5. Evaluation Protocol: Uniqueness was measured as the percentage of unique fingerprints and as entropy per attribute. Multivariate logistic regression was used to isolate factors influencing data sharing and concern. Sample demographics were compared with US census proportions. The experiment tested the effect of showing fingerprint data on consent rates. The fingerprint uniqueness measures were also compared with those of prior studies.

  6. Reproducibility: The authors provide the full dataset and survey tools openly for future research. The data collection was IRB approved with informed consent. Precise date omitted to limit re-identification risk.
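
The uniqueness metrics named in the steps above can be sketched as follows. This is a minimal illustration on toy data, not the authors' code: the fingerprints are hypothetical tuples, and the average anonymity set size is computed here as the mean, over users, of the number of users sharing their fingerprint, which is an assumption about the definition.

```python
import math
from collections import Counter


def fingerprint_stats(fingerprints: list) -> dict:
    """Compute Shannon entropy (bits), % unique, and mean anonymity set size."""
    n = len(fingerprints)
    counts = Counter(fingerprints)
    # Shannon entropy of the empirical fingerprint distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Share of users whose fingerprint appears exactly once.
    pct_unique = 100.0 * sum(1 for c in counts.values() if c == 1) / n
    # Each of the c users in a group of size c sits in an anonymity set of size c.
    avg_anonymity_set = sum(c * c for c in counts.values()) / n
    return {
        "entropy_bits": entropy,
        "pct_unique": pct_unique,
        "avg_anonymity_set": avg_anonymity_set,
    }


# Hypothetical fingerprints as (platform, timezone) tuples for four users.
toy_fingerprints = [
    ("Win32", "America/Chicago"),
    ("Win32", "America/Chicago"),
    ("MacIntel", "America/New_York"),
    ("Linux x86_64", "America/Denver"),
]

stats = fingerprint_stats(toy_fingerprints)
print(stats)
```

In this toy sample, two of four users are unique (50%), and the entropy is 1.5 bits. Restricting `fingerprints` to one demographic group at a time would give the kind of per-group comparison described in the key findings.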

Technical innovations

  • Release of a novel, publicly available dataset linking rich browser fingerprinting attributes with user demographics and perceptions for 8,400 US participants.
  • Randomized controlled experiment measuring the effect of showing participants their own browser fingerprint data on consent to share, a factor not previously quantified.
  • Demonstration that demographic attributes (gender, age, income, race) can be inferred directly from browser fingerprinting signals, revealing a new privacy threat.
  • Comprehensive multivariate logistic regression analysis connecting demographics, privacy understanding, concern, and behavior toward data sharing in fingerprinting research.
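
One way to quantify how much a fingerprint attribute reveals about a demographic attribute is mutual information, which the paper's Fig 6 refers to. The sketch below is a plain plug-in estimator on toy data; the attribute values and group labels are hypothetical, and the paper's exact estimator is not described in this summary.

```python
import math
from collections import Counter


def mutual_information(xs: list, ys: list) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) written with raw counts.
        ratio = (c * n) / (count_x[x] * count_y[y])
        mi += p_xy * math.log2(ratio)
    return mi


# Hypothetical case where the platform fully determines the group label.
platforms = ["Win32", "Win32", "MacIntel", "MacIntel"]
groups_dependent = ["group_1", "group_1", "group_2", "group_2"]
# Hypothetical case where the platform says nothing about the group label.
groups_independent = ["group_1", "group_2", "group_1", "group_2"]

mi_dependent = mutual_information(platforms, groups_dependent)
mi_independent = mutual_information(platforms, groups_independent)
print(mi_dependent, mi_independent)
```

A value near zero means the attribute carries little demographic signal, while larger values indicate leakage. Plug-in estimates are biased upward on small samples, so real analyses typically need a correction or a permutation baseline.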

Datasets

  • Berke et al. dataset — 8,400 browser fingerprint + demographics with survey data — collected Dec 2023 via Prolific, publicly available

Baselines vs proposed

  • Panopticlick (2010): 83.6% to 94.2% fingerprint uniqueness vs Berke et al.: ~60% uniqueness (likely due to attribute set differences and population)
  • AmIUnique (2016): 81–90% uniqueness desktop/mobile vs Berke et al.: 60%
  • Gómez-Boix et al. (2018): 33.6% uniqueness on French news site vs Berke et al.: 60% uniqueness in US national sample
  • Consent rates: Showdata arm 65.8% share vs No-showdata arm 69.1% share (p < 0.05)
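
As a sanity check on the consent-rate comparison above, a two-proportion z-test can be sketched as follows. This is a simpler test than the multivariate logistic regression the paper uses, and the arm sizes are an assumption: 6,105 per arm, from splitting the 12,210 analyzed participants evenly. The summary does not report the actual arm sizes.

```python
import math


def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple:
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    # Pooled proportion under the null hypothesis.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# ASSUMPTION: equal arm sizes of 6,105 (not stated in the source).
ASSUMED_ARM_SIZE = 6105

z_stat, p_val = two_proportion_z_test(0.691, ASSUMED_ARM_SIZE, 0.658, ASSUMED_ARM_SIZE)
print(f"z = {z_stat:.2f}, p = {p_val:.5f}")
```

Under these assumed arm sizes, the 3.3 percentage point gap is well beyond what sampling noise alone would produce, which is consistent with the reported p < 0.05.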

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2410.06954.

Fig 1: The survey tool is available in our open repository, with

Fig 2: Screenshot from survey displaying experimental

Fig 3: Participants’ responses for statements S1 and S2.

Fig 4: Entropy estimates for varying gender ratios.

Fig 5: (Top) % unique and (bottom) average anonymity set size by demographic group.

Fig 6: Mutual information analysis results.

Fig 9: Survey duration times comparing participants who

Fig 10: Changes in Shannon entropy estimates as

Limitations

  • Dataset underrepresents older adults (65+) and high-income households (> $150K), limiting conclusions for those groups.
  • The FingerprintJS v3 library used for collection may miss attributes gathered by newer or proprietary fingerprinting scripts.
  • No adversarial evaluation or simulation of attackers attempting to circumvent fingerprinting or infer demographics.
  • Self-reported demographics and consent behavior may suffer from selection bias inherent to crowdworker samples.
  • The stability and uniqueness of fingerprint attributes over time or across sessions were not analyzed here.
  • The statistical models explain limited variance (pseudo R-squared < 0.06), suggesting other unmeasured factors affect perceptions and sharing.

Open questions / follow-ons

  • How generalizable are these findings to global populations beyond the US, given cultural and technological differences?
  • What is the stability over time of the demographic inference signals from fingerprints—can demographics be reliably tracked cross-session?
  • Can privacy-enhancing browser modifications effectively remove or obfuscate demographic signals in fingerprints without breaking usability?
  • What adversarial countermeasures can users or browsers deploy to reduce demographic inference risk while maintaining fingerprinting resistance?

Why it matters for bot defense

This work highlights that browser fingerprinting risks are not uniform across demographic groups: lower-income users and older adults tend to have more unique fingerprints, which suggests that fingerprinting defenses should consider demographic factors to avoid exacerbating inequalities. For CAPTCHA and bot-defense systems that rely on fingerprinting signals, this demographic leakage raises privacy concerns beyond simply identifying bots, because inferring sensitive demographics might violate privacy regulations or ethical standards. Practitioners should be aware that fingerprinting attributes can inadvertently reveal user identity and demographics, and should consider minimizing attribute collection or integrating demographic risk assessments into threat models. The randomized experiment, in which showing participants their fingerprint data reduced consent rates, also implies that transparency and user education affect willingness to share data, which is relevant for ethical data collection in bot detection research.

Cite

@article{arxiv2410_06954,
  title={How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users},
  author={Alex Berke and Enrico Bacis and Badih Ghazi and Pritish Kamath and Ravi Kumar and Robin Lassonde and Pasin Manurangsi and Umar Syed},
  journal={arXiv preprint arXiv:2410.06954},
  year={2024},
  url={https://arxiv.org/abs/2410.06954}
}

Articles are CC BY 4.0 — feel free to quote with attribution