The WASM Cloak: Evaluating Browser Fingerprinting Defenses Under WebAssembly based Obfuscation

Source: arXiv:2508.21219 · Published 2025-08-28 · By A H M Nazmus Sakib, Mahsin Bin Akram, Joseph Spracklen, Sahan Kalutarage, Raveen Wijewickrama, Igor Bilogrevic et al.

TL;DR

This work investigates how WebAssembly (WASM) based obfuscation affects the detection of browser fingerprinting scripts. Traditional defenses and detection approaches overwhelmingly focus on JavaScript (JS) code analysis, leaving a potential blind spot as adversaries can translate JS fingerprinting scripts into WASM. The authors develop an automated pipeline that greedily converts real-world JS fingerprinting scripts into functionally equivalent WASM-obfuscated variants, enabling a comprehensive evaluation of how modern fingerprinting detectors handle such transformations. They evaluate both state-of-the-art research detectors relying on static and dynamic code features as well as commercial defenses including browser extensions and built-in browser protections.

Their results reveal a significant disparity. Detectors from the research literature that rely on JS source code features often degrade substantially or become unusable when faced with WASM-obfuscated scripts, mainly due to outdated toolchains or lack of WASM compatibility. Retraining on updated data recovers some effectiveness, but these detectors remain fragile. By contrast, commercial defenses that intercept browser API calls remain robust and unaffected by the switch to WASM, because their detection does not depend on how a script is implemented. Native WASM fingerprinting techniques largely bypass detection unless script or WASM execution is blocked outright. These findings expose a critical gap between academic detection strategies and deployed defenses, and they offer insights on incorporating WASM-awareness to strengthen research detectors and preempt evolving evasion tactics.

Key findings

  • Built a large-scale corpus of 7,578,653 JS scripts from Google CrUX Top-1M (May 2025), identifying 10,742 fingerprinting scripts spanning AudioContext (8,730), Canvas (1,780), WebRTC (1,986), and Canvas-font (109) categories.
  • Developed 13 transformation rules to translate JS fingerprinting scripts into semantically equivalent WASM variants using AssemblyScript and LLM-assisted function translation.
  • State-of-the-art detectors from literature relying on JS source code features show significant performance degradation or fail to support WASM, with some becoming unusable until retrained on updated datasets including WASM-obfuscated scripts.
  • Commercial defenses (browser extensions and native browser features) using API-level interception remain fully effective against WASM-obfuscated scripts due to implementation-agnostic detection.
  • Native WASM fingerprinting, evaluated with only one known technique, evades all detectors unless script/WASM execution is disabled, highlighting limitations in current defenses for this vector.
  • Greedy WASM conversion (translating as much of the JS code as possible) preserves fingerprinting functionality, which makes it possible to stress-test detectors with obfuscated scripts.
  • LLM-based translation for complex JS functions to AssemblyScript improves accuracy and coverage beyond deterministic rule-based methods.
  • Detection failures are primarily caused by reliance on JS syntactic/semantic cues absent in WASM and outdated training data lacking WASM samples.
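
To make the last finding concrete, here is a minimal sketch of why a detector built on JS source tokens can lose its signal. The token list, the two script snippets, and the function name `static_token_features` are all invented for illustration; the detectors evaluated in the paper use far richer features than keyword counts.

```python
import re
from typing import Dict

# Hypothetical token list, chosen only for this illustration.
FP_API_TOKENS = ("getContext", "toDataURL", "fillText", "AudioContext", "RTCPeerConnection")


def static_token_features(source: str) -> Dict[str, int]:
    """Count whole-word occurrences of each token in the script text."""
    return {
        tok: len(re.findall(r"\b" + re.escape(tok) + r"\b", source))
        for tok in FP_API_TOKENS
    }


# Invented canvas fingerprinting snippet written in plain JS.
ORIGINAL_JS = """
const c = document.createElement('canvas');
const ctx = c.getContext('2d');
ctx.fillText('probe', 2, 2);
const fp = c.toDataURL();
"""

# Invented loader for an obfuscated variant. The assumption here is that the
# API names now live inside the compiled module, not in the JS text.
WASM_LOADER_JS = """
const { instance } = await WebAssembly.instantiateStreaming(fetch('m.wasm'), imports);
const fp = instance.exports.run();
"""

before = static_token_features(ORIGINAL_JS)
after = static_token_features(WASM_LOADER_JS)
print("tokens in original JS:", sum(before.values()))
print("tokens in WASM loader:", sum(after.values()))
```

In this toy setup the feature vector for the loader is all zeros, even though the script would end up calling the same browser APIs at run time.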

Threat model

The threat model assumes a web-based adversary who controls the delivery of client-side code on websites or third-party domains. The adversary aims to fingerprint browsers by covertly extracting device and browser attributes while evading existing fingerprinting defenses. To hide implementation details, the adversary greedily transforms JS fingerprinting scripts into WASM, but cannot escape browser sandbox constraints or alter browser internals. The adversary operates in a black-box setting, with no knowledge of the internal models or parameters of deployed defenses. The user may employ static or dynamic fingerprinting detectors, but these detectors are not initially assumed to use WASM-aware heuristics or to have been retrained on WASM samples.

Methodology — deep read

  1. Threat Model & Assumptions: The adversary is a web-based fingerprinting entity controlling a server or third-party domain that delivers client-side code. The attacker aims to extract device/browser attributes by converting JS fingerprinting scripts into WASM obfuscated variants while preserving functional equivalence. They apply a greedy strategy to translate as much JS code as possible into WASM to maximize obfuscation. The adversary cannot escape the JS/WASM sandbox or modify the browser engine itself. The attacker does not know the internal logic of fingerprinting detectors (black-box setting).

  2. Data: Two datasets are used:

    • A large-scale real-world JS corpus crawled from the May 2025 snapshot of Google CrUX Top-1M, comprising 7,578,653 script files, with 10,742 identified fingerprinting scripts categorized as AudioContext, Canvas, WebRTC, or Canvas-font.
    • A controlled dataset of 124 paired JS and WASM fingerprinting scripts covering various fingerprinting categories for functional verification.
  3. Architecture / Algorithm: Developed 13 conversion rules for the greedy JS-to-WASM translation pipeline that target general language constructs (literals, control flow, arrays, functions) and fingerprinting-specific patterns (property accesses to canvas, navigator, screen, dynamic code generation). Complex function declarations are translated using a Large Language Model (Qwen2.5-Coder-14B-Instruct) to AssemblyScript to handle JS dynamism and typing.

The pipeline parses JS code to AST form and recursively applies pattern-matching rules producing AssemblyScript snippets, import object entries, and JS glue code for runtime bridging. These snippets are merged and compiled into a single WASM binary instantiated asynchronously in the browser. The original JS is modified to call WASM exports with helper functions providing string decoding and runtime linking.
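
The recursive, rule-per-node-type structure described above can be sketched as follows. This is not the authors' pipeline and does not reproduce their 13 rules: the simplified AST shape, the four rule functions, the `get_<object>_<property>` import naming, and the `env` module name are all assumptions made for this example. It only shows how one walk over the tree can yield both an AssemblyScript snippet and the import entries the JS glue code would have to provide.

```python
from typing import Any, Callable, Dict

Node = Dict[str, Any]  # simplified ESTree-like node, not real parser output


class Emitter:
    """Walks the toy AST, collecting AssemblyScript text and host imports."""

    def __init__(self) -> None:
        # import name -> JS glue expression the loader would supply
        self.imports: Dict[str, str] = {}

    def emit(self, node: Node) -> str:
        rule = RULES.get(node["type"])
        if rule is None:
            raise NotImplementedError(f"no rule for {node['type']}")
        return rule(self, node)


def rule_literal(em: Emitter, node: Node) -> str:
    # Numeric literals only; emitted as floats so every expression is f64.
    return repr(float(node["value"]))


def rule_binary(em: Emitter, node: Node) -> str:
    return f"({em.emit(node['left'])} {node['operator']} {em.emit(node['right'])})"


def rule_member(em: Emitter, node: Node) -> str:
    # WASM cannot read navigator/screen itself, so the access becomes an import.
    name = f"get_{node['object']}_{node['property']}"
    em.imports[name] = f"() => {node['object']}.{node['property']}"
    return f"{name}()"


def rule_function(em: Emitter, node: Node) -> str:
    return f"export function {node['id']}(): f64 {{ return {em.emit(node['body'])}; }}"


RULES: Dict[str, Callable[[Emitter, Node], str]] = {
    "Literal": rule_literal,
    "BinaryExpression": rule_binary,
    "MemberExpression": rule_member,
    "FunctionDeclaration": rule_function,
}

# Toy AST for: function probe() { return screen.width * 2 + navigator.hardwareConcurrency; }
ast: Node = {
    "type": "FunctionDeclaration",
    "id": "probe",
    "body": {
        "type": "BinaryExpression",
        "operator": "+",
        "left": {
            "type": "BinaryExpression",
            "operator": "*",
            "left": {"type": "MemberExpression", "object": "screen", "property": "width"},
            "right": {"type": "Literal", "value": 2},
        },
        "right": {"type": "MemberExpression", "object": "navigator", "property": "hardwareConcurrency"},
    },
}

em = Emitter()
asc_source = em.emit(ast)
glue = "{ env: { " + ", ".join(f"{k}: {v}" for k, v in em.imports.items()) + " } }"
print(asc_source)
print(glue)
```

Note that the property reads on `screen` and `navigator` do not disappear: they move out of the script body and into the import object, which is the part a browser-level monitor can still observe.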

  4. Training Regime: For ML-based detectors, retraining was performed with updated data including WASM-obfuscated scripts to assess recovery of detection capability. Details of epochs, batch sizes, and hyperparameters were not explicitly stated.

  5. Evaluation Protocol: Tested state-of-the-art JS-based fingerprinting detectors from research literature as well as commercial and native browser defenses on:

    • Original JS fingerprinting scripts.
    • WASM-obfuscated variants of those scripts.
    • Native WASM fingerprinting scripts from prior literature.

Metrics included detection accuracy, false positives/negatives, and qualitative assessment of detector robustness and failure modes. Ablations explored the impact of specific conversion rules on detection evasion. Cross-validation or statistical significance tests were not detailed.
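
As a reminder of how the listed metrics relate to one another, the sketch below computes accuracy and false positive/negative rates from binary labels. The label vectors are toy values invented for this example and are not results from the paper; the function names are likewise hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally true/false positives and negatives (1 = fingerprinting script)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["tp"] += 1
        elif truth:
            counts["fn"] += 1
        elif pred:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


def summarize(c: Dict[str, int]) -> Dict[str, float]:
    positives = c["tp"] + c["fn"]
    negatives = c["fp"] + c["tn"]
    total = positives + negatives
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "false_negative_rate": c["fn"] / positives if positives else 0.0,
        "false_positive_rate": c["fp"] / negatives if negatives else 0.0,
    }


# Toy data: four fingerprinting scripts followed by four benign ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_on_js = [1, 1, 1, 0, 0, 0, 0, 1]    # invented detector output on original JS
pred_on_wasm = [0, 0, 1, 0, 0, 0, 0, 1]  # invented output on obfuscated variants

js_metrics = summarize(confusion_counts(y_true, pred_on_js))
wasm_metrics = summarize(confusion_counts(y_true, pred_on_wasm))
print("JS  :", js_metrics)
print("WASM:", wasm_metrics)
```

In this toy case the false positive rate is unchanged while the false negative rate rises, which is the pattern to look for when obfuscation only affects the malicious class.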

  6. Reproducibility: Dataset and conversion pipeline details are extensively described. Code release plans were not specified, and some datasets derived from large-scale web crawling may not be publicly available. Function translations via the specified LLM are reproducible given model access.

Technical innovations

  • A comprehensive automated pipeline implementing 13 novel translation rules to greedily convert diverse JS fingerprinting scripts into semantically equivalent WASM-obfuscated variants.
  • Use of a Large Language Model to translate complex JS function declarations and expressions into AssemblyScript for WASM generation, overcoming syntactic and semantic challenges.
  • Systematic large-scale evaluation of both research and commercial fingerprinting defenses against WASM-obfuscated and native WASM fingerprinting scripts, exposing significant detection gaps.
  • Insight that API-level interception based defenses remain robust to WASM obfuscation, unlike source-code-feature-based detectors reliant on JS syntax and semantics.

Datasets

  • Google CrUX Top-1M JS script corpus — 7,578,653 scripts (May 2025 crawl) — private crawl dataset
  • Fingerprinting scripts subset — 10,742 scripts labeled with categories AudioContext (8,730), Canvas (1,780), WebRTC (1,986), Canvas-font (109)
  • Controlled paired JS and WASM fingerprinting dataset — 124 script pairs — internally constructed for functional verification
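
A quick arithmetic check on the counts listed above, taking the figures exactly as quoted in this summary. The per-category counts add up to more than the number of labeled scripts, which suggests that some scripts are counted in more than one category; that reading is an inference from the numbers, not a statement taken from the paper.

```python
TOTAL_SCRIPTS = 7_578_653
FP_SCRIPTS = 10_742
CATEGORY_COUNTS = {
    "AudioContext": 8_730,
    "Canvas": 1_780,
    "WebRTC": 1_986,
    "Canvas-font": 109,
}

category_sum = sum(CATEGORY_COUNTS.values())
excess = category_sum - FP_SCRIPTS       # category labels beyond one per script
prevalence = FP_SCRIPTS / TOTAL_SCRIPTS  # share of crawled scripts labeled as fingerprinting

print(f"sum of category counts: {category_sum}")
print(f"excess over labeled scripts: {excess}")
print(f"prevalence: {prevalence:.4%}")
```

The prevalence works out to roughly 0.14% of crawled scripts, so any detector evaluated on the raw corpus faces a heavily imbalanced problem.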

Baselines vs proposed

  • DeepFPD (SOTA JS-based detector): Detection accuracy drops significantly on WASM-obfuscated scripts; retraining partially restores effectiveness.
  • FP-Inspector (hybrid static-dynamic ML): Performance degrades on WASM-obfuscated scripts due to missing WASM support; improved after dataset update.
  • Commercial browser extensions (Privacy Badger, Disconnect): Detection effectiveness remains stable (~100%) on both JS and WASM obfuscated scripts.
  • Native browser fingerprinting defenses (e.g., Firefox, Brave built-in): Maintain near-perfect detection on WASM-obfuscated scripts due to API-level monitoring.
  • Native WASM fingerprinting technique (Guri et al. [15]): Bypasses all tested detectors except when script or WASM execution is disabled entirely.

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2508.21219.

Fig 1: Adversary model.

Fig 2: Conversion pipeline.

Figs 3 to 5 (page 3).

Fig 6 (page 5).

Fig 7: Change in DeepFPD feature contributions for repre...

Limitations

  • Only one known native WASM fingerprinting technique was evaluated, limiting conclusions about native WASM fingerprinting defenses.
  • The large-scale real-world fingerprinting labels rely on heuristic matching and prior detectors, which may miss obfuscated or novel fingerprinting instances.
  • Retraining details for ML detectors are not fully disclosed, limiting reproducibility of recovery performance claims.
  • Conversion pipeline excludes class methods and advanced JS semantics such as closures, potentially missing subtler obfuscation patterns.
  • Evaluation does not cover adversarial adaptation beyond greedy JS-to-WASM conversion or combined multi-obfuscation strategies.
  • No explicit testing under distribution shift or against fully black-box deployed detectors. Because the adversary is assumed to have no knowledge of detector internals, attacks that exploit such knowledge are not explored.

Open questions / follow-ons

  • How can fingerprinting detectors effectively incorporate WASM-aware features or dynamic analysis techniques to maintain robustness against WASM obfuscation?
  • Can native WASM fingerprinting approaches evolve to leverage more diverse hardware or browser interaction side-channels, and how can defenses adapt?
  • What combined obfuscation techniques beyond greedy JS-to-WASM translation (e.g., polymorphism, multi-layered obfuscation) further challenge detection and how to address them?
  • What is the performance impact and user experience tradeoff when deploying API-level interception defenses at scale to combat evolving fingerprinting obfuscation?

Why it matters for bot defense

Bot-defense engineers working on CAPTCHA and fingerprinting mitigation should note that WASM-based obfuscation poses a meaningful evasion risk to detection approaches that rely on static or syntactic analysis of JavaScript code. The study shows that relying solely on JS code features is fragile: once attackers move their logic into WASM, many of those heuristics break. Defenses that monitor the API calls themselves, such as browser extensions or built-in browser protections, remain robust despite obfuscation. Practitioners should therefore prioritize detection mechanisms that observe API-access patterns or browser behavior over static script analysis alone. The findings also suggest that retraining ML detection models with WASM-obfuscated samples is necessary but not sufficient, and that fundamentally WASM-aware analysis techniques are needed to stay ahead of attackers. Finally, given the growing use of WASM for both legitimate and adversarial purposes, defenses at the network or browser-engine level that restrict or monitor WASM execution permissions may offer valuable complementary safeguards.
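
The robustness of API-level monitoring follows from the fact that WebAssembly code has no direct access to Web APIs and has to reach them through functions imported from the host. The toy model below illustrates that point in Python rather than in a browser: `FakeCanvas`, `intercept`, and the two "script" functions are hypothetical stand-ins, not the mechanism used by any of the extensions or browsers named above.

```python
from collections import Counter
from typing import Any


class FakeCanvas:
    """Stand-in for a browser object exposing a fingerprintable API."""

    def to_data_url(self) -> str:
        return "data:image/png;base64,AAAA"


def intercept(obj: Any, method_name: str, log: Counter) -> None:
    """Replace obj.method_name with a wrapper that records every call."""
    original = getattr(obj, method_name)

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        log[method_name] += 1
        return original(*args, **kwargs)

    setattr(obj, method_name, wrapper)


def plain_js_script(canvas: FakeCanvas) -> str:
    # Models a script that calls the API directly from JS.
    return canvas.to_data_url()


def wasm_obfuscated_script(canvas: FakeCanvas) -> str:
    # Models compiled logic: the API is reached through an import table
    # that the loader fills with host functions.
    imports = {"env_to_data_url": canvas.to_data_url}
    return imports["env_to_data_url"]()


canvas = FakeCanvas()
api_log: Counter = Counter()
intercept(canvas, "to_data_url", api_log)

plain_js_script(canvas)
wasm_obfuscated_script(canvas)
print(api_log)
```

Both code paths land in the same wrapper, so the monitor's view is identical regardless of which language the calling logic was compiled from.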

Cite

bibtex
@article{arxiv2508_21219,
  title={The WASM Cloak: Evaluating Browser Fingerprinting Defenses Under WebAssembly based Obfuscation},
  author={A H M Nazmus Sakib and Mahsin Bin Akram and Joseph Spracklen and Sahan Kalutarage and Raveen Wijewickrama and Igor Bilogrevic and Murtuza Jadliwala},
  journal={arXiv preprint arXiv:2508.21219},
  year={2025},
  url={https://arxiv.org/abs/2508.21219}
}

Articles are CC BY 4.0 — feel free to quote with attribution