ResAware: Cross-Environment Website Fingerprinting via Resource-Privileged Distillation

Source: arXiv:2606.17462 · Published 2026-06-16 · By Chongru Fan, Wei Wang, Wentao Huang, Zhenquan Ding, Jinqiao Shi, Lei Cui et al.

TL;DR

Website Fingerprinting (WF) attacks lose accuracy in real-world deployment because of distribution shifts: temporal change, geographic variation, browser diversity, and proxy obfuscation. Existing WF methods rely solely on low-level traffic features, which are noisy and brittle under such cross-environment perturbations. The authors propose ResAware, a cross-environment WF framework that uses privileged resource-level information, available only during offline training, to distill stable knowledge into a traffic-only online classifier. ResAware follows a teacher-student paradigm: a resource-based teacher is trained on high-fidelity application-layer resource sequences in a controlled environment, and its knowledge is then transferred via heterogeneous knowledge distillation to a student that operates solely on encrypted traffic. This learning-using-privileged-information setup injects stable semantic supervision that guides traffic feature learning without increasing the attacker's online capabilities or costs.

The authors validate ResAware on a large dataset (>160,000 paired samples) collected over five months from six globally distributed vantage points, covering 100 monitored sites and 83,645 background sites. Across diverse distribution shifts, including 150-day temporal drift, geographic shifts, browser changes, and proxy obfuscation, ResAware consistently improves popular WF backbones such as Var-CNN, AWF, and Random Forest. For example, under 150-day drift, ResAware raises Var-CNN’s F1-score from 72.77% to 81.49% and its open-world true positive rate at 1% false positive rate (TPR@1%FPR) from 22.40% to 27.20%. These results support the hypothesis that stable resource-side knowledge can be distilled into traffic-only models to reduce their sensitivity to environmental change without expanding the online attack surface. ResAware is thus a practical plug-and-play enhancement for cross-environment WF robustness that adds no online overhead.

Key findings

  • Resource-level features (categorical resource types and log-scaled resource sizes) exhibit 2.5× to 3.1× higher cross-environment stability margins (CESM) than traffic burst features under temporal and spatial drift (Fig 3a).
  • Resource-only classifiers lose 14.22 percentage points of F1 after 150 days of temporal drift, while traffic-only models lose 33.30 points under the same drift (Fig 3b), showing that resource features remain more robustly discriminative.
  • ResAware improves the zero-shot cross-environment F1-score of Var-CNN from 72.77% to 81.49% under 150-day temporal drift, an absolute gain of 8.72 percentage points.
  • Open-world TPR@1%FPR improves from 22.40% to 27.20% for Var-CNN with ResAware under the same temporal drift.
  • Under obfuscated proxy drift, ResAware delivers absolute F1-score gains of 8.96 percentage points for Var-CNN and 3.88 percentage points for RF, demonstrating resilience to transport-layer proxy obfuscation.
  • ResAware is effective across six distinct WF backbones and input encodings (packet direction, burst, traffic aggregation matrices), with distillation weights (α) ranging from 0.1 to 0.7, tuned per backbone on source-domain validation data.
  • All knowledge distillation overhead is confined to offline training; online inference latency, input features, and memory footprint remain identical to native traffic-only WF models.
  • Ablation studies confirm that the KL-divergence based distillation loss effectively transfers resource-side inter-class similarity relations to the traffic student, mitigating shortcut learning.

Threat model

The adversary is a standard passive WF attacker. Offline, the attacker can collect data with privileged resource access through attacker-controlled crawlers and TLS key logging. Online, inference is restricted to passive observation of encrypted traffic only (packet length, timing, direction). The attacker cannot decrypt, inject, modify, or drop packets, and has no endpoint control or access to side-channel metadata such as DNS queries, TLS SNI, or Host headers. The attacker targets isolated page-load events and faces environmental drift, but cannot widen the online observation capabilities during deployment.

Methodology — deep read

The core methodology rests on an asymmetric threat and data setting termed training-rich / inference-poor. Offline, the attacker is assumed to have both encrypted traffic captures and application-layer resource information extracted from controlled crawlers via TLS key logging (the privileged information). During online inference, the attacker can only passively observe encrypted traffic, with no access to resource-level data and no decryption capability.

The dataset contains over 160,000 paired samples (traffic trace, resource sequence, website label) collected Nov 2025 – Apr 2026 from six geographically distinct vantage points (US, JP, SG, ZA, AU, DE) on 100 monitored websites plus 83,645 background sites. Training uses 150 samples per monitored site from the source domain; target-domain test sets have 25–30 samples per site per snapshot. Unmonitored background sites are used to evaluate open-world detection at 1% FPR.

Resource sequences are ordered by browser-request initiation time and truncated/padded to fixed length 200. Each resource event encodes a categorical content type (9 categories, e.g. HTML, CSS, JS, Tiny/Regular images, JSON/API) plus a log-scaled byte size. Absolute timestamps are discarded to improve robustness.
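The encoding described above can be sketched as follows. This is a minimal illustration, not the authors' parser: the category vocabulary is hypothetical beyond the types named in the text (the summary lists only some of the 9 categories), and the use of `log1p` for the log-scaled size, the padding id `0`, and the event dictionary layout are all assumptions.

```python
import math

# Hypothetical vocabulary: only HTML, CSS, JS, tiny/regular images and
# JSON/API are named in the text; the remaining entries are placeholders.
CATEGORIES = ["html", "css", "js", "img_tiny", "img_regular",
              "json_api", "font", "media", "other"]
PAD_ID = 0  # assumed padding id; real categories start at 1
CAT_TO_ID = {name: i + 1 for i, name in enumerate(CATEGORIES)}
MAX_LEN = 200  # fixed sequence length stated in the text


def encode_resources(events, max_len=MAX_LEN):
    """Encode one page load as (type_ids, log_sizes), both of length max_len.

    events: list of dicts with 'start' (request initiation time),
    'category' (str) and 'size' (bytes). The timestamp is used only
    for ordering and is then discarded.
    """
    ordered = sorted(events, key=lambda e: e["start"])[:max_len]
    type_ids = [CAT_TO_ID.get(e["category"], CAT_TO_ID["other"]) for e in ordered]
    log_sizes = [math.log1p(max(e["size"], 0)) for e in ordered]
    pad = max_len - len(ordered)
    return type_ids + [PAD_ID] * pad, log_sizes + [0.0] * pad


events = [
    {"start": 0.31, "category": "css", "size": 2048},
    {"start": 0.00, "category": "html", "size": 15000},
    {"start": 0.45, "category": "js", "size": 0},
]
type_ids, log_sizes = encode_resources(events)
```

Because only the ordering survives, two loads of the same page with different absolute timing but the same request order produce the same encoded sequence.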

The resource teacher model is a Transformer encoder with positional embeddings trained on resource sequences alone using standard cross-entropy classification loss on the source domain data. After convergence, the teacher is frozen.

The traffic student model can be any standard WF backbone that takes encrypted traffic features such as packet direction sequences, burst sequences, or aggregated feature vectors. The student is trained jointly on two losses: cross-entropy classification on ground-truth labels, and KL divergence between its temperature-softened output distribution and that of the frozen resource teacher, weighted by a hyperparameter α. The softmax temperature is tuned once on source-domain validation data.

The joint objective guides the student to mimic the resource modality's soft inter-class similarity relationships, acting as semantic regularization and preventing overfitting to brittle traffic artifacts.
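The joint objective can be sketched for a single sample as below. The text states only that cross-entropy is combined with an α-weighted KL term over softened outputs; the convex (1 − α) weighting, the T² scaling of the KL term, and the KL direction (teacher relative to student) are assumptions borrowed from the common knowledge-distillation formulation, not details confirmed by the source. All function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy against a hard label, via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Assumed form: (1 - alpha) * CE + alpha * T^2 * KL(teacher || student)."""
    ce = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)  # frozen teacher
    p_student = softmax(student_logits, temperature)
    kd = kl_divergence(p_teacher, p_student) * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd


teacher = [2.0, 1.0, -1.0]   # resource teacher output for one page load
student = [0.0, 3.0, 0.0]    # traffic student output for the same load
loss = distillation_loss(student, teacher, label=0, alpha=0.5)
```

With α = 0 the objective reduces to plain cross-entropy, which corresponds to the native traffic-only backbone; larger α pulls the student's class-similarity structure toward the teacher's.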

After training, all resource-side inputs, parsers, and the teacher model are removed for deployment. The student operates directly on encrypted traffic with no change to input format or inference cost.

Evaluation considers five deployment shifts: temporal (roughly 30-day intervals over 5 months), spatial (testing on geographically distant vantage points), obfuscated proxy (six transport-layer proxy protocols), browser engine changes (Chrome for training, Edge and Firefox for testing), and open-world temporal drift with many unmonitored sites.

Metrics include closed-world F1-score and open-world true positive rate at 1% false positive rate (TPR@1%FPR). The evaluated WF backbones are AWF, DF, Random Forest, Var-CNN, Tik-Tok, and CountMamba, with the distillation weight α optimized per model. Experiments are run five times with distinct random seeds.

No packet-to-object reconstruction is needed; pairing traffic and resource data at the page-load level suffices. Distillation hyperparameters (α and temperature) are tuned only on the source domain and then fixed.

The release status of the code and dataset is not explicitly stated. Reported reproducibility details include random seeds, hardware (RTX 4090 GPU), and baseline hyperparameters matching the original papers.
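For readers unfamiliar with the open-world metric, the sketch below shows one way TPR at a fixed FPR can be computed. It assumes a binary monitored-versus-unmonitored decision based on a per-sample confidence score; the paper's exact protocol (for instance, whether a true positive also requires the correct site label) is not given here, so treat the function and its name as hypothetical.

```python
import math


def tpr_at_fpr(monitored_scores, unmonitored_scores, target_fpr=0.01):
    """Return (tpr, threshold) at the given false positive rate.

    Scores are confidences that a sample belongs to a monitored site.
    A sample is flagged when its score is strictly above the threshold,
    and the threshold is chosen so that the fraction of flagged
    unmonitored samples never exceeds target_fpr.
    """
    if not monitored_scores or not unmonitored_scores:
        raise ValueError("both score lists must be non-empty")
    n_unmon = len(unmonitored_scores)
    # Number of false positives we may tolerate (small epsilon guards
    # against floating-point error in the product).
    allowed_fp = math.floor(target_fpr * n_unmon + 1e-9)
    ranked = sorted(unmonitored_scores, reverse=True)
    if allowed_fp < n_unmon:
        threshold = ranked[allowed_fp]
    else:
        threshold = float("-inf")
    true_positives = sum(1 for s in monitored_scores if s > threshold)
    return true_positives / len(monitored_scores), threshold


unmonitored = [i / 100 for i in range(100)]   # 0.00 .. 0.99
monitored = [0.995, 0.99, 0.5, 0.999]
tpr, threshold = tpr_at_fpr(monitored, unmonitored, target_fpr=0.01)
```

In this toy input one of the 100 unmonitored samples may be flagged, so the threshold lands on the second-highest unmonitored score and three of the four monitored samples clear it.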

A concrete example: for a given page load, the resource-side teacher sees the resource category and size sequence (e.g. HTML, CSS, JS), which is embedded and processed by Transformer layers to produce logits. The frozen teacher’s softmax output over classes provides rich multi-class similarity information. The traffic student receives the encrypted packet direction sequence for the same load, processes it with its own architecture (e.g. Var-CNN's convolutional layers), and is trained to minimize classification loss plus KL divergence against the teacher's soft targets. This encourages the student’s traffic embeddings to align with stable resource-based class distinctions, improving robustness to environment shifts at test time, when only encrypted traffic is available.

The evaluation shows improved cross-environment stability and classification metrics with no added inference overhead.

Technical innovations

  • Formalization of a training-rich / inference-poor asymmetric threat model for WF that leverages privileged resource information available only offline.
  • Design of a cross-modal heterogeneous knowledge distillation framework (ResAware) that transfers stable resource-level semantic supervision into traffic-only student models.
  • Use of browser-request initiation ordering for resource sequence construction to decouple resource features from volatile transport-layer timing variations.
  • Plug-and-play integration approach allowing ResAware to augment any existing WF backbone without architectural modification or added online inference cost.

Datasets

  • Paired Traffic-Resource WF Dataset — >160,000 samples — collected Nov 2025 to Apr 2026 from six global vantage points on 100 monitored sites plus 83,645 unmonitored sites — proprietary / not publicly released

Baselines vs proposed

  • Var-CNN native: F1-score under 150-day temporal drift = 72.77% vs ResAware-distilled Var-CNN: 81.49%
  • Var-CNN native: open-world TPR@1%FPR under 150-day temporal drift = 22.40% vs ResAware: 27.20%
  • Var-CNN under obfuscated proxy drift: ResAware-distilled model gains +8.96 percentage points of F1 over the native model
  • Random Forest under obfuscated proxy drift: ResAware-distilled model gains +3.88 percentage points of F1 over the native model
  • Resource-only model after 150-day drift: F1 = 83.50% vs traffic-only model: 64.85%, supporting the robustness of the resource modality.
  • Across six WF backbones, ResAware improves zero-shot cross-environment F1 by 3-9 percentage points depending on model and setting.
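The gains above are absolute differences in percentage points, which are easy to confuse with relative improvements. The short helper below (a hypothetical name, added only for illustration) derives both quantities from the Var-CNN figures quoted in this section.

```python
def gains(baseline_pct, improved_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_pp = improved_pct - baseline_pct
    relative_pct = 100.0 * absolute_pp / baseline_pct
    return round(absolute_pp, 2), round(relative_pct, 2)


# Var-CNN under 150-day temporal drift, native vs ResAware-distilled
f1_gain = gains(72.77, 81.49)    # closed-world F1-score
tpr_gain = gains(22.40, 27.20)   # open-world TPR@1%FPR
print(f1_gain)   # (8.72, 11.98)
print(tpr_gain)  # (4.8, 21.43)
```

The open-world gain is smaller in percentage points but larger in relative terms, because its baseline is much lower.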

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.17462.

Fig 1: A website’s identity is reflected in its architecture

Fig 2: The training-rich / inference-poor asymmetric

Fig 3: (a) CESM comparison between resource features

Fig 4: Overview of the ResAware framework. Offline training first trains a resource-only teacher, then distills its knowledge

Fig 5 (page 3).

Fig 6 (page 3).

Fig 7 (page 3).

Fig 8 (page 3).

Limitations

  • Resource-level privileged information requires offline collection with controlled crawlers and TLS key logging, limiting applicability to attackers with such capabilities.
  • The large-scale paired dataset is proprietary and not publicly released, affecting reproducibility and external validation.
  • No direct evaluation under fully adversarial settings such as active manipulation or adaptive countermeasures.
  • The method assumes static website resource structures and may degrade against sites with high dynamic content or frequent A/B testing.
  • The threat model excludes side-channel metadata and endpoint compromise; stronger adversaries could obtain additional signals that are not modeled.
  • Evaluation focuses on page-load level isolated events, not continuous browsing sessions or multi-tab interference effects.

Open questions / follow-ons

  • Can resource-privileged distillation techniques be extended to support multi-tab or multi-session WF scenarios with temporal correlations?
  • How resilient is ResAware to active adversarial website countermeasures that dynamically alter resource loading sequences?
  • Can lightweight approximations of resource-level features be inferred solely from encrypted traffic to reduce reliance on privileged training data?
  • Would additional modalities (e.g. timing side channels, DNS features) combined with privileged distillation further improve cross-environment robustness?

Why it matters for bot defense

For bot-defense engineers working on CAPTCHA and web privacy, the main takeaway from ResAware is that privileged auxiliary information used only during offline training, such as stable application-layer resource patterns, can make classifiers more robust to environmental shifts. Although resource data is unavailable at online inference time, distillation from a resource-aware teacher lets traffic-only detectors generalize better across browsers, proxies, and geographies without expanding the online attack surface. Richer offline supervision can therefore guide more robust encrypted traffic classifiers, which may inform designs for bot detection or CAPTCHA bypass resilience. Collecting privileged training data requires substantial infrastructure and may not be feasible for all adversaries, but the framework encourages exploring heterogeneous supervision to stabilize learning under real-world network variability.

Cite

bibtex
@article{arxiv2606_17462,
  title={ResAware: Cross-Environment Website Fingerprinting via Resource-Privileged Distillation},
  author={Chongru Fan and Wei Wang and Wentao Huang and Zhenquan Ding and Jinqiao Shi and Lei Cui and Zhiyu Hao and Xiaochun Yun},
  journal={arXiv preprint arXiv:2606.17462},
  year={2026},
  url={https://arxiv.org/abs/2606.17462}
}

Articles are CC BY 4.0 — feel free to quote with attribution