ActiveFlowMark: Assessing Tor Anonymity under Active Bandwidth Watermarking

Source: arXiv:2605.05887 · Published 2026-05-07 · By Zilve Fan, Zijian Zhang, Yangnan Guo, Jiaqi Gao, Zhen Li, Mengyu Wang et al.

TL;DR

This paper presents ActiveFlowMark (NATA — Non-invasive Active Traffic-correlation Analysis), an infrastructure-level active watermarking attack against Tor that does not require endpoint compromise, payload decryption, or Tor browser modification. The adversary operates from two network vantage points: a 'Shaper' that imposes controlled low-frequency bandwidth modulation patterns (sinusoidal, square-wave, triangular) on Tor TCP connections at an upstream gateway or ISP link, and a 'Sniffer' that passively records packet-level observations at adversary-controlled exit relays. The core insight is that these macroscopic throughput perturbations, injected via a token-bucket rate limiter, survive Tor's onion routing, multiplexing, and TCP dynamics well enough to be detected at the exit side — unlike fine-grained inter-packet timing features which are more easily disrupted by network jitter and padding defenses.

The detection and classification problem is handled by BM-Net (Bandwidth Modulation Network), a selective state-space model (SSM) built around Mamba-style input-dependent recurrence. Because collecting large volumes of labeled, multi-class, cross-continental Tor traces is expensive, BM-Net uses a two-stage training pipeline: self-supervised masked pre-training on unlabeled serialized traffic traces to learn structural flow representations, followed by supervised fine-tuning on a smaller labeled dataset for binary perturbation detection and four-class modulation classification. Flows are represented as fixed-length bit-level serializations of raw packets (headers plus truncated encrypted payload bytes), divided into stride tokens and projected into a latent space with learnable positional embeddings.

The paper also contributes a probabilistic model decomposing end-to-end correlation success probability into three factors: exit-relay observation probability (estimated via 1%-scaled tornettools simulations with historical Tor consensus data), binary detection probability (p1), and per-class modulation classification probability (p2,i). Real-world measurements show BM-Net achieves 99.65% F1 on binary detection and 97.5% macro-F1 on four-class modulation classification. The tornettools simulations quantify how adversarial exit bandwidth share and number of monitored flows affect overall correlation risk, providing a network-scale risk estimate rather than just a laboratory benchmark.

Key findings

BM-Net achieves a 99.65% F1 score on binary perturbation detection (perturbed vs. natural traffic) on real-world cross-continental Tor measurements.
BM-Net achieves a 97.5% macro-F1 score on fine-grained four-class modulation classification (natural, sinusoidal, square-wave, triangular) on a 'curated small-sample multi-class dataset'. The exact sample count is not specified in the truncated text; the dataset is described as limited because collection is difficult.
The modulation dictionary uses three active waveform types (sinusoidal: rbase + A·sin(2πfmod·t + φ); square-wave: alternating rhigh/rlow; triangular: rbase + 2A/π·arcsin(sin(2πfmod·t + φ))) applied via a programmable token-bucket shaper with a minimum rate floor rmin to avoid Tor SENDME-induced circuit stalls.
Exit-relay observation probability pexit(n) is estimated via a 1%-scaled tornettools simulation using historical Tor consensus bandwidth data; the model shows pexit scales with adversarial exit bandwidth share under Tor's bandwidth-weighted path selection, and that repeated observation over T windows compounds success probability as 1−(1−Pcorr)^T.
The attack requires no endpoint compromise, no Tor browser modification, no script injection, and no payload decryption — operating purely at the TCP connection level using flow-level metadata (client IP, relay IP, port, protocol) to identify Tor connections for shaping.
The two-stage BM-Net training (masked pre-training then supervised fine-tuning) is motivated by the scarcity of high-fidelity labeled multi-class Tor traces; the design explicitly decouples representation learning from task-specific classification to reduce labeled data requirements.
Logical-layer padding defenses (e.g., Obfs4) are argued to be insufficient against active bandwidth-constraint watermarking because the perturbation is imposed as a rate limit on throughput, not as a packet-timing or payload modification. Section VI-E (robustness under client-side defenses) tests this claim empirically, but its details are truncated.
The confusion matrix for modulation classification is presented in Fig 6, whose caption is cut off at '201-' (likely a 201-sample test set, though the exact split is unclear from the truncated text). Per-class precision and recall are not fully recoverable from the truncated source.
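
The three waveform formulas in the modulation dictionary above can be sketched as a single rate function. This is a minimal illustration, not the authors' shaper: the function name, the argument names, and the mapping of the square wave to rhigh = rbase + A and rlow = rbase − A are assumptions made here for symmetry with the other two waveforms.

```python
import math

def target_rate(kind, t, r_base, amp, f_mod, phase=0.0, r_min=0.0):
    """Target shaping rate at time t (seconds) for one waveform type."""
    theta = 2.0 * math.pi * f_mod * t + phase
    if kind == "sinusoidal":
        rate = r_base + amp * math.sin(theta)
    elif kind == "square":
        # Assumed mapping: r_high = r_base + amp, r_low = r_base - amp.
        rate = r_base + amp if math.sin(theta) >= 0.0 else r_base - amp
    elif kind == "triangular":
        # arcsin(sin(.)) is a triangle wave with range [-pi/2, pi/2].
        rate = r_base + (2.0 * amp / math.pi) * math.asin(math.sin(theta))
    else:
        raise ValueError(f"unknown waveform: {kind}")
    # Clamp to the minimum rate floor so the connection never stalls.
    return max(rate, r_min)
```

All three waveforms share the same period 1/fmod and the same peak excursion A around rbase, so they differ only in shape; the rmin clamp distorts the waveform whenever rbase − A falls below the floor.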

Threat model

The adversary is an infrastructure-level actor (ISP, upstream AS, or nation-state gateway operator) controlling two network vantage points simultaneously. The Shaper controls a network gateway near the target Tor client and can apply rate-limiting bandwidth modulation to Tor TCP connections identified via flow-level metadata (IP addresses, ports, transport protocol) — without accessing the client host, Tor process, browser state, or packet payloads. The Sniffer operates one or more Tor exit relays and passively records packet-level observations (timestamps, sizes, directions, raw encrypted bytes) from traversing flows without decrypting payloads. The adversary cannot: compromise the client endpoint, modify the Tor browser or client software, inject application-layer scripts, decrypt onion-encrypted payloads, or directly observe individual Tor circuits (only the TCP connection level). For bridge/pluggable-transport users, the adversary would require additional assumptions to identify Tor traffic for shaping. The attack operates at TCP connection granularity and requires the same adversary to control both vantage points simultaneously, which is a significant but realistic capability for nation-state ISP-level actors. The adversary's goal is traffic correlation: linking a specific client-side Tor connection to its corresponding exit-side flow.

Methodology — deep read

Threat Model and Assumptions. The adversary is modeled as an infrastructure-level actor controlling two vantage points: (1) a Shaper, which is an upstream network gateway, AS, or ISP link near the Tor client, capable of applying rate limits to Tor TCP connections identified via flow-level metadata (IP addresses, ports, protocol); and (2) a Sniffer, which is one or more adversary-operated Tor exit relays passively recording packet-level observations (timestamps, sizes, directions, raw encrypted bytes). The adversary does not compromise the client host, modify the Tor browser, inject scripts, or decrypt payloads. The attack operates at TCP connection granularity, not circuit granularity, since Tor multiplexes circuits over a single TCP connection. For bridge users, pluggable transport users, or VPN users, additional identification assumptions are required and are acknowledged as limitations. This is a two-vantage-point adversary with active shaping capability rather than a global passive adversary: a stronger-than-typical but realistic ISP/nation-state adversary model.

Data Collection. Traffic data is collected from real-world Tor measurements described as 'cross-continental' paths, reflecting genuine network variability (relay congestion, jitter, TCP dynamics, SENDME flow control). The labeled dataset for fine-grained classification is described as a 'curated small-sample multi-class dataset' — the exact size is not fully specified in the available truncated text, though the confusion matrix caption references '201-' which likely indicates approximately 201 test samples. The unlabeled pre-training corpus is larger (serialized traffic traces without modulation labels). Data provenance is not fully described: it appears to be collected by the authors by running Tor clients through their own shaping infrastructure and observing at self-operated exit relays. Label types: binary (perturbed vs. natural) and four-class (natural, sinusoidal, square-wave, triangular). Train/validation/test splits are not explicitly stated in the available text. No public dataset release is confirmed.

Flow Representation (Phase II). Each captured bidirectional flow is serialized as a fixed-length sequence of the first M packets, with data-link headers stripped and network/transport headers retained alongside a truncated segment of encrypted payload bytes (used as raw byte patterns, not decrypted). Each packet's bytes are expanded to a bit-level binary sequence, then the full flow is arranged into a 2D binary representation. This is divided into non-overlapping strides of length Ls, forming a token sequence S = {s1,...,sN} where si ∈ {0,1}^Ls. Each stride is linearly projected to Dmodel dimensions with learnable positional embeddings added. Specific values of M, Ls, and Dmodel are not recoverable from the truncated text.
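
The serialization step can be sketched as follows. This is a hedged illustration only: the function name, the zero-padding of short packets and short flows, the most-significant-bit-first ordering, and the dropping of trailing bits that do not fill a stride are all assumptions, and the learned linear projection and positional embeddings are omitted.

```python
def serialize_flow(packets, m_packets, bytes_per_packet, stride_len):
    """Turn raw packet bytes into a list of fixed-length bit strides."""
    rows = []
    for i in range(m_packets):
        raw = packets[i] if i < len(packets) else b""
        # Truncate or zero-pad each packet to a fixed byte budget (assumed).
        fixed = raw[:bytes_per_packet].ljust(bytes_per_packet, b"\x00")
        # Expand each byte to 8 bits, most significant bit first (assumed).
        rows.append([(byte >> k) & 1 for byte in fixed for k in range(7, -1, -1)])
    # Flatten the 2D (packet x bit) layout in packet order.
    flat = [bit for row in rows for bit in row]
    # Non-overlapping strides; trailing bits that do not fill a stride are dropped.
    n_tokens = len(flat) // stride_len
    return [flat[j * stride_len:(j + 1) * stride_len] for j in range(n_tokens)]
```

With M packets of P bytes each, the token count is N = floor(8·M·P / Ls), so the sequence length seen by the encoder is fixed regardless of the flow's actual duration.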

BM-Net Architecture (Phase III). The core encoder is a selective state-space model (SSM) inspired by Mamba. The latent state dynamics follow a continuous-time linear system h'(t) = Ah(t) + Bx(t), y(t) = Ch(t), discretized per-token via zero-order hold with token-dependent timescale ∆t. The 'selective' mechanism makes B, C, and ∆t input-dependent: Bt = LinearB(xt), Ct = LinearC(xt), ∆t = softplus(Linear∆(xt)). This allows the model to adaptively weight tokens informative for macroscopic modulation patterns versus short-term noise. The motivation for SSMs over Transformers is linear-time recurrence for long sequences and explicit modeling of long-range dependencies — appropriate for low-frequency throughput patterns embedded in packet-level time series. Specific layer counts, hidden dimensions, and parameter totals are not specified in the available text.
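
A toy version of the selective recurrence helps make the discretization concrete. The sketch below assumes a single scalar input channel, a diagonal A with strictly negative entries, and scalar-weight stand-ins for LinearB, LinearC, and Linear∆; none of these simplifications or names come from the paper, and real Mamba-style layers operate on vectors with learned projections.

```python
import math

def softplus(v):
    # Numerically stable log(1 + e^v).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def selective_scan(xs, a_diag, w_b, w_c, w_dt, b_dt=0.0):
    """Run a diagonal selective SSM over a scalar sequence xs.

    a_diag: nonzero (negative) diagonal entries of A.
    w_b, w_c: per-state weights standing in for Linear_B and Linear_C.
    w_dt, b_dt: weight and bias standing in for Linear_Delta.
    """
    h = [0.0] * len(a_diag)
    ys = []
    for x in xs:
        dt = softplus(w_dt * x + b_dt)      # input-dependent step size
        b_t = [w * x for w in w_b]          # input-dependent B_t
        c_t = [w * x for w in w_c]          # input-dependent C_t
        for i, a in enumerate(a_diag):
            a_bar = math.exp(dt * a)               # zero-order hold: exp(dt * A)
            b_bar = (a_bar - 1.0) / a * b_t[i]     # zero-order hold: A^-1 (exp(dt*A) - I) B
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(c * s for c, s in zip(c_t, h)))
    return ys
```

A large ∆t makes exp(∆t·A) small, so the state forgets its history and follows the current token; a small ∆t preserves the state, which is the mechanism by which the model can skip noisy tokens while tracking a slow throughput trend.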

Two-Stage Training (Phase IV). Stage I: Self-supervised masked pre-training on unlabeled serialized traffic traces. A large fraction of input tokens is masked and the model learns to reconstruct or predict structural information from context — analogous to BERT-style masked language modeling applied to packet byte sequences. No modulation labels are needed. Stage II: Supervised fine-tuning attaches a classification head outputting logits z ∈ R^(K+1) (K=3 active modulation classes + natural class), optimized with categorical cross-entropy. For binary detection, the same framework is used with a two-class output. Specific masking ratios, learning rates, batch sizes, epoch counts, hardware, and random seed strategies are not reported in the available truncated text — a reproducibility concern.
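
The Stage I masking step might look like the sketch below. Since the masking ratio and the mask representation are not reported, the ratio is left as a parameter and masked strides are simply zeroed, which is an assumption; a learnable mask embedding would be the more common choice, because an all-zero stride is indistinguishable from genuine zero bits.

```python
import random

def mask_tokens(tokens, mask_ratio, seed=None):
    """Return (masked_tokens, masked_positions); masked strides are zeroed out."""
    rng = random.Random(seed)
    n_mask = round(len(tokens) * mask_ratio)
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    chosen = set(positions)
    masked = [[0] * len(tok) if i in chosen else list(tok)
              for i, tok in enumerate(tokens)]
    return masked, positions

def reconstruction_targets(tokens, positions):
    """Original strides at the masked positions, used as the prediction targets."""
    return [list(tokens[i]) for i in positions]
```

The encoder would then be trained to predict `reconstruction_targets(...)` from `masked`, so the loss is computed only at masked positions and no modulation label is involved.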

Probabilistic Network-Scale Model (Section V). The paper constructs a first-order probability model: per-flow correlation success qi(n) = pexit(n) · p1 · p2,i, where pexit is estimated via tornettools simulation (1%-scaled Tor network from historical consensus data), p1 is the binary detection probability, and p2,i is the per-class classification probability. Over r monitored flows: P_corr = 1−(1−qi)^r. Mixed perturbation strategies and temporal aggregation over T windows are also modeled. The independence assumptions across windows and between the three probability factors are acknowledged as idealizations.
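
The first-order model reduces to two small formulas, sketched here under the same independence assumptions the paper acknowledges as idealizations. The function names and the numeric inputs in the usage lines are hypothetical and chosen only for illustration; they are not values reported by the authors.

```python
def per_flow_success(p_exit, p1, p2_i):
    """q_i = p_exit * p1 * p2_i: observe the flow, detect it, classify it correctly."""
    return p_exit * p1 * p2_i

def at_least_one(q, trials):
    """Probability of at least one success in `trials` independent attempts."""
    return 1.0 - (1.0 - q) ** trials

# Hypothetical inputs, for illustration only.
q = per_flow_success(p_exit=0.05, p1=0.99, p2_i=0.97)
p_corr = at_least_one(q, trials=20)          # over r = 20 monitored flows
p_repeat = at_least_one(p_corr, trials=5)    # over T = 5 observation windows
```

Because the exit observation probability multiplies the other two factors, it dominates the product whenever the classifier is accurate: near-perfect detection cannot raise the per-flow success above pexit itself.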

Evaluation Protocol. Primary metrics are F1 (binary) and macro-F1 (four-class). Baselines are compared in Section VI-D (specific baselines not fully recoverable from truncated text but appear to include passive correlation methods and possibly CNN/RNN classifiers). Robustness is evaluated in Section VI-E under 'challenging network conditions and client-side defenses,' including apparent testing against Obfs4 or padding-based defenses, though details are truncated. A confusion matrix is presented for modulation classification. The tornettools simulation provides network-scale exit observation probability estimates. Statistical significance tests are not mentioned. No held-out geographic distribution shift test is described beyond the real-world cross-continental collection setup.
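
For reference, macro-F1 can be computed from a confusion matrix as sketched below. This is the standard definition (unweighted mean of per-class F1), not code from the paper; the convention that rows are true classes and columns are predictions is an assumption about layout.

```python
def per_class_f1(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    k = len(confusion)
    scores = []
    for c in range(k):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp   # predicted c, true class differs
        fn = sum(confusion[c]) - tp                        # true c, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(confusion):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)
```

Macro averaging weights each class equally regardless of support, which matters on a small test set: a handful of errors in one waveform class moves the score as much as the same number of errors in the natural class.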

Technical innovations

NATA introduces active bandwidth watermarking via a token-bucket shaper applying geometric waveforms (sinusoidal, square-wave, triangular) to Tor TCP connections at the gateway level — distinct from prior active watermarking work (e.g., [5]–[8]) which modifies inter-packet timing rather than imposing macroscopic throughput rate constraints.
BM-Net applies a selective state-space model (Mamba-style input-dependent SSM with per-token B, C, ∆t parameters) to packet-level traffic traces for bandwidth-modulation detection, enabling linear-time modeling of long-range throughput dependencies rather than quadratic-attention Transformers.
The two-stage training pipeline (self-supervised masked pre-training on unlabeled serialized traffic traces, then supervised fine-tuning with small labeled modulation datasets) adapts BERT-style representation learning to the traffic-analysis domain specifically to address the scarcity of labeled cross-continental Tor traces.
A closed-form probabilistic model decomposes end-to-end Tor traffic-correlation risk into three measurable factors (exit observation probability, binary detection probability, classification probability) and integrates them with tornettools-based network simulation to produce network-scale risk estimates.
Flow representation as stride-tokenized bit-level serializations of raw encrypted packet bytes (headers + truncated payload) preserves packet order and byte structure without requiring payload decryption, enabling the SSM encoder to learn structural patterns in encrypted Tor traffic.
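
The token-bucket shaper named in the first item can be illustrated with a small discrete-time simulation. This is a sketch under stated assumptions rather than the authors' implementation: the function name, the tick granularity, FIFO queuing, and sampling the rate once per tick are all choices made here.

```python
def token_bucket_shape(arrivals, rate_fn, burst_bytes, tick=0.01):
    """Return departure times for (arrival_time_s, size_bytes) packets in FIFO order.

    rate_fn(t) gives the token refill rate in bytes/second, sampled once per tick.
    It must stay positive (for example via a minimum-rate floor), otherwise a
    queued packet would wait forever.
    """
    tokens = float(burst_bytes)
    step = 0
    departures = []
    for arrival, size in arrivals:
        if size > burst_bytes:
            raise ValueError("packet larger than bucket depth can never be sent")
        # Advance tick by tick until the packet has arrived and enough tokens exist.
        while step * tick < arrival or tokens < size:
            tokens = min(float(burst_bytes), tokens + rate_fn(step * tick) * tick)
            step += 1
        tokens -= size
        departures.append(step * tick)
    return departures
```

Passing a periodic `rate_fn` (sinusoidal, square, or triangular in time) makes the spacing of departures follow the waveform whenever the sender is backlogged, which is the sense in which the watermark lives in macroscopic throughput rather than in individual inter-packet gaps.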

Datasets

Real-world Tor traffic measurements (authors' own collection) — size unclear from truncated text, described as 'small-sample' for multi-class; ~201 test samples inferred from confusion matrix caption — non-public, self-collected cross-continental Tor traces
Unlabeled Tor traffic traces for self-supervised pre-training — size unspecified — non-public, self-collected
tornettools 1%-scaled Tor network simulation — synthetic, derived from historical Tor consensus bandwidth data — non-public simulation output

Baselines vs proposed

BM-Net binary detection: F1 = 99.65% (specific baselines and their F1 scores not recoverable from truncated text)
BM-Net four-class modulation classification: macro-F1 = 97.5% (specific baselines and their macro-F1 scores not recoverable from truncated text)
Passive correlation methods (DeepCorr, DeepCoFFEA, FlowTracker, AttCorr referenced in related work): not directly benchmarked against NATA in the available text — these are positioned as the prior paradigm, not ablation baselines

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.05887.

Fig 1: Operational threat model. The adversary shapes Tor-related traffic near the client side and passively observes traffic at adversary-controlled exit relays.
Fig 2: Overview of the NATA pipeline. The system operates in four phases: (I) active bandwidth shaping near the client-side gateway, (II) fixed-length …
Fig 3: … visualizes the throughput signatures of these mod…
Fig 4: Feature analysis of the rolling average of inter-arrival times. The plot …
Fig 5: … provides additional intuition. The watermarked …
Fig 6: Confusion matrix counts for modulation classification on the 201-…

Limitations

The labeled dataset for multi-class evaluation is explicitly described as 'small-sample' and the exact size is unclear — a confusion matrix caption suggests approximately 201 test samples, which is insufficient to establish statistical reliability across diverse network conditions and path configurations.
The independence assumptions in the probabilistic model (between pexit, p1, and p2,i, and across temporal observation windows) are acknowledged as idealizations — in practice, successive windows share circuit state, relay congestion, and classification errors, meaning the model likely overestimates cumulative correlation probability.
The attack is evaluated against standard Tor connections; for bridge users, pluggable transports (Obfs4, meek), VPNs, or other tunneling configurations, the adversary requires additional traffic identification assumptions that are noted as limitations but not evaluated.
The rmin (minimum shaping rate) parameter is determined empirically rather than analytically — the authors acknowledge it varies with relay load, path length, TCP congestion control, and background conditions, raising questions about generalizability across different network environments and Tor path configurations.
Key hyperparameters (M packets per flow, stride length Ls, model dimensions, masking ratio, learning rate, batch size, hardware, training epochs, random seeds) are not reported in the available text, making independent reproducibility difficult; no code or frozen weights release is confirmed.
The tornettools simulation uses a 1%-scaled network, which may not faithfully capture emergent bandwidth-weighted path selection properties at full Tor scale, particularly for tail-distribution relay bandwidth configurations.
No evaluation of countermeasures specifically designed against active bandwidth watermarking (e.g., client-side rate smoothing, traffic reshaping at guard relay) is presented — Section VI-E evaluates robustness against existing defenses but the specific defenses tested and results are truncated.

Open questions / follow-ons

Can client-side or guard-relay-side rate smoothing (e.g., a Tor modification that normalizes outbound throughput to a constant rate) defeat the watermark while remaining within acceptable latency and bandwidth overhead bounds for low-latency Tor use cases?
How does BM-Net performance degrade under distribution shift — specifically when the modulation parameters (fmod, A, rbase) used during deployment differ from those seen during training, or when the adversary uses novel waveform geometries not in the training dictionary?
What is the minimum adversarial exit bandwidth share, and hence the exit observation probability pexit, required to make the overall correlation probability practically significant for a realistic Tor user population, and how does this threshold change under Tor's path selection defenses like Counter-RAPTOR or AS-aware relay selection?
Does the attack remain feasible when Tor traffic is tunneled through a VPN or uses meek/domain-fronted pluggable transports that obscure the TCP connection's association with Tor, and can traffic fingerprinting be used to recover the Tor identification step without additional adversary assumptions?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this paper is primarily relevant as a threat-intelligence data point about the limitations of Tor-based anonymity for adversarial bot operators. Sophisticated bots that route traffic through Tor to evade IP-reputation and geolocation-based detection could, in principle, be subject to this kind of infrastructure-level deanonymization by a sufficiently capable network adversary — not by the CAPTCHA operator directly, but by the network infrastructure the bot operator depends on. The key takeaway for defenders is that Tor anonymity is not absolute even against non-endpoint adversaries, which has implications for how much weight to place on 'this IP is a Tor exit' as a definitive signal versus one input in a probabilistic risk model.

More directly, the BM-Net architecture and two-stage training approach (masked self-supervised pre-training on serialized packet traces, followed by task-specific fine-tuning with small labeled datasets) are methodologically interesting for any traffic classification problem where labeled data is scarce, including bot traffic detection from network-level features. The stride-tokenized bit-level flow serialization and selective SSM encoder offer a template for building data-efficient classifiers on encrypted traffic metadata without payload access. However, practitioners should note the small evaluation dataset, the unclear reproducibility details, and the fact that the attack requires simultaneous control of both a client-side gateway and exit relays. That capability is far outside typical bot-defense operator scope, so direct operational use of NATA itself is unlikely in a commercial CAPTCHA/bot-defense context.

Cite

bibtex

@article{arxiv2605_05887,
  title={ActiveFlowMark: Assessing Tor Anonymity under Active Bandwidth Watermarking},
  author={Zilve Fan and Zijian Zhang and Yangnan Guo and Jiaqi Gao and Zhen Li and Mengyu Wang and Chengxiang Si and Liehuang Zhu},
  journal={arXiv preprint arXiv:2605.05887},
  year={2026},
  url={https://arxiv.org/abs/2605.05887}
}
