Adversarial Robustness on Insertion-Deletion Streams
Source: arXiv:2602.20854 · Published 2026-02-24 · By Elena Gribelyuk, Honghao Lin, David P. Woodruff, Huacheng Yu, Samson Zhou
TL;DR
This paper attacks a long-standing conjecture about adversarial robustness in turnstile (insertion-deletion) streams: that once the stream is adaptive and long enough, any robust algorithm must use linear-in-n space. The authors refute that conjecture by building non-linear robust algorithms that still use sublinear space, even for polynomial-length streams. The core idea is to stop relying on a single fixed linear sketch from the start of the stream, and instead interleave sketches with an estimator-corrector-learner loop that can “learn” the hard part of the prefix when an estimate is wrong.
The main concrete wins are a fully robust (1+ε)-approximation for F2 in poly(log n, 1/ε) space, a robust heavy-hitters algorithm for L2, and a general reduction for symmetric functions with an approximate triangle inequality. The framework is recursive over a B-ary stream decomposition, so each sketch only needs to survive a bounded number of adaptive interactions. For F2, they claim optimality up to poly((log n)/ε) factors, and they emphasize that this creates an exponential separation between linear sketches and non-linear methods for adversarial robustness in turnstile streams.
Key findings
- Theorem 1.3: adversarially robust insertion-deletion F2 estimation achieves a (1+ε)-approximation at all times using poly(1/ε, log n) bits for streams of length m = poly(n); a compact restatement follows after this list.
- Theorem 1.4: adversarially robust L2 heavy hitters are recoverable at all times in poly(1/ε, log n) bits on length-m = poly(n) turnstile streams.
- Theorem 1.5: any symmetric function with a β-approximate triangle inequality and a non-adaptive κ-approximate turnstile sketch of size S(n) can be made adversarially robust with Õ(n^{1/C})·S(n) bits for any constant C > 1, at the cost of an approximation blowup exponential in C (the excerpt's notation varies between κ^{O(C)} and κ^{O(c)}).
- The paper states that for F2 the space bound is optimal up to poly((log n)/ε) factors, citing the non-adaptive lower bound of BZ25.
- The authors claim their result overturns the conjecture that robust turnstile streaming with m = poly(n) needs Ω(n) space.
- They argue that their algorithms are non-linear overall, even though they repeatedly use linear sketches internally; this is necessary to bypass the Ω(n) lower bounds for robust linear sketches from GLW+24/GLW+25.
- According to the theorem statement and introduction, the framework also applies to L1, F0, p-norms for p < 1, and losses such as the Huber loss and other M-estimators.
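For concreteness, the Theorem 1.3 guarantee can be written compactly as below. This is our paraphrase; the excerpt does not state the exact success probability, so the 1 − 1/poly(n) reading is an assumption:

```latex
% Paraphrase of Theorem 1.3; the 1 - 1/poly(n) success probability is our
% assumed "with high probability" reading, not stated in the excerpt.
\Pr\Big[\,\forall\, t \le m:\ (1-\varepsilon)\,F_2\big(x^{(t)}\big)
    \le \widehat{F}^{(t)}
    \le (1+\varepsilon)\,F_2\big(x^{(t)}\big)\Big]
\;\ge\; 1 - \frac{1}{\mathrm{poly}(n)},
\qquad \text{space} = \mathrm{poly}\big(\varepsilon^{-1}, \log n\big)\ \text{bits.}
```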
Threat model
The adversary is adaptive and potentially fully malicious: after each update, it observes the algorithm’s output and may choose the next update based on the whole transcript so far. The algorithm is randomized, single-pass, and must remain accurate on every prefix of a turnstile stream of length m = poly(n). The adversary cannot see the internal random seed directly, but may infer it indirectly through repeated outputs; the algorithm’s design explicitly limits how many informative interactions any one sketch can endure.
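To make the interaction model concrete, here is a minimal sketch of the adaptive game, assuming a generic estimator interface. The class names and the placeholder attack logic are illustrative stand-ins we made up, not the paper's constructions:

```python
# Minimal sketch of the adaptive turnstile game; all names are illustrative
# stand-ins, not the paper's constructions.
import random

class Adversary:
    """Picks each signed update after seeing the full transcript of outputs."""
    def next_update(self, transcript, n):
        # A real attack would exploit structure in `transcript`; this
        # placeholder just emits an arbitrary signed coordinate update.
        return random.randrange(n), random.choice([-1, 1])

class ExactF2:
    """Toy 'estimator' that stores x exactly; a real robust algorithm must
    answer in sublinear space while staying accurate on every prefix."""
    def __init__(self, n):
        self.x = [0] * n
    def update(self, coord, delta):
        self.x[coord] += delta
        return sum(v * v for v in self.x)   # F2 of the current prefix

def run_game(n=8, m=100):
    adversary, estimator = Adversary(), ExactF2(n)
    transcript = []   # (update, output) pairs visible to the adversary
    for _ in range(m):
        coord, delta = adversary.next_update(transcript, n)
        out = estimator.update(coord, delta)  # must answer at every prefix
        transcript.append(((coord, delta), out))
    return transcript
```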
Methodology — deep read
Threat model and assumptions: the adversary is fully adaptive and can choose each next update based on the entire transcript of previous updates and all algorithm outputs. The model is turnstile/insertion-deletion, so each update is a signed change (a_t, Δ_t) to a coordinate of the implicit frequency vector x, with stream length m = poly(n) and update magnitudes bounded by poly(n). The algorithm must remain accurate at every prefix x^(t), not just at the end. The paper explicitly notes that for lower bounds against linear sketches, the adversary’s power is to query the algorithm enough times to infer the initial random sketch and then tailor a bad future update sequence.
Data and problem setup: there is no dataset in the ML sense; the “data” is the update stream. For the main F2 result, the frequency vector x lives in R^n and the algorithm needs a (1+ε)-approximation to F2(x) = ∑_i x_i^2 at all times. For the general theorem, the target is a symmetric function F on Z^n with an O(1)-approximate triangle inequality and a non-adaptive streaming sketch of space S(n). The text does not describe a supervised dataset, labels, or train/test split; evaluation is theoretical and worst-case over adaptive streams. Preprocessing is also theoretical: the stream is recursively partitioned into blocks in a B-ary tree, and each block is associated with its own sketch matrix. The authors repeatedly rely on the fact that m = poly(n), which lets them take H = O(log n) recursion depth and set failure probabilities small enough for union bounds over all possible adversarially induced transcripts.
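As a rough illustration of the recursive partition, the sketch below enumerates a B-ary block hierarchy over stream positions; the indexing and the choice to materialize every level eagerly are our simplifications:

```python
# Sketch of a B-ary block hierarchy over stream positions 0..m-1.
# Indexing and eager enumeration are our simplifications for illustration.
import math

def bary_blocks(m, B):
    """Return blocks per level: level h covers B**h consecutive positions.

    Height is H = ceil(log_B m), which is O(log n) when m = poly(n);
    in the paper, each block carries its own fresh sketch matrix.
    """
    H = max(1, math.ceil(math.log(m, B)))
    levels = []
    for h in range(H + 1):
        size = B ** h
        levels.append([(start, min(start + size, m))
                       for start in range(0, m, size)])
    return levels

# Example: bary_blocks(m=1000, B=10) yields levels with block sizes
# 1, 10, 100, 1000.
```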
Architecture / algorithm: the main technical device is an estimator-corrector-learner loop built around multiple sketches. The intuition is to split the stream into a known prefix z and a future suffix q, maintain a sketch of z from the past, and separately track q with a fresh sketch initialized later. The estimator tries to approximate F(z+q) by combining an estimate of the residual prefix error F(z−z′) with a sketch-based estimate of F(z′+q), where z′ is a learned approximation to z. The corrector uses an independent sketch to test whether the estimator is wrong; if so, the learner takes a corrective step determined by the current query q (for F2, z′′ = z′ + α(q + z′) with α chosen from sketch-estimated norms). The novelty is that the algorithm does not depend on a single sketch remaining hidden forever; instead it allows a bounded number of “corrections,” and each correction improves a progress measure such as ∥z−z′∥_2^2. For F2, the paper gives a simplified two-sketch picture, then lifts it recursively: at each tree level, P_i estimates the error term from the current block and Q_i recursively estimates the remainder. Algorithm 1 (EstLevel) formalizes this: it combines P_i and Q_i, checks whether the sum is consistent with an auxiliary estimate A_i, and if not returns A_i and triggers an update to the learner.
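Below is a minimal exact-arithmetic sketch of that loop for F2, with every norm and inner product computed exactly as a stand-in for a sketch estimate. The step follows the z′′ = z′ + α(q + z′) form above, with α chosen here by least squares rather than from sketch-estimated norms:

```python
# Exact-arithmetic demo of the estimator-corrector-learner loop for F2.
# Every exact quantity below stands in for a sketch estimate.
import numpy as np

def f2(v):
    """Second frequency moment: F2(v) = sum_i v_i^2."""
    return float(np.dot(v, v))

def estimator_corrector_learner(z, num_queries=200, eps=0.1, seed=0):
    """`z` plays the fixed stream prefix; each round draws a suffix q and
    asks for F2(z + q).  In the real algorithm the number of corrections
    is bounded; this demo only illustrates why each correction shrinks
    the progress measure ||z - z'||_2^2."""
    rng = np.random.default_rng(seed)
    z_prime = np.zeros_like(z)                 # learner's approximation of z
    corrections = 0
    for _ in range(num_queries):
        q = rng.standard_normal(z.shape)
        proxy = f2(z - z_prime) + f2(z_prime + q)  # estimator's proxy for F2(z+q)
        truth = f2(z + q)                          # corrector's independent check
        if abs(proxy - truth) > eps * max(truth, 1.0):
            # proxy error = 2<z - z', z' + q>, so a least-squares step along
            # d = z' + q provably decreases ||z - z'||_2^2
            d = z_prime + q
            alpha = 0.5 * (truth - proxy) / f2(d)
            z_prime = z_prime + alpha * d
            corrections += 1
    return z_prime, corrections
```

Running it with, e.g., `estimator_corrector_learner(np.full(64, 5.0))` shows that most rounds pass the corrector, and when a correction does fire, ∥z − z′∥_2^2 drops by roughly a (1 − Θ(ε^2)) factor, matching the decrease the text describes.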
Training regime / parameterization: there is no empirical training. The analogous “regime” is the choice of block size and error parameters. The paper sets the block size B = O((1/ε^2) log n), because the learner can only make O((1/ε^2) log n) progress steps before z′ converges to z. The tree height is H = O(log n) for m = poly(n), so recursion depth stays logarithmic. Failure probabilities for the corrector sketches are set using a bounded-computation-paths style argument: because there are at most binomial(m, L) possible sets of L times at which the estimator can be wrong, the corrector’s δ is chosen small enough to union bound over all adversarially generated transcripts. The paper does not give a full explicit hardware/seed table because this is a theory paper; the only “hardware” analogue is the sketch dimension/space and the assumption of explicit random matrices for AMS-style sketches. The text mentions robust L2 sketches and bounded-computation-paths arguments to estimate step sizes α and to keep each sketch robust to only O(B^2) adaptive queries.
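Our reading of those parameter choices, with illustrative constants (δ_0 is an assumed per-transcript failure target, not notation from the excerpt):

```latex
% Our reading of the parameter choices; delta_0 and constants are illustrative.
B = \Theta\!\big(\varepsilon^{-2}\log n\big), \qquad
H = \lceil \log_B m \rceil = O(\log n) \ \text{ for } m = \mathrm{poly}(n),
\qquad
\delta \;\le\; \delta_0 \Big/ \binom{m}{L}
\;\Longrightarrow\;
\log\tfrac{1}{\delta} = O(L \log m) = O\!\big(\varepsilon^{-2}\log^2 n\big)
\ \text{ when } L = O\!\big(\varepsilon^{-2}\log n\big),
```

so the log(1/δ) blowup in sketch size stays within poly(1/ε, log n).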
Evaluation protocol and example: evaluation is by theorem statements, proof sketches, and asymptotic comparisons against known lower bounds. For F2, the benchmark is the classical AMS algorithm, which uses O((1/ε^2) log n log(1/δ)) bits in the non-adaptive setting; the robust algorithm matches this up to poly((log n)/ε) factors. For heavy hitters, the baseline is the standard turnstile L2 heavy-hitter machinery, and the claim is that the robust version keeps the same poly(1/ε, log n) dependence. For the general function theorem, the baseline space is the non-robust turnstile sketch space S(n), and the robust overhead is Õ(n^{1/C}). A concrete end-to-end example in the paper is the F2 estimator: suppose the stream is split into prefix z and suffix q. If the current learner iterate z′ is poor, the estimator uses ∥z−z′∥_2^2 + ∥z′+q∥_2^2 as a proxy for ∥z+q∥_2^2. The corrector checks this proxy against an auxiliary sketch of the true sum. If the proxy is off by more than an ε fraction, the learner updates z′ toward −q, and the distance to z shrinks by a multiplicative factor (the text derives a (1−ε^2)-type decrease). Repeating this only O((1/ε^2) log n) times suffices because ∥z∥_2^2 is polynomially bounded when m = poly(n). Reproducibility is theoretical rather than empirical: the paper provides explicit algorithms and theorem/proof structure, but the excerpt does not mention released code, frozen weights, or datasets, and no experimental section is present in the supplied text.
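The correction budget follows from a one-line calculation; this is our back-of-envelope version of the argument, with z′ initialized to 0:

```latex
% Back-of-envelope: why O(eps^{-2} log n) corrections suffice when m = poly(n).
\|z - z'\|_2^2 \;\le\; (1-\varepsilon^2)^{T}\,\|z\|_2^2
\;\le\; (1-\varepsilon^2)^{T}\, n^{O(1)} \;<\; 1
\quad\text{once}\quad
T \;\ge\; \frac{O(\log n)}{\ln\!\frac{1}{1-\varepsilon^2}}
\;=\; O\!\big(\varepsilon^{-2}\log n\big).
```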
Technical innovations
- An estimator-corrector-learner framework that lets a stream algorithm use fresh randomness mid-stream, instead of relying entirely on one initial sketch.
- A recursive B-ary block decomposition that limits each sketch to only O(B^2) adaptive interactions, enabling robustness without Ω(n) space.
- A robust F2 estimator that remains (1+ε)-accurate at every prefix of an adaptive turnstile stream in poly(1/ε, log n) space.
- A generic reduction from non-adaptive sketches for symmetric functions with an approximate triangle inequality to adversarially robust sketches with sublinear-in-n overhead.
- A robust heavy-hitters construction for L2, with the Lp-heavy-hitter corollary for all p ≤ 2.
Baselines vs proposed
- AMS99 F2 sketch: space = O((1/ε^2) log n log(1/δ)) bits vs proposed: poly(1/ε, log n) bits under adaptive turnstile streams.
- Classical turnstile heavy-hitters: space = poly(1/ε, log n) bits vs proposed: poly(1/ε, log n) bits under adaptive turnstile streams (same asymptotic dependence stated).
- Non-robust sketch for symmetric F: space = S(n) bits vs proposed: Õ(n^{1/C})·S(n) bits for any constant C > 1.
- Linear sketches under adaptive attack for Fp: require r = Ω(n) sketch dimension vs proposed: sublinear-space non-linear algorithm (exact sublinear exponent not given for all Fp in the excerpt).
Limitations
- The paper is entirely theoretical in the provided text; there are no empirical measurements, real-world stream traces, or implementation benchmarks.
- For the general theorem, the approximation loss is fairly coarse: the robust approximation factor grows exponentially in C (the excerpt writes it variously as κ^{O(C)} and 2^{O(C)}), so the theorem is not a tight black-box preservation result.
- The recursion and bounded-computation-paths argument are intricate; the excerpt does not fully expose all constants, so implementability from the text alone is nontrivial.
- The F2 result is optimal only up to poly((log n)/ε) factors, not exactly optimal.
- The theorem for symmetric functions assumes the function has an O(1)-approximate triangle inequality and a non-adaptive turnstile sketch; functions outside that class are not covered.
- The proof relies on m = poly(n) and bounded update magnitudes; behavior for much longer streams or unbounded updates is not addressed here.
Open questions / follow-ons
- Can the recursive estimator-corrector-learner idea be simplified into a more modular robust-sketch compiler with cleaner constants and fewer levels?
- Can the adaptive-space overhead for the general symmetric-function theorem be improved from Õ(n^{1/C}) toward polylog(n) for larger classes of functions?
- Do there exist matching lower bounds for non-linear robust turnstile algorithms, analogous to the Ω(n) bounds known for linear sketches?
- Can the approach be extended beyond approximate triangle inequality to other objectives, such as non-symmetric or non-metric statistics?
Why it matters for bot defense
For bot-defense and CAPTCHA practitioners, the main lesson is not about streaming per se, but about how adaptive feedback breaks naïve randomized summaries. If a defense system emits outputs that attackers can query repeatedly, fixed internal randomness can become learnable, and the system may need mid-flight re-randomization or layered estimators to stay robust. The paper’s estimator-corrector-learner pattern is a useful mental model for any adaptive defense pipeline: keep a cheap estimator, add an independent verifier, and only spend expensive or sensitive state updates when the verifier says the attacker has started to exploit the system.
In CAPTCHA-adjacent settings, that suggests designing challenge selection, scoring, and telemetry aggregation so that no single summary mechanism is exposed to too many adaptive probes. The recursive block idea also maps to rate-limited or epoch-based defenses: periodically refresh the internal representation, and bound how much an adversary can learn before the representation changes. The paper is not proposing a CAPTCHA algorithm, but it is a strong theoretical example of why adaptivity forces you away from one-shot sketches and toward layered, resettable state.
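As a loose translation of that pattern into defense-engineering terms, here is a minimal sketch of an epoch-reset scorer; the class, thresholds, and scoring rule are hypothetical placeholders, not anything from the paper:

```python
# Hypothetical epoch-reset scorer, loosely mirroring the paper's
# bounded-interaction sketches; names and thresholds are placeholders.
import os, hmac, hashlib

class EpochScorer:
    """Re-randomizes its internal key after a bounded number of probes,
    so no single summary mechanism is exposed to too many adaptive queries."""
    def __init__(self, probe_budget=100):
        self.probe_budget = probe_budget
        self._reset_epoch()

    def _reset_epoch(self):
        self.key = os.urandom(32)  # fresh randomness; old probes become stale
        self.probes = 0

    def score(self, request_features: bytes) -> int:
        self.probes += 1
        if self.probes > self.probe_budget:
            self._reset_epoch()    # bound how long one summary is exposed
        digest = hmac.new(self.key, request_features, hashlib.sha256).digest()
        return digest[0]           # stand-in for a keyed randomized score
```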
Cite
@article{arxiv2602_20854,
  title={Adversarial Robustness on Insertion-Deletion Streams},
  author={Elena Gribelyuk and Honghao Lin and David P. Woodruff and Huacheng Yu and Samson Zhou},
  journal={arXiv preprint arXiv:2602.20854},
  year={2026},
  url={https://arxiv.org/abs/2602.20854}
}