
Cascading Bandits Robust to Adversarial Corruptions

Source: arXiv:2502.08077 · Published 2025-02-12 · By Jize Xie, Cheng Chen, Zhiyong Wang, Shuai Li

TL;DR

This paper studies online learning to rank under the cascade model when adversarial corruptions manipulate user feedback. The cascade model captures users who scan a ranked list from the top and click only the first attractive item. While cascading bandit algorithms achieve logarithmic regret in stochastic environments, they are vulnerable to adversarial corruption such as click fraud, which degrades their performance. This work formulates the Cascading Bandits with Adversarial Corruptions (CBAC) problem, introducing an adaptive adversary that can corrupt click feedback. The authors propose two robust algorithms: CascadeRKC for a known corruption level and CascadeRAC for the agnostic (unknown corruption level) setting. Both achieve logarithmic regret in the no-corruption regime, with regret growing linearly in the corruption level. Experiments on synthetic data and three real-world datasets show the algorithms significantly outperform standard cascading bandits and adversarial baselines, maintaining robustness and low regret under various corruption intensities and strategies.

Key findings

  • CascadeRKC and CascadeRAC achieve gap-dependent logarithmic regret O(log T) when no adversarial corruption is present, improving over O(√T) adversarial baselines.
  • The cumulative regret of CascadeRKC and CascadeRAC increases linearly with the corruption level C, matching known lower bounds in MAB with corruptions.
  • CascadeRKC, which assumes a known corruption level, attains tighter regret bounds than CascadeRAC, which tolerates an unknown (agnostic) corruption level at the cost of slightly higher regret.
  • In synthetic experiments with L=500 items, K=5 positions, and T=1,000,000 rounds, CascadeRKC and CascadeRAC reduce cumulative regret by orders of magnitude compared to CascadeUCB1, UCBV, KL-UCB, and Ranked Bandits Algorithm (RBA).
  • Under different attraction probability gaps Δ (0.1, 0.2, 0.4), the proposed algorithms consistently outperform baselines and show regret decreasing as Δ increases, confirming theoretical bounds.
  • In real-world datasets (Movielens, Yelp, Yandex), CascadeRKC and CascadeRAC maintain significantly lower cumulative regret than baselines even with adversarial corruptions.
  • Robustness holds across different corruption mechanisms: periodic corruptions and early-phase concentrated corruptions.
  • Position-based elimination (PBE) prunes sub-optimal items substantially faster than uniform elimination, which keeps regret under control.

Threat model

The adversary is adaptive and can observe both the recommended list and the true user attraction feedback before choosing whether and how to corrupt it. The adversary cannot alter items outside the recommended list nor increase the total corruption beyond a bounded level C over the entire interaction horizon. Corruptions can flip attraction indicators causing false or missed clicks, aiming to degrade the learning algorithm's ability to find the optimal top-K ranked list.
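The threat model above can be made concrete with a minimal sketch: a cascade-model user who clicks the first attractive item, and an adaptive adversary that observes the true click and may flip it, subject to a total budget C. The "suppress every real click" strategy below is a hypothetical illustration, not the paper's attack; probabilities and list sizes are invented toy values.

```python
import random

def cascade_feedback(ranked_list, attract_prob, rng):
    """Cascade model: return the position of the first clicked item, or None."""
    for pos, item in enumerate(ranked_list):
        if rng.random() < attract_prob[item]:
            return pos  # user clicks the first attractive item and stops scanning
    return None

class BudgetedAdversary:
    """Adaptive adversary: sees the recommended list and the true click, and may
    flip the feedback, but total corruption over the horizon is capped at C."""
    def __init__(self, budget_C):
        self.remaining = budget_C

    def corrupt(self, ranked_list, true_click):
        # Hypothetical strategy: suppress any real click while budget lasts.
        if true_click is not None and self.remaining > 0:
            self.remaining -= 1
            return None  # a click is flipped into a non-click
        return true_click

rng = random.Random(0)
attract_prob = {i: 0.4 if i < 5 else 0.1 for i in range(20)}  # 5 "good" items
adversary = BudgetedAdversary(budget_C=3)
observed_clicks = sum(
    adversary.corrupt(list(range(5)),
                      cascade_feedback(list(range(5)), attract_prob, rng)) is not None
    for _ in range(100)
)
```

The key property is that the adversary is powerful (it acts after seeing the true feedback) but bounded: once the budget is spent, feedback is delivered honestly.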

Methodology — deep read

  1. Threat model & assumptions: The adversary is adaptive and can modify user feedback after observing the recommended list and clicks, but total corruption magnitude is bounded by C. The user feedback follows the cascade model where users scan from top-ranked items and click the first attractive one. Corruption can flip attraction indicators up to C times cumulatively. The attraction indicator distributions are assumed independent Bernoulli per item. The agent aims to minimize cumulative regret versus the optimal fixed ranking.

  2. Data: Synthetic experiments use L=500 items with attraction probabilities drawn uniformly from (0, 0.5), K=5 recommended items per round, and T=1M rounds. Real-world datasets include Movielens, Yelp, and Yandex, with cascade click feedback constructed from logged ratings/search data. Adversarial corruptions are injected via "periodic" and "early phase" corruption schedules that flip clicks on targeted items.

  3. Algorithms: The key technical contribution is Position-Based Elimination (PBE), maintaining an elimination set M_k for each position k, allowing strict elimination of sub-optimal items by comparing empirical lower and upper confidence bounds (LCB and UCB). Confidence radii depend on item play counts. Two algorithms built on PBE:

  • CascadeRKC: Assumes the corruption level C is known and maintains two PBE instances, a fast instance F and a slow instance S, with different confidence radii. The S instance is played with probability 1/C to dilute the effect of corruption.
  • CascadeRAC: Assumes the corruption level is unknown (agnostic) and maintains O(log T) PBE instances with geometrically decreasing play probabilities and increasing confidence radii to accommodate worst-case corruptions. Elimination decisions are synchronized across instances to accelerate sub-optimal item pruning.
  4. Training regime: There is no traditional training phase, since this is an online bandit setting. The algorithms run for T rounds, recommending K items, receiving feedback, updating empirical means and confidence bounds, and performing eliminations. Confidence radii scale as O(sqrt(log(T)/plays) + log(T)/plays). Hyperparameters follow the theoretical derivations.

  5. Evaluation protocol: Regret is the cumulative reward difference between the optimal and recommended lists over all rounds. Baselines include CascadeUCB1, CascadeUCBV, and CascadeKL-UCB for stochastic cascading bandits, and the Ranked Bandits Algorithm (RBA) for the adversarial setting. Results are averaged over 10 independent runs. Multiple datasets and corruption mechanisms are tested to evaluate robustness and regret scaling with respect to the corruption level and attraction gaps.

  6. Reproducibility: Code release status is unclear from the paper text. Datasets include public and custom-processed versions of Movielens, Yelp, and Yandex. Theoretical proofs and algorithm pseudocode are included in the appendix. Experimental settings and corruption injection protocols are described in enough detail for replication.
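The elimination step at the heart of PBE can be sketched as follows: for each position, an item is dropped once its upper confidence bound falls below the best lower confidence bound among the remaining candidates. This is a minimal illustration with invented constants; the paper's radii scale as O(sqrt(log T / n) + log T / n) in the play count n, and the exact constants and per-position bookkeeping are in its pseudocode.

```python
import math

def conf_radius(n_plays, T):
    """Confidence radius with the O(sqrt(log T / n) + log T / n) shape;
    the multiplicative constants here are illustrative, not the paper's."""
    if n_plays == 0:
        return float("inf")
    return math.sqrt(2 * math.log(T) / n_plays) + math.log(T) / n_plays

def pbe_eliminate(candidates, means, plays, T):
    """Strict elimination sketch for one position: keep only candidates whose
    UCB is at least the best LCB among the remaining candidates."""
    lcb = {i: means[i] - conf_radius(plays[i], T) for i in candidates}
    ucb = {i: means[i] + conf_radius(plays[i], T) for i in candidates}
    best_lcb = max(lcb.values())
    return {i for i in candidates if ucb[i] >= best_lcb}

# Toy numbers: item 2 is clearly sub-optimal once counts are large enough.
means = {0: 0.45, 1: 0.40, 2: 0.10}
plays = {0: 5000, 1: 5000, 2: 5000}
survivors = pbe_eliminate({0, 1, 2}, means, plays, T=1_000_000)
```

With these toy values the confidence radius is about 0.077, so item 2's UCB (about 0.18) sits below item 0's LCB (about 0.37) and item 2 is eliminated, while items 0 and 1 remain statistically indistinguishable and both survive.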

Example end-to-end: On synthetic data with a known corruption level C = 50,000, CascadeRKC samples the slow PBE instance with probability 1/C, restricting the corruption's impact on it, and quickly eliminates sub-optimal items via PBE across the K positions. Empirical means and confidence intervals are updated per click observation; the large confidence radius of the S instance filters out corrupted clicks. The algorithm converges to a near-optimal ranking with regret scaling as O(log T + C), outperforming baselines that are heavily affected by corruption.
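The two-instance routing in this end-to-end example reduces to a simple probabilistic switch: the slow, corruption-tolerant instance S is played with probability 1/C, so over a horizon of T rounds it is played only about T/C times and absorbs O(1) corruption in expectation per unit of adversarial budget spent on it. A minimal sketch of that routing, with the paper's C = 50,000 and T = 1M:

```python
import random

def choose_instance(C, rng):
    """CascadeRKC-style routing sketch: with probability 1/C play the slow,
    wide-confidence instance S; otherwise play the fast instance F."""
    return "S" if rng.random() < 1.0 / C else "F"

rng = random.Random(42)
C = 50_000
T = 1_000_000
counts = {"S": 0, "F": 0}
for _ in range(T):
    counts[choose_instance(C, rng)] += 1
# In expectation S is played T/C = 20 times out of 1,000,000 rounds.
```

Because S is played so rarely, an adversary must spend roughly C units of budget to land a single corrupted round on it, which is what makes the slow instance a reliable fallback for the fast one.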

Technical innovations

  • Formulation of Cascading Bandits with Adversarial Corruptions (CBAC), extending cascading bandits with an adaptive adversary corrupting feedback.
  • Position-Based Elimination (PBE): a novel per-position elimination strategy that enforces strict elimination rules to speed pruning of sub-optimal items in cascading bandits.
  • CascadeRKC: robust cascading bandit algorithm for known corruption levels using two PBE instances (fast and slow) sampled probabilistically to isolate corruptions.
  • CascadeRAC: agnostic corruption robust algorithm maintaining O(log T) PBE instances with geometrically decaying sampling probabilities and synchronized eliminations.
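The geometric sampling scheme behind CascadeRAC can be sketched as a layer-probability computation: O(log T) instances, with instance ℓ sampled roughly in proportion to 2^(-ℓ). The normalization and the exact layer count below are illustrative choices, not the paper's constants; the point is that an adversary with unknown budget C corrupts layer ℓ only about C/2^ℓ times in expectation, so some sufficiently slow layer always remains nearly clean.

```python
import math

def layer_probs(T):
    """CascadeRAC-style sketch: sampling probabilities for O(log T) PBE
    instances, geometrically decreasing in the layer index and normalized
    to sum to 1. Layer count and proportions are illustrative assumptions."""
    n_layers = max(1, math.ceil(math.log2(T)))
    raw = [2.0 ** -(layer + 1) for layer in range(n_layers)]
    total = sum(raw)
    return [p / total for p in raw]

probs = layer_probs(T=1_000_000)  # 20 layers for T = 1M
```

Deeper layers are played less often but use wider confidence radii, which is the trade-off that lets the algorithm remain agnostic to the true corruption level.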

Datasets

  • Synthetic dataset — 500 items, 1,000,000 rounds — custom simulation
  • Movielens — varied size, processed for cascade model clicks — public
  • Yelp — public dataset, processed for cascade click simulation
  • Yandex — public dataset, processed for cascade click simulation

Baselines vs proposed

  • CascadeUCB1: cumulative regret ~2.33e4 (synthetic) vs CascadeRKC: ~2.5e3
  • CascadeUCBV: cumulative regret ~2.73e4 (synthetic) vs CascadeRAC: ~4.9e3
  • CascadeKL-UCB: cumulative regret ~2.9e4 (synthetic) vs CascadeRAC: ~4.9e3
  • RBA: cumulative regret ~3.13e4 (synthetic) vs CascadeRKC: ~2.5e3
  • On Movielens, CascadeKL-UCB: regret ~7.9e4 vs CascadeRKC: ~2.4e4
  • Regret scales roughly linearly in the corruption level C, but with a lower slope for CascadeRKC and CascadeRAC than for the baselines.

Limitations

  • CascadeRKC requires a known bound on the corruption level; CascadeRAC removes this requirement, but at the cost of higher regret.
  • A tight lower bound for cascading bandits with corruption remains open; only the MAB lower bound is known.
  • Evaluation focused on cascade click model; other click models (position-based, dependent clicks) not studied.
  • Assumption of independence of attraction probabilities and simplified Bernoulli model may not capture complex real user behavior.
  • Code and exact replication details not fully provided, hindering immediate reproducibility.
  • No adversarial or adaptive attack evaluation beyond bounded corruption assumed; worst-case attacks could behave differently.

Open questions / follow-ons

  • What are tight lower bounds for regret in cascading bandits with adversarial corruption?
  • Can these robust elimination concepts be extended to other user click models beyond the cascade model, such as position-biased or dependent clicks?
  • How do the algorithms perform against stronger adversaries or unknown adaptive attack strategies that do not obey total corruption bounds?
  • Can similar robustness guarantees be obtained with more efficient algorithms reducing computational or memory overhead?

Why it matters for bot defense

The study of cascading bandits robust to adversarial corruptions is highly relevant to bot-defense and CAPTCHA practitioners because it models practical scenarios where adversaries manipulate feedback signals (e.g., clicks) to degrade the learning of optimal item rankings or user interactions. As with click fraud, automated bots may try to exploit ranking or interaction models to bias outcomes or evade detection. The position-based elimination approach and multi-instance robustness techniques presented here provide principled mechanisms for mitigating the influence of corrupted feedback in online recommendation or challenge-response tasks where user behavior is sequential and partially observed. Deploying robust ranking algorithms that degrade gracefully under adversarial manipulation can improve security and user experience in systems that rely on online interaction data for intent recognition or bot screening. Furthermore, the linear scaling of regret with the corruption level highlights that limiting adversarial volume is crucial, aligning with real-world practices of throttling and anomaly detection to constrain attack impact.

Cite

bibtex
@article{arxiv2502_08077,
  title={Cascading Bandits Robust to Adversarial Corruptions},
  author={Jize Xie and Cheng Chen and Zhiyong Wang and Shuai Li},
  journal={arXiv preprint arXiv:2502.08077},
  year={2025},
  url={https://arxiv.org/abs/2502.08077}
}

Read the full paper

Articles are CC BY 4.0 — feel free to quote with attribution