Optimal Transport-Guided Adversarial Attacks on Graph Neural Network-Based Bot Detection
Source: arXiv:2602.00318 · Published 2026-01-30 · By Kunal Mukherjee, Zulfikar Alom, Tran Gia Bao Ngo, Cuneyt Gurcan Akcora, Murat Kantarcioglu
TL;DR
BOCLOAK is a constraint-aware adversarial attack framework for graph neural network (GNN) bot detectors on social networks. The paper’s core claim is that many prior graph attacks are too unconstrained to reflect how real bots actually operate: attackers can usually choose whom their bot follows, but cannot force humans to follow back, cannot arbitrarily rewire the rest of the graph, and are limited by partial observability, temporal feasibility, and API-scale search costs. BOCLOAK reframes the problem at the neighborhood-distribution level rather than the individual-edge level, using optimal transport (OT) to compare a target bot’s local ego network to that of a “cloak” template and then decoding the transport plan into sparse, plausible edge edits.
The main result is that this OT-guided geometry yields much stronger evasion than standard graph attack baselines under realistic constraints, while also being cheaper to run. Across three datasets, five bot detectors, and three defenses, BOCLOAK reports up to 80.13% higher attack success rates and up to 99.80% lower GPU memory use than prior attacks such as PR-BCD and FGA. The paper’s experiments show a sharp gap between unconstrained and constrained settings for classic attacks, whereas BOCLOAK retains high misclassification rates even when node edits must obey social-domain rules and temporal plausibility.
Key findings
- On node editing with budget Δ = 1, BOCLOAK reaches 93.10% misclassification on GAT for Cresci-15, versus 8.17% for Nettack, 8.52% for FGA, 8.79% for PR-BCD, and 7.61% for GOttack under the same constrained setting.
- On TwiBot-22 node editing with BotRGCN, BOCLOAK achieves 94.38% misclassification, compared with 13.51% (Nettack), 12.32% (FGA), 13.51% (PR-BCD), and 11.41% (GOttack) under the constrained setting in Table 2.
- On BotSim-24 node editing with GAT, BOCLOAK obtains 88.74% misclassification versus 5.52% for Nettack, 5.79% for FGA, 7.32% for PR-BCD, and 7.08% for GOttack under constrained attacks.
- The paper reports up to 80.13% higher attack success rates for BOCLOAK overall relative to the compared graph attack baselines, though the exact dataset/model slice for this maximum is not specified in the excerpted text.
- BOCLOAK uses up to 99.80% less GPU memory than PR-BCD and FGA, and up to 20× faster runtime than Nettack and GOttack, according to the authors’ summary claims.
- In constrained settings, standard baselines collapse dramatically: on Cresci-15 against BotRGCN, Nettack drops to 4.32% misclassification and GOttack to 3.55%, while BOCLOAK remains at 99.34% on the same table slice.
- The paper states that even a single human follower can mislead a SOTA bot detector in 81.50% of cases (Figure 1 discussion), indicating extreme sensitivity to local neighborhood structure.
Threat model
The adversary is a test-time, targeted evasion attacker operating under black-box assumptions. They know the training graph, labels, and feature schema, but not the victim model’s parameters, gradients, logits, or architecture, and they cannot query the model directly. They may add a new bot node or edit an existing bot node by changing only incident edges, with outgoing follows under their control but incoming follow-backs not forceable. They cannot modify other nodes, rewire unrelated edges, or alter other users’ features, labels, or metadata; poisoning and global availability attacks are explicitly out of scope.
Methodology — deep read
BOCLOAK is evaluated as a test-time evasion attack, not a poisoning attack, under the threat model described above: a black-box adversary who knows the training graph, labels, and feature schema, but not the victim model's internals, and who can only change edges incident to a single new or existing bot node. The paper emphasizes realistic social-platform constraints throughout: outgoing follows are controllable, incoming follow-backs are not, and edits must remain temporally and behaviorally plausible.
The data used are three social bot detection benchmarks. Cresci-15 has 5,301 nodes and 14,220 edges, with 3,351 bots and 1,950 humans. TwiBot-22 is much larger, with 1,000,000 nodes, 170,185,937 edges, 139,943 bots, and 860,057 humans. BotSim-24 has 2,907 nodes, 46,518 edges, 1,000 bots, and 1,907 humans. The paper says it follows the official train/validation/test splits from the dataset authors and uses the original training protocols for the victim detectors, including early stopping. BOCLOAK uses 1-hop ego neighborhoods, which the authors justify as standard for capturing behavioral signals in this domain. The excerpt does not give exact preprocessing details for the neighborhood feature vectors beyond noting that they combine profile, content, relational, structural, and temporal signals.
Architecturally, BOCLOAK first represents each node's local neighborhood as a probability measure over feature vectors. For a node v, its neighborhood measure is μ_v = Σ_{η∈N(v)} w_v(η) δ_{ϕ_v(η)}, where ϕ_v(η) is a fixed-dimensional vector for a neighbor η and w_v(η) is a normalized importance weight. The importance score a_v(η) is defined as the product of a structural term and a temporal term, a_v(η) = g_str(s(η)) · g_temp(t(η)), with monotone functions chosen to avoid over-concentrating mass on a few neighbors. The OT ground cost is learned via an embedding h_θ and a positive semidefinite (PSD) matrix M, using c_θ(z, z') = ||h_θ(z) − h_θ(z')||²_M. The paper then computes an entropy-regularized OT distance between neighborhood measures using Sinkhorn iterations. A contrastive objective trains θ so that same-label neighborhoods are close and different-label neighborhoods are separated by a margin γ. The excerpt does not specify the exact optimizer, learning rate, batch size, or number of epochs for this OT-training stage.
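The entropic OT core of this pipeline can be sketched in a few lines. The sketch below uses a plain squared-Euclidean ground cost in place of the paper's learned Mahalanobis cost c_θ (an assumption of this sketch, since h_θ and M are trained); the Sinkhorn loop itself follows the standard matrix-scaling form the paper references.

```python
import numpy as np

def sinkhorn_distance(X, Y, a, b, eps=0.1, n_iter=200):
    """Entropy-regularized OT distance between two weighted point clouds.

    X: (m, d) neighbor feature vectors of one node; Y: (n, d) of another.
    a: (m,) importance weights w_v summing to 1; b: (n,) likewise.
    NOTE: the squared Euclidean cost below stands in for the paper's
    learned cost c_theta(z, z') = ||h_theta(z) - h_theta(z')||^2_M.
    """
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # (m, n) ground-cost matrix
    K = np.exp(-C / eps)                                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):                             # Sinkhorn matrix scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                     # entropic transport plan
    return (P * C).sum(), P
```

One Sinkhorn solve is O(T_sink · m · n) time and O(m · n) memory, matching the complexity the paper states for neighborhood sizes m and n.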
A key novelty is the “boundary cloak” idea. For each bot node v, the method defines d_hum(v) = min_{h∈V_hum} D_θ(v,h) and d_bot(v) = min_{ξ∈V_bot∖{v}} D_θ(v,ξ), then sets the OT margin m(v) = d_hum(v) − d_bot(v). Bots with small or negative margin are close to the human region. The candidate cloak set is B_cand = B_bdry ∩ B_mis, where B_bdry contains bots with margin below a validation threshold τ_bdry and B_mis contains bots currently misclassified as human by the victim model. These misclassified bots are used as templates because they already lie near or across the detector boundary. The paper says that when no such bot cloak exists, the method falls back to a nearby human in OT space. The excerpt does not provide the full details of the threshold selection procedure beyond noting that τ_bdry is validation-set tuned.
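Given a precomputed pairwise OT distance matrix, cloak selection reduces to a margin test plus a misclassification filter. A minimal sketch, assuming D_θ has already been materialized as a dense matrix (the paper does not say whether it is stored or computed on the fly):

```python
import numpy as np

def candidate_cloaks(D, labels, preds, tau_bdry):
    """Select boundary bot cloaks: B_cand = B_bdry ∩ B_mis.

    D: (N, N) pairwise OT distances D_theta between node neighborhoods
       (assumed precomputed, e.g. via Sinkhorn solves).
    labels: ground-truth labels, 1 = bot, 0 = human.
    preds:  victim-model predictions on the clean graph.
    tau_bdry: validation-tuned margin threshold (value here is hypothetical).
    """
    bots = np.where(labels == 1)[0]
    humans = np.where(labels == 0)[0]
    cand = []
    for v in bots:
        d_hum = D[v, humans].min()            # distance to nearest human
        d_bot = D[v, bots[bots != v]].min()   # distance to nearest other bot
        margin = d_hum - d_bot                # m(v) = d_hum(v) - d_bot(v)
        if margin < tau_bdry and preds[v] == 0:
            cand.append(v)                    # near boundary AND misclassified as human
    return np.array(cand, dtype=int)
```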
Attack generation is then framed as a constrained optimization problem over edge edits incident to the target bot v_t. The target objective is min_{ΔE∈F(B)} min_{v_c∈B_cand} D_θ(μ_t(ΔE), μ_c) + λ_pl Φ(ΔE) + λ_sp |ΔE|, where F(B) enforces an edge budget and hard feasibility constraints, Φ penalizes implausible edits, and |ΔE| encourages sparsity. Because directly optimizing over discrete edge sets is intractable, BOCLOAK uses a three-stage approximation: (1) learn the OT geometry offline; (2) identify candidate cloaks near the decision boundary; and (3) decode a transport plan into a concrete set of edges. The decoding step compares the selected cloak to a nearby human neighbor in OT space, computes an entropic OT plan between their neighborhoods, and extracts high-mass correspondences to determine which edges to add, delete, or preserve. For node injection, only edge additions are allowed; for node editing, both additions and deletions are possible, but not human follow-backs. The complexity of one Sinkhorn solve is stated as O(T_sink m n) time and O(mn) memory for neighborhood sizes m and n.
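The decoding step can be approximated with a greedy rule over the transport plan's row masses. This is a simplification of the paper's decoder, not its exact procedure: the sketch treats cloak neighbors carrying the most transported mass as the structural pattern worth copying, adds outgoing follows to them, and deletes unmatched existing follows, all within the budget Δ.

```python
import numpy as np

def decode_edits(P, cloak_nbrs, target_nbrs, budget):
    """Turn high-mass entries of an entropic OT plan into budgeted edge edits.

    P: (m, n) transport plan between the cloak's neighborhood (rows) and a
       nearby human's neighborhood (columns), e.g. from a Sinkhorn solve.
    cloak_nbrs: node ids for the m cloak neighbors.
    target_nbrs: set of the target bot's current outgoing follows.
    budget: maximum number of edge edits (Δ in the paper).
    Greedy top-mass selection is an assumption of this sketch.
    """
    mass = P.sum(axis=1)            # total mass each cloak neighbor carries
    order = np.argsort(-mass)       # highest-mass correspondences first
    additions, deletions, keep = [], [], set()
    for i in order:
        nid = cloak_nbrs[i]
        if nid in target_nbrs:
            keep.add(nid)           # preserve: already followed
        elif len(additions) + len(deletions) < budget:
            additions.append(nid)   # add: outgoing follow (attacker-controllable)
    for nid in target_nbrs - keep:
        if len(additions) + len(deletions) < budget:
            deletions.append(nid)   # delete: node editing only, never forced follow-backs
    return additions, deletions
```

Note that the sketch only ever emits outgoing-edge changes from the target bot, matching the constraint that human follow-backs cannot be forced.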
Evaluation uses misclassification rate as the primary metric: among correctly classified target bots V_t, the attack succeeds when f(G', v_t)=human while f(G, v_t)=bot, and the score is the fraction of flipped targets. The main comparisons are against Nettack, FGA, PR-BCD, and GOttack, plus random baselines for node injection because the authors say no prior social-graph node-injection attack exists. Victim models include GAT, BotRGCN, and S-HGN, with defended variants GNNGuard, GRAND, and RobustGCN for node editing experiments. The paper evaluates both unconstrained and constrained attacks, and Table 2 reports results for 50 correctly classified bots under a budget Δ=1. The experiments were run on a Linux cluster with an AMD EPYC 7313 CPU, 1 TiB RAM, and 8 NVIDIA L40S 46 GB GPUs. The excerpt states a code release at an anonymous 4open.science repository, but does not mention frozen weights, a public dataset release by the authors, or seed strategy. A concrete example from Table 2: on TwiBot-22 against BotRGCN under constrained node editing, Nettack achieves 9.33% misclassification while BOCLOAK achieves 86.67%; the same table shows that under unconstrained conditions, many baselines are much stronger than in constrained settings, which supports the paper’s realism argument.
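The misclassification-rate metric is straightforward to pin down in code: only bots the clean model classifies correctly count as targets, and success means the attacked graph flips them to human.

```python
def misclassification_rate(preds_clean, preds_attacked, labels):
    """Fraction of correctly classified bots flipped to human after attack.

    A target v_t is counted only if f(G, v_t) = bot on the clean graph
    (labels use 1 = bot, 0 = human); it succeeds if f(G', v_t) = human.
    """
    flipped = total = 0
    for y, p0, p1 in zip(labels, preds_clean, preds_attacked):
        if y == 1 and p0 == 1:   # correctly classified bot on the clean graph
            total += 1
            if p1 == 0:          # misclassified as human after the attack
                flipped += 1
    return flipped / total if total else 0.0
```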
Technical innovations
- Recasts bot evasion as optimal transport between spatio-temporal neighborhood distributions instead of direct edge-flip search.
- Learns a label-aware OT geometry with a contrastive margin loss over same-class and different-class node pairs.
- Introduces boundary bot cloaks: misclassified or boundary bots used as templates for constructing plausible evasive neighborhoods.
- Decodes entropic OT transport plans into discrete, budgeted edge edits that respect directionality and temporal constraints.
- Applies OT-guided attacks to both node injection and node editing on social bot detectors, which the authors describe as the first OT-based evaluation in this setting.
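The label-aware geometry in the second bullet can be illustrated with a standard margin-based contrastive form over OT distances. This is a generic stand-in for the paper's exact objective (whose optimizer and hyperparameters the excerpt does not give): same-label pairs are pulled together, different-label pairs are pushed beyond the margin γ via a hinge.

```python
import numpy as np

def contrastive_ot_loss(D_pos, D_neg, gamma):
    """Margin-based contrastive loss over OT distances (illustrative form).

    D_pos: OT distances D_theta between same-label neighborhood pairs.
    D_neg: OT distances between different-label pairs.
    gamma: separation margin from the paper's contrastive objective.
    """
    pull = np.mean(D_pos)                            # shrink same-label distances
    push = np.mean(np.maximum(0.0, gamma - D_neg))   # hinge: enforce the margin
    return pull + push
```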
Datasets
- Cresci-15 — 5,301 nodes, 14,220 edges, 3,351 bots, 1,950 humans — public benchmark
- TwiBot-22 — 1,000,000 nodes, 170,185,937 edges, 139,943 bots, 860,057 humans — public benchmark
- BotSim-24 — 2,907 nodes, 46,518 edges, 1,000 bots, 1,907 humans — public benchmark
Baselines vs proposed
- Cresci-15 / GAT / constrained node editing: Nettack misclassification = 8.17% vs proposed = 93.10%
- Cresci-15 / BotRGCN / constrained node editing: FGA misclassification = 3.95% vs proposed = 99.34%
- TwiBot-22 / BotRGCN / constrained node editing: GOttack misclassification = 6.89% vs proposed = 86.67%
- BotSim-24 / GAT / constrained node editing: PR-BCD misclassification = 7.32% vs proposed = 88.74%
- BotSim-24 / BotRGCN / constrained node editing: Nettack misclassification = 56.57% vs proposed = 58.22%
- Across datasets: PR-BCD and FGA GPU memory = baseline vs proposed = up to 99.80% lower memory usage (reported summary claim)
- Across datasets: Nettack and GOttack runtime = baseline vs proposed = up to 20× faster runtime (reported summary claim)
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2602.00318.

Fig 2: Overview of BOCLOAK. The method learns an optimal transport geometry over k-hop neighborhood distributions and uses … [caption truncated in the excerpt]
Limitations
- The excerpt does not provide the exact hyperparameters, optimizer, epoch count, or random seed strategy for the OT training stage, limiting reproducibility detail.
- The strongest quantitative claims (80.13% higher ASR, 99.80% less memory, 20× faster) are summarized without the full per-dataset breakdown in the excerpt, so the precise operating points are unclear here.
- The attack assumes access to the training graph, labels, and feature schema; this is black-box with side information, but not a fully information-starved attacker.
- The domain plausibility constraints are described conceptually, but the exact implementation of the plausibility penalty Φ(ΔE) is not fully visible in the excerpt.
- The evaluation focuses on targeted evasion of correctly classified bots; it does not address broader platform-level detection, active investigation, or post-detection remediation.
- No distribution-shift test is shown in the excerpt beyond using multiple datasets; it remains unclear how well the learned OT geometry transfers across time or platform changes.
Open questions / follow-ons
- How much of the learned OT geometry transfers across time when bot behavior drifts, especially if the detector is retrained on newer social graphs?
- Can the boundary-cloak idea be turned into a defensive diagnostic to identify locally fragile bot detectors before deployment?
- How sensitive are the results to the choice of 1-hop neighborhoods versus deeper ego networks or heterogeneous relation types?
- Would the approach still work under stricter API limits where the attacker cannot access the full training graph or reliable neighborhood statistics?
Why it matters for bot defense
For a bot-defense engineer, the main takeaway is that neighborhood structure alone can be manipulated in a way that preserves local plausibility while flipping GNN-based bot decisions. That means defenses that rely on graph topology should be stress-tested under realistic attacker constraints, not just under unconstrained edge-flip budgets. The OT framing is especially relevant because it gives a concrete way to generate hard cases: instead of random perturbations, the attacker learns which local structural and temporal patterns make a bot look human-like to the detector.
Operationally, this suggests two responses. First, evaluation should include constraint-aware attack suites that forbid impossible actions such as arbitrary human follow-backs. Second, detectors may need adversarial training or robustness checks that account for neighborhood-distribution shifts, not just node features or raw adjacency. The paper’s results also imply that a single “human-looking” neighboring account can disproportionately affect detection, so human-review workflows and downstream moderation systems should not treat graph-based scores as stable in isolation.
Cite
@article{arxiv2602_00318,
  title={Optimal Transport-Guided Adversarial Attacks on Graph Neural Network-Based Bot Detection},
  author={Kunal Mukherjee and Zulfikar Alom and Tran Gia Bao Ngo and Cuneyt Gurcan Akcora and Murat Kantarcioglu},
  journal={arXiv preprint arXiv:2602.00318},
  year={2026},
  url={https://arxiv.org/abs/2602.00318}
}