SAHG — Sector-Anisotropic Hyperbolic Graph Model for Social Bot Detection

Source: arXiv:2605.30166 · Published 2026-05-28 · By Hanning Lu, Yingguang Yang, Jinwei Su, Yang Liu, Zhaoqian Yao, Yaoming Li et al.

TL;DR

This paper addresses social bot detection on modern social networks, where bots driven by large language models (LLMs) produce realistic text and undermine content-based detection. Relational patterns in the social graph, such as coordinated behavior, interactions, and community structure, remain informative. Traditional graph neural networks (GNNs) operating in Euclidean space distort the hierarchical, scale-free topology of social networks. Fixed-curvature hyperbolic models represent that topology better but cannot adapt geometric resolution by direction, which limits discrimination among heterogeneous structures. In addition, sophisticated bots create heterophilic links to genuine users, so neighborhood aggregation in GNNs mixes bot and human signals and dilutes node-level evidence. To address these problems, the authors propose SAHG (Sector-Anisotropic Hyperbolic Graph), which learns a direction-dependent curvature field that adapts geometric resolution across structural directions in the latent space. SAHG also introduces sector prototypes that convert angular latent representations into classifier-readable concentration and alignment features. Finally, it uses two independent encoding channels, one for per-account features and one for the graph neighborhood representation, and fuses them only at classification time so that noisy neighborhood aggregation does not contaminate account-level evidence.

Empirically, SAHG outperforms 13 baselines, including feature-based, LLM-based, Euclidean graph, and isotropic hyperbolic models, on three datasets (Fox8-23, BotSim-24, and MGTAB), achieving the best accuracy and F1 on each. Ablations confirm the contribution of anisotropic curvature, sector prototypes, and the dual-channel architecture. Geometric analyses show that the direction-dependent curvature concentrates resolution in bot-dominant latent directions, improving class separation. The approach is also robust to label scarcity and hyperparameter variation. SAHG thus combines adaptive hyperbolic geometry with dual-channel evidence fusion for social bot detection in the LLM era.

Key findings

SAHG achieves highest accuracy/F1 on Fox8-23 (ACC=99.32%, F1=99.32%), BotSim-24 (ACC=99.47%, F1=99.41%), and MGTAB (ACC=91.51%, F1=89.09%) vs. all 13 baselines.
Removing the hyperbolic encoder drops MGTAB accuracy from 91.51% to 90.86% and F1 from 89.09% to 88.38%, which indicates that the geometric encoding contributes to performance.
The dual-channel design improves MGTAB ACC by ~0.5 percentage points and F1 by ~0.6 percentage points over the ablation without the graph channel.
Direction-dependent curvature γ(u) clusters bots in compact, high-curvature latent sectors (>1.1), while humans are broadly dispersed with lower curvature.
Sector prototype angular concentration/entropy features provide sharp bot-human separation, with bot entropy near zero and human entropy up to 0.7.
Performance stable across number of sector prototypes K={1,2,4,8}, with best results near K=2 or 4.
Label efficiency: SAHG reaches near-peak F1 with only 20% labels on Fox8-23 and BotSim-24.
SAHG outperforms the isotropic fixed-curvature hyperbolic baseline HNN-Poincaré on MGTAB by about 0.5 percentage points in ACC and 0.6 in F1.
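
As a reminder of how figures like the ACC and F1 values above are typically obtained, the following is a minimal sketch, assuming F1 is the binary F1 on the bot class (label 1) and that per-seed results are combined by a plain arithmetic mean. The function names `binary_metrics` and `average_over_seeds` are hypothetical and are not taken from the paper's code.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, recall and F1 for the positive (bot = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"acc": acc, "recall": recall, "f1": f1}


def average_over_seeds(per_seed_metrics):
    """Arithmetic mean of each metric across runs (e.g. seeds 0, 1, 2)."""
    keys = per_seed_metrics[0].keys()
    n = len(per_seed_metrics)
    return {k: sum(m[k] for m in per_seed_metrics) / n for k in keys}
```

On a balanced dataset such as Fox8-23, accuracy and F1 can coincide, whereas on an imbalanced one such as MGTAB the bot-class F1 usually sits below accuracy, which matches the pattern in the reported numbers.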

Threat model

The adversary consists of operators controlling social bot accounts capable of generating fluent, human-like text and strategically forming heterophilic links with genuine users to obscure graph-based detection. They do not have direct influence over the detection model or training data but attempt to evade detection by blending structure and interaction patterns. The defender has access to feature and graph data with ground-truth labels for training, aiming to robustly separate bots from humans despite adaptive bot camouflage.
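
The effect of heterophilic links can be illustrated with a toy calculation. The sketch below is not the paper's encoder: it uses a single scalar "bot evidence" feature and plain mean aggregation (both assumptions) to show why mixing an account with its neighbors dilutes the signal, and why keeping the account and neighborhood summaries separate until classification, as in the dual-channel design described in the TL;DR, preserves it. The names `single_channel` and `dual_channel` are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def single_channel(x_self, neighbors):
    """Mix the account with its neighbors before classification."""
    return mean([x_self] + neighbors)


def dual_channel(x_self, neighbors):
    """Keep account evidence and the neighborhood summary side by side."""
    return x_self + mean(neighbors)  # list concatenation, not addition


bot = [1.0]            # toy per-account bot evidence
humans = [[0.0]] * 4   # four heterophilic links to genuine users

mixed = single_channel(bot, humans)  # evidence shrinks to [0.2]
fused = dual_channel(bot, humans)    # [1.0, 0.0]: account evidence intact
```

With four human neighbors, the mixed representation carries one fifth of the original evidence, while the concatenated one still exposes the full per-account value to the classifier.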

Methodology — deep read

Threat Model & Assumptions: The adversary controls or influences social bot accounts that produce fluent, human-like content via LLMs, reducing efficacy of lexical/textual classifiers. Bots may also camouflage by forming heterophilic connections with genuine users to obfuscate graph structure. The defender knows ground-truth bot labels for training data and has access to account features and social graph edges or k-NN graphs constructed from features. The adversary cannot fully hide coordination signatures or prevent graph construction.
Data: The authors evaluate on three datasets—Fox8-23 (2,280 accounts balanced bots/humans, 31-dim features, no real graph edges, k-NN graph used), BotSim-24 (2,907 accounts, 1,000 bots, 17-dim features, no real edges, k-NN), and MGTAB (10,199 accounts, 2,748 bots, 788-dim features with a real heterogeneous social graph of 7 edge types). All datasets have binary bot/human labels.
Architecture/Algorithm: SAHG maps input node features into a latent hyperbolic space represented in polar coordinates (radial magnitude r, angular direction u). Unlike prior hyperbolic GNNs with a fixed curvature, SAHG learns a direction-dependent curvature field γ(u) with a small MLP (LOCALWARPNET) that adapts geometric resolution across angular directions of the latent space. This improves separation of bot and human clusters whose densities and structures differ. To convert continuous latent directions into classifier-readable features, SAHG trains a set of learnable sector prototypes {p_k} and soft-assigns nodes to sectors, with the assignment modulated by γ(u); the resulting features describe angular concentration and alignment (entropy, max alignment). To limit the damage from heterophilic bot-human edges that corrupt neighborhood aggregation, SAHG uses two independent encoding channels with separate parameters: (1) a node channel that encodes raw per-account features with the SAH encoder; (2) a graph channel that builds a two-hop aggregated neighborhood representation with GraphSAGE message passing in Euclidean space and then passes it through its own SAH encoder. The two outputs are concatenated and fed to an MLP classifier.
Training regime: The full model is trained with a focal loss to handle label imbalance, with class weighting and focusing parameter γ_focal. An additional warm-up entropy regularizer encourages meaningful sector formation on bot nodes early in training, decaying over a set warm-up period. Training details involve standard batch sizes, layer normalization, GELU activation, and learned linear transformations. Evaluations report averages over seeds {0,1,2}.
Evaluation protocol: Performance is measured with Accuracy, F1 score, and Recall. The 13 baselines span feature-based methods, graph neural networks, LLM-based encoders, and hyperbolic embeddings. Comparisons control for graph construction (cosine k-NN where applicable). Ablations isolate the effects of hyperbolic encoding, sector prototypes, the graph channel, and direction-dependent curvature. Sensitivity tests vary the number of sectors K and the fraction of labeled data to assess robustness.
Reproducibility: Source code is provided at https://github.com/lhnjames/SAHG. The datasets Fox8-23 and BotSim-24 are publicly referenced; MGTAB details are cited but may not be fully open. Model weights and hyperparameters appear fully described in appendices.
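
The polar decomposition and the direction-dependent curvature described under Architecture/Algorithm can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' LOCALWARPNET: the class name `LocalWarpNetSketch`, the single tanh hidden layer, the random initialization, and the softplus output (used here only to keep the curvature scalar positive) are all hypothetical choices.

```python
import math
import random


def polar_decompose(z, eps=1e-12):
    """Split a latent vector into radius r = ||z|| and unit direction u = z / r."""
    r = math.sqrt(sum(v * v for v in z))
    u = [v / max(r, eps) for v in z]
    return r, u


class LocalWarpNetSketch:
    """Hypothetical one-hidden-layer MLP mapping a direction u to gamma(u) > 0."""

    def __init__(self, dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, u):
        # Hidden layer: tanh(W1 u + b1)
        h = [
            math.tanh(sum(w * x for w, x in zip(row, u)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        raw = sum(w * x for w, x in zip(self.w2, h)) + self.b2
        # Softplus keeps the curvature scalar strictly positive (assumption).
        return math.log1p(math.exp(raw))


r, u = polar_decompose([3.0, 4.0])   # r = 5.0, u = [0.6, 0.8]
gamma_u = LocalWarpNetSketch(dim=2)(u)
```

Because the network sees only the unit direction u, two latent vectors that point the same way receive the same curvature regardless of their radius, which is what makes the field direction-dependent rather than position-dependent.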

Example pipeline: For a given account node, raw features x_i are encoded by an MLP into a latent vector z_i, which is decomposed into a radius r_i and a direction u_i. The direction u_i is passed through LOCALWARPNET, which predicts γ(u_i), a direction-dependent curvature scalar that modulates angular resolution. The sector prototypes then yield soft memberships q_ik, entropy H_i, and alignment A_i, forming a 5D SAH feature vector. In parallel, the graph channel aggregates two-hop neighbors via GraphSAGE and encodes the result in the same way. The concatenated node and graph SAH features are input to the classifier, which outputs the bot probability.
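
The sector-prototype step of this pipeline can be sketched as follows. The exact formulas are not given in this summary, so the sketch assumes that memberships q_ik are a softmax over cosine similarities between u_i and unit-norm prototypes p_k, with γ(u_i) acting as an inverse temperature, that H_i is the Shannon entropy of q_i, and that A_i is the largest similarity. The composition of the 5D feature vector is not stated here, so the hypothetical function `sector_features` returns only the named quantities.

```python
import math


def sector_features(u, prototypes, gamma):
    """Soft sector memberships, their entropy, and the max alignment.

    u          : unit direction of one node
    prototypes : list of unit-norm sector prototypes p_k
    gamma      : curvature scalar gamma(u), used as a softmax sharpness (assumption)
    """
    sims = [sum(a * b for a, b in zip(u, p)) for p in prototypes]
    logits = [gamma * s for s in sims]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # shift by max for numerical stability
    total = sum(exps)
    q = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in q if p > 0.0)
    alignment = max(sims)
    return q, entropy, alignment


prototypes = [[1.0, 0.0], [0.0, 1.0]]  # K = 2 toy sectors
q_low, h_low, a_low = sector_features([1.0, 0.0], prototypes, gamma=1.0)
q_high, h_high, a_high = sector_features([1.0, 0.0], prototypes, gamma=10.0)
```

Under these assumptions a larger γ sharpens the assignment and pushes the entropy toward zero, while a direction halfway between two prototypes gives the maximum entropy ln 2 for K = 2.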

Technical innovations

Introduction of a direction-dependent curvature field γ(u) in hyperbolic space enabling anisotropic geometric resolution tailored to heterogeneous social graph structures.
Design of sector prototypes that convert continuous angular latent directions into classifier-readable, concentration and alignment features modulated by local curvature.
Dual independent SAH encoding channels separating per-account feature encoding from neighborhood aggregation to mitigate contamination from bot-human heterophilic edges.
Integration of focal loss with an entropy regularizer targeting bot nodes to encourage early meaningful sector formation in latent space.
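
The training objective in the last item can be sketched for a single node. The focal loss below follows its standard binary form, -α_t (1 - p_t)^γ_focal log p_t. Everything about the regularizer is an assumption made for illustration: that it penalizes sector entropy on bot nodes only, that its weight decays linearly to zero over the warm-up period, and the default values of `alpha`, `gamma_focal`, `lam0`, and `warmup_epochs`.

```python
import math


def focal_loss(p_bot, label, alpha=0.75, gamma_focal=2.0, eps=1e-12):
    """Binary focal loss for one node; label 1 = bot, 0 = human."""
    p_t = p_bot if label == 1 else 1.0 - p_bot
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma_focal * math.log(max(p_t, eps))


def warmup_weight(epoch, warmup_epochs, lam0=0.1):
    """Regularizer weight, decayed linearly to zero (assumed schedule)."""
    if epoch >= warmup_epochs:
        return 0.0
    return lam0 * (1.0 - epoch / warmup_epochs)


def total_loss(p_bot, label, sector_entropy, epoch, warmup_epochs=10):
    """Focal loss plus a warm-up entropy penalty applied to bot nodes only."""
    reg = 0.0
    if label == 1:
        reg = warmup_weight(epoch, warmup_epochs) * sector_entropy
    return focal_loss(p_bot, label) + reg
```

The (1 - p_t)^γ_focal factor shrinks the loss on confidently correct nodes, so the gradient concentrates on hard or minority-class examples, and once the warm-up period ends the objective reduces to the focal loss alone.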

Datasets

Fox8-23 — 2,280 accounts (1,140 bots, 1,140 humans), 31-dim features — public
BotSim-24 — 2,907 accounts (1,000 bots, 1,907 humans), 17-dim features — public
MGTAB — 10,199 accounts (2,748 bots, 7,451 humans), 788-dim features — real heterogeneous social graph with 7 relation types, source cited
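
Fox8-23 and BotSim-24 ship without real edges, so the methodology builds a cosine k-NN graph from the account features. The sketch below shows one straightforward way to do that. The function name `cosine_knn_edges`, the directed edge list output, the tie-breaking by index, and the quadratic all-pairs loop are illustrative assumptions. The value of k used in the paper is not given in this summary.

```python
import math


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def cosine_knn_edges(features, k):
    """Directed edges (i, j) from each node i to its k most similar nodes j."""
    edges = []
    for i, xi in enumerate(features):
        scored = [(cosine(xi, xj), j) for j, xj in enumerate(features) if j != i]
        # Highest similarity first; ties broken by lower node index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        edges.extend((i, j) for _, j in scored[:k])
    return edges


toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = cosine_knn_edges(toy, k=1)  # pairs up nodes 0 and 1, and nodes 2 and 3
```

The all-pairs loop costs O(n^2) similarity evaluations, which is fine for illustration but would normally be replaced by a vectorized or approximate nearest-neighbor search at scale.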

Baselines vs proposed

Mou et al.: Fox8-23 ACC=62.77% vs SAHG ACC=99.32%
Arin et al.: Fox8-23 ACC=96.59% vs SAHG ACC=99.32%
BotRGCN: MGTAB ACC=90.60% vs SAHG ACC=91.51%
RGT: BotSim-24 ACC=99.24% vs SAHG ACC=99.47%
HGCN (learnable curvature): MGTAB ACC=80.44% vs SAHG ACC=91.51%
HNN-Poincaré (fixed isotropic curvature): MGTAB ACC=90.99% vs SAHG ACC=91.51%
RoBERTa: Fox8-23 ACC=50.00% vs SAHG ACC=99.32%
Ablation (w/o Hyperbolic): MGTAB ACC=90.86% vs full SAHG ACC=91.51%
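
The margins implied by the comparisons above can be computed directly. The short script below only restates the accuracies listed in this section and subtracts them; the variable and function names are illustrative.

```python
SAHG_ACC = {"Fox8-23": 99.32, "BotSim-24": 99.47, "MGTAB": 91.51}

# (method, dataset, accuracy in %) exactly as listed above
BASELINE_ACC = [
    ("Mou et al.", "Fox8-23", 62.77),
    ("Arin et al.", "Fox8-23", 96.59),
    ("BotRGCN", "MGTAB", 90.60),
    ("RGT", "BotSim-24", 99.24),
    ("HGCN", "MGTAB", 80.44),
    ("HNN-Poincaré", "MGTAB", 90.99),
    ("RoBERTa", "Fox8-23", 50.00),
    ("SAHG w/o Hyperbolic", "MGTAB", 90.86),
]


def accuracy_gaps(baselines, proposed):
    """Gap in percentage points between SAHG and each listed method."""
    return {
        (name, dataset): round(proposed[dataset] - acc, 2)
        for name, dataset, acc in baselines
    }


gaps = accuracy_gaps(BASELINE_ACC, SAHG_ACC)
for (name, dataset), gap in gaps.items():
    print(f"{name:<20} {dataset:<10} +{gap:.2f} pp")
```

The gaps range from 0.23 points (RGT on BotSim-24) to 49.32 points (RoBERTa on Fox8-23), so the advantage over the strongest graph baselines is under one percentage point, while the advantage over text-only models is much larger.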

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.30166.

Fig 3: t-SNE visualization on Fox8-23 for (a) CACL, (b) HNN-Poincaré, and (c) SAHG. SAHG

Fig 4: Direction-dependent curvature γ(u) learned by SAHG on the Poincaré disk: (a) MGTAB

Fig 5: SAH geometric quantity distributions on Fox8-23 (top: node channel; bottom: graph

Fig 6: (a) K sensitivity on BotSim-24, (b) K sensitivity on Fox8-23, (c) Label efficiency on

Limitations

No explicit adversarial evaluation against adaptive bots actively trying to manipulate graph structure.
Limited dynamic or temporal modeling of social graphs; all experiments on static snapshots.
While three datasets are used, only one (MGTAB) contains real social graph edges; others use k-NN feature graphs.
No cross-dataset or distribution shift generalization tests reported.
Choosing the number of sector prototypes K, and interpreting the learned sectors, may require per-dataset tuning.
Graph aggregation uses Euclidean space for neighborhood representation before hyperbolic encoding, which may introduce distortions.

Open questions / follow-ons

How does SAHG perform against adaptive adversaries explicitly optimizing to mimic human graph neighborhoods?
Can the direction-dependent curvature and sector prototype approach be extended to dynamic or temporal graphs capturing evolving bot campaigns?
What is the optimal granularity and interpretability of sector prototypes in varying social network settings?
How might joint end-to-end hyperbolic aggregation (instead of Euclidean GraphSAGE) affect robustness and performance?

Why it matters for bot defense

For bot-defense engineers, SAHG offers a principled approach to improve detection robustness when textual signals alone falter in the face of fluent LLM-generated content. Its novel incorporation of adaptive hyperbolic geometry enables better differentiation of hierarchical and scale-free social structures common in large-scale social graphs. The dual-channel design specifically addresses risk from intentional bot-human interaction camouflage, preserving strong per-account signals while leveraging neighborhood context. Such geometric and dual-path models can complement traditional bot detection systems or CAPTCHAs by providing higher-level relational features that are difficult for sophisticated bots to mimic.

However, practical deployment requires considering the nature of available graph data and computational overhead from hyperbolic encoding and prototype learning. Since social graphs frequently evolve, adapting SAHG to dynamic scenarios is a compelling next step. Also, calibrating sector prototypes for varying graph scales and bot densities will be essential. Overall, SAHG advances bot detection beyond standard Euclidean GNNs towards a richer geometric understanding, informing CAPTCHA and bot-mitigation system designs that integrate network structure as a complementary signal.

Cite

```bibtex
@article{arxiv2605_30166,
  title={SAHG: Sector-Anisotropic Hyperbolic Graph Model for Social Bot Detection},
  author={Hanning Lu and Yingguang Yang and Jinwei Su and Yang Liu and Zhaoqian Yao and Yaoming Li and Taoran Liang and Ziyi Zhang and Ran Ran and Kefu Xu and Bin Chong},
  journal={arXiv preprint arXiv:2605.30166},
  year={2026},
  url={https://arxiv.org/abs/2605.30166}
}
```
