Non-Intrusive Graph-Based Bot Detection for E-Commerce Using Inductive Graph Neural Networks
Source: arXiv:2601.22579 · Published 2026-01-30 · By Sichen Zhao, Zhiming Xue, Yalun Qi, Xianling Zeng, Zihan Yu
TL;DR
This paper addresses the challenge of detecting sophisticated malicious bots on e-commerce platforms, which evade traditional defenses like IP blacklists and CAPTCHAs by using proxies, botnets, or AI-assisted evasion. The authors propose a non-intrusive bot detection approach that models user sessions and their accessed URLs as a bipartite graph, leveraging an inductive graph neural network (GraphSAGE) to classify sessions as bot or human. By combining graph topology with lightweight behavioral and URL semantic features, the method captures subtle patterns of automated activity that feature-based models often miss.
The approach was evaluated on real-world e-commerce traffic containing about 80,000 sessions with high-confidence bot labels derived from honeypots and controlled injections. Results show the refined GraphSAGE model significantly outperforms a strong session-feature MLP baseline, achieving AUC of 0.9705 vs. 0.9102 and substantially improved recall at low false positive rates. The method remains robust under adversarial perturbations of session–URL edges and cold-start scenarios involving previously unseen sessions and URLs, demonstrating key advantages for practical deployment without any client-side instrumentation or friction-inducing challenges. The model supports real-time inference with efficient neighbor sampling and incremental updates, making it suitable as a backend plug-in for e-commerce security.
Key findings
- GraphSAGE on the refined session–URL bipartite graph achieves mean AUC 0.9705 ± 0.0085 versus 0.9102 ± 0.0150 for a session-feature MLP baseline on real e-commerce traffic (∼80K sessions, 5% bots).
- Recall at ~1% false positive rate improves from 0.7510 (MLP) to 0.9002 (GraphSAGE refined graph), with F1 rising from 0.7508 to 0.8501.
- Graph refinement by filtering static resource hubs (e.g., CSS) reduces performance variance and stabilizes detection results (Fig 5).
- Adversarial graph perturbations (modifying 1–3 session–URL edges per bot session) degrade AUC mildly from 0.971 to 0.958, while the MLP degrades faster, showing GraphSAGE robustness to moderate evasive edge changes (Fig 6).
- Cold-start inductive inference on unseen sessions/URLs (Week 2) causes only a 0.8% AUC drop (0.9705 → 0.9630), compared to 6.6% drop for MLP baseline (0.9100 → 0.8500), indicating strong generalization.
- On a hard subset with sessions visiting only unseen URLs, GraphSAGE AUC drops 7.7% to 0.8890, while MLP drops 15.2% to 0.7210, demonstrating gains from relational embedding even with no URL neighborhood overlap.
- GraphSAGE improves the most on short-to-medium-length sessions (3–50 URL visits), with AUC gains of 0.108–0.140 versus MLP; performance is limited on very short (<3 URL visits) sessions due to sparse context.
- Ablation shows combined topology and semantic features yield best results; a structure-only GNN still beats a features-only MLP (0.88 vs 0.85 AUC), underscoring the importance of graph structure.
Threat model
The adversary consists of automated malicious bots that mimic human browsing patterns on e-commerce platforms to scrape data, hoard inventory, or perpetrate fraud. They may use proxies, botnets, and modify browsing behaviors including accessed URLs to evade detection. However, adversaries cannot fully disguise graph relational patterns without significantly affecting their attack objectives. The defender only observes backend logs (sessions and URLs) with no client-side instrumentation and cannot rely on IP or user-agent for reliable detection.
Methodology — deep read
Threat Model & Assumptions: The adversary is a malicious bot operator using automated scripts, proxies, or botnets to scrape data, hoard inventory, or commit fraud that mimics normal user browsing behavior. The adversary attempts to evade detection by appearing feature-normal or perturbing graph connectivity. The defender has access only to backend server logs capturing session requests and URLs, but not client-side instrumentation or IP truth. The adversary cannot fully obfuscate graph relational signals without heavy loss of effectiveness.
Data: The data consists of anonymized HTTP server logs from a mid-sized e-commerce platform over two weeks, filtered to remove sessions with fewer than 2 requests and extreme outliers. The final dataset contains around 80,000 sessions with approximately 5% labeled bots. Bot labels come from a hybrid strategy combining real-world traps (honeypot and trap URLs) and injected, controlled bot scripts, providing high-confidence training and evaluation samples. 10% of the data is held out for validation and testing using time-based splits that mimic deployment.
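The time-based hold-out described above can be sketched as a simple chronological split. This is a minimal illustration, not the paper's code; the `start_ts` field name and the exact 90/10 fraction are assumptions inferred from the 10% hold-out.

```python
def time_split(sessions, train_frac=0.9):
    """Chronological split: train on the earliest sessions and hold out the
    most recent ones, mimicking deployment. Field names are assumptions."""
    ordered = sorted(sessions, key=lambda s: s["start_ts"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

# Toy sessions with descending timestamps, to show the sort matters.
sessions = [{"id": i, "start_ts": 100 - i} for i in range(10)]
train, held_out = time_split(sessions)
print(len(train), len(held_out))  # 9 1
```

A time-ordered split (rather than a random one) matters here because it forces the cold-start evaluation: Week 2 sessions and URLs never appear in training.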
Architecture / Algorithm: Sessions and URLs form nodes in a bipartite graph: edges exist if a session accessed a URL. Session node features include session duration, request counts, visit depth, and coarse user-agent info. URL nodes have privacy-preserving semantic features like page category and relative popularity but no raw URL or query data. The graph is refined by filtering noisy static-resource hubs before applying GraphSAGE, an inductive graph neural network. GraphSAGE uses two layers of neighbor aggregation with mean pooling and ReLU non-linearity, producing 128-dimensional embeddings. Sessions are classified by a 2-layer MLP applied to their embeddings to estimate bot probability.
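The two layers of mean-pooling neighbor aggregation described above can be sketched in plain NumPy. This is an illustrative toy, not the paper's implementation: the weight initialization, input dimension, and node features are made up, and only the 128-dimensional output size and the mean-aggregate-then-ReLU structure follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_layer(h, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer: mean-pool each node's neighbor embeddings,
    combine with the node's own embedding, then apply ReLU."""
    out = []
    for i in range(len(h)):
        if neighbors[i]:
            agg = np.mean(h[neighbors[i]], axis=0)  # mean over neighbor rows
        else:
            agg = np.zeros(h.shape[1])              # isolated node: no context
        z = h[i] @ W_self + agg @ W_neigh           # self + neighborhood signal
        out.append(np.maximum(z, 0.0))              # ReLU non-linearity
    return np.stack(out)

# Toy bipartite graph: nodes 0-1 are sessions, nodes 2-4 are URLs.
feats = rng.normal(size=(5, 8))                     # 8-dim input features (toy)
adj = {0: [2, 3], 1: [3, 4], 2: [0], 3: [0, 1], 4: [1]}
neighbors = [adj[i] for i in range(5)]

# Two stacked layers ending in 128-dim embeddings, as in the paper.
W1s, W1n = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
W2s, W2n = rng.normal(size=(16, 128)), rng.normal(size=(16, 128))
h1 = sage_layer(feats, neighbors, W1s, W1n)
h2 = sage_layer(h1, neighbors, W2s, W2n)
print(h2.shape)  # (5, 128); session rows 0-1 feed the downstream MLP classifier
```

Two layers mean each session embedding mixes in information from its URLs and, through them, from other sessions that visited the same URLs, which is exactly the relational signal a per-session feature model cannot see.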
Training Regime: Supervised training minimizes weighted binary cross-entropy using mini-batches of sessions and their k-hop neighborhoods from the graph. Adam optimizer with learning rate 0.001 and early stopping on validation AUC is used. Data imbalance is addressed by class weighting and resampling. Dropout and L2 regularization prevent overfitting. 5 different random seeds are run for stability. MLP baseline uses same session features without graph info.
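The class-weighted binary cross-entropy used against the ~5% bot prevalence can be written out directly. The specific weight value below is a common heuristic (negative/positive ratio), not a number reported by the paper.

```python
import math

def weighted_bce(y_true, p_pred, w_pos, w_neg=1.0, eps=1e-7):
    """Class-weighted binary cross-entropy: up-weight the rare bot class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)          # clip for numerical stability
        w = w_pos if y == 1 else w_neg         # heavier penalty on missed bots
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# With ~5% positives, a common heuristic is w_pos ≈ neg/pos ≈ 19.
labels = [1, 0, 0, 0]
probs  = [0.9, 0.1, 0.2, 0.05]
loss = weighted_bce(labels, probs, w_pos=19.0)
print(round(loss, 4))
```

The effect is that a confidently missed bot costs roughly as much as many misclassified humans, counteracting the imbalance alongside the resampling the authors mention.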
Evaluation Protocol: Metrics include AUC, precision, recall at ~1% false positive rate, and F1 score. Testing simulates operational deployment by temporally splitting train and test data. Adversarial perturbation experiments simulate edge additions/removals to test evasion robustness. Cold-start evaluation scores sessions and URLs unseen at training time without retraining, testing inductive generalization. Additional ablations compare graph topology only, semantics only, and full model.
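Recall at a fixed low false-positive rate, the operationally important metric here, can be computed without any library. This is a minimal sketch of the standard construction; the paper does not specify its exact thresholding convention.

```python
def recall_at_fpr(scores, labels, target_fpr=0.01):
    """Choose the highest threshold whose FPR stays at or below target_fpr,
    then report recall (TPR) among the positive class at that threshold."""
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    k = int(target_fpr * len(neg))             # negatives allowed above threshold
    thresh = neg[k] if k < len(neg) else float("-inf")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > thresh)
    pos = sum(labels)
    return tp / pos if pos else 0.0

# Toy scores: bots (label 1) mostly score high; one bot hides at 0.3.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.0]
print(recall_at_fpr(scores, labels))  # 0.75: 3 of 4 bots caught at ~1% FPR
```

Operating at ~1% FPR rather than at the AUC-optimal point reflects deployment reality: blocking or challenging even 1% of legitimate shoppers is costly, so recall at that constraint is the number that matters.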
Reproducibility: The paper does not indicate release of code or data. The dataset is proprietary anonymized e-commerce logs with semi-synthetic labels and is not publicly available. Hyperparameter details, model architectures, and training setups are sufficiently described for conceptual reproduction but lack full open-source reproducibility.
Concrete example: For a new session S visiting URLs U1, U2, and U3, the system maps S and URLs to nodes, constructs edges for visits, extracts session and URL features excluding raw strings, then uses GraphSAGE to aggregate neighbor embeddings from the local 2-hop subgraph. The resulting embedding of S is fed into the MLP classifier to produce a bot probability score usable in real-time scoring pipelines.
Technical innovations
- Formulating e-commerce bot detection as a bipartite session-URL graph enables relational context beyond standard per-session features.
- Use of an inductive GraphSAGE model supports real-time scoring of previously unseen sessions and URLs without retraining.
- Graph refinement filtering static resource hubs improves stability and accuracy by reducing noisy, highly connected nodes.
- Combining lightweight, privacy-preserving URL semantic features with session behavior in the graph improves detection of feature-normal bots.
- Robustness evaluation simulating adversarial perturbations on graph edges demonstrates resilience to moderate evasive edge modifications.
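The static-resource-hub refinement above can be approximated with a degree-based edge filter. This is an assumption-laden sketch: the paper filters hubs such as CSS/JS resources, but the degree-cap heuristic and its value here are illustrative, not the authors' exact rule.

```python
from collections import Counter

def filter_hubs(edges, url_degree_cap=500):
    """Drop session-URL edges whose URL node exceeds a degree cap.
    High-degree static resources (CSS, JS) connect nearly all sessions
    and inject noise into neighbor aggregation."""
    deg = Counter(u for _, u in edges)
    return [(s, u) for s, u in edges if deg[u] <= url_degree_cap]

edges = [("s1", "/product/42"), ("s2", "/product/42"),
         ("s1", "/static/site.css"), ("s2", "/static/site.css"),
         ("s3", "/static/site.css")]
print(filter_hubs(edges, url_degree_cap=2))
# [('s1', '/product/42'), ('s2', '/product/42')]
```

Removing such hubs is what reduced the AUC variance from ±0.1042 (raw graph) to ±0.0085 (refined graph) in the reported results: without filtering, every session is two hops from every other session via the shared stylesheet.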
Datasets
- Anonymized mid-sized e-commerce server logs — ~80,000 sessions (∼5% bots) over 2 weeks — proprietary, not public
Baselines vs proposed
- Session-feature MLP baseline: AUC = 0.9102 ± 0.0150 vs GraphSAGE (refined graph): AUC = 0.9705 ± 0.0085
- Session-feature MLP baseline: Recall @1% FPR = 0.7510 vs GraphSAGE (refined): Recall = 0.9002
- GraphSAGE raw graph: AUC = 0.8756 ± 0.1042 underperforms refined graph variant
- Adversarial perturbation at 2 edges/session: GraphSAGE AUC drops from 0.971 to 0.958, MLP baseline drops more sharply (value not specified exactly)
- Cold-start Week 2 test set: MLP AUC = 0.8500 vs GraphSAGE inductive = 0.9630
- Extreme unseen URL subset: MLP AUC = 0.7210 vs GraphSAGE = 0.8890
Limitations
- Dataset is proprietary, anonymized, and partly semi-synthetic, limiting reproducibility and external validation.
- No evaluation against highly coordinated or sophisticated adversaries who mimic benign graph structure at scale; heavy evasive strategies remain challenging.
- Temporal dynamics are modeled implicitly via static graphs; explicit temporal GNNs or trajectory models are not evaluated.
- Cold-start inductive performance degrades when sessions access mostly unseen URLs, reflecting sparse neighborhood contexts.
- Adversarial perturbations simulate only limited edge additions/removals; more complex attack strategies (e.g., node injections or feature manipulation) are not studied.
- Deployment considerations are discussed, but live A/B testing or operational impact metrics are absent.
Open questions / follow-ons
- How would the model perform against highly coordinated botnets that distribute their activity to maintain benign session–URL connectivity patterns?
- Can temporal graph neural networks or trajectory-based models improve detection of bots that exploit temporal sequences of page visits?
- How effective are explicit defenses or robust training methods against adaptive adversarial manipulations beyond simple edge perturbations?
- What extensions or additional node types (e.g., account, IP) in the graph improve detection granularity and attribution capabilities?
Why it matters for bot defense
This paper offers a scalable, backend-only bot detection framework that avoids intrusive challenges like CAPTCHAs, which degrade user experience and are increasingly defeated by modern bots. Bot-defense engineers can leverage the session–URL interaction graph and inductive GNN approach to capture subtle relational signals of automation missed by feature-only models, improving detection accuracy particularly for feature-normal bots. The inductive nature supports cold-start detection of new sessions and URLs, enabling continuous deployment without client-side changes or user friction.
However, the approach still requires robust labeled data and careful graph construction that filters noisy hubs to maximize performance. The framework complements existing detection and mitigation pipelines by providing a real-time risk score per session with explainable local graph context, which can guide more targeted bot mitigation strategies beyond blunt CAPTCHAs. Future enhancements could integrate more heterogeneous node types and adversarial robustness for evolving bot sophistication.
Cite
@article{arxiv2601_22579,
  title={Non-Intrusive Graph-Based Bot Detection for E-Commerce Using Inductive Graph Neural Networks},
  author={Sichen Zhao and Zhiming Xue and Yalun Qi and Xianling Zeng and Zihan Yu},
  journal={arXiv preprint arXiv:2601.22579},
  year={2026},
  url={https://arxiv.org/abs/2601.22579}
}