TRACE-Bot: Detecting Emerging LLM-Driven Social Bots via Implicit Semantic Representations and AIGC-Enhanced Behavioral Patterns

Source: arXiv:2604.02147 · Published 2026-04-02 · By Zhongbo Wang, Zhiyu Lin, Zhu Wang, Haizhou Wang

TL;DR

The paper addresses the emerging threat of social bots driven by large language models (LLMs), which generate highly human-like content that evades traditional bot detection methods. Prior detectors typically rely on single-modality signals (text or behavior alone), lack sensitivity to AI-generated content (AIGC) production patterns, and model linguistic semantics and behavioral traits independently, limiting detection accuracy.

To overcome these issues, the authors propose TRACE-Bot, a dual-channel framework that jointly models implicit semantic representations extracted from user profile text using GPT-2 and AIGC-enhanced behavioral patterns derived from detailed interaction sequences and signals from state-of-the-art AIGC detectors (Fast DetectGPT and GLTR). This fused approach captures both subtle generative linguistic signatures and behavioral irregularities unique to LLM-driven social bots.

Evaluations on two public LLM-driven social bot datasets—Fox8-23 and BotSim-24—demonstrate state-of-the-art performance with accuracies of 98.46% and 97.50%, respectively, substantially outperforming baselines across precision, recall, and F1 metrics. Ablation studies confirm the complementary importance of both text and behavioral channels, and analyses show robustness against sophisticated bot strategies, supporting the effectiveness of multimodal fusion for detecting next-generation AI-driven social bots.

Key findings

  • TRACE-Bot achieves 98.46% accuracy on Fox8-23 and 97.50% accuracy on BotSim-24, surpassing all 11 evaluated baseline models.
  • On BotSim-24, TRACE-Bot improves accuracy by 0.50, precision by 0.49, and F1-score by 0.24 percentage points over the second-best method (CACL).
  • Ablation removing the textual channel drops accuracy from 0.9846 to 0.9561, highlighting the importance of semantic representations.
  • Removing the behavioral channel reduces accuracy even further to 0.9189, showing behavioral irregularities are critical for detection.
  • TRACE-Bot maintains a precision of 0.9825 and recall of 0.9868 on Fox8-23, balancing low false positives with high detection rates, unlike other methods with near-perfect recall but low precision.
  • Compression ratio and length of zlib-compressed interaction sequences serve as effective indicators of automated posting behaviors.
  • Integration of two AIGC detection models (Fast DetectGPT and GLTR) provides robust probabilistic features that improve bot discrimination despite individual vulnerabilities.
  • The dual-channel fusion architecture combining GPT-2 embeddings with MLP-transformed behavioral features yields richer joint representations that outperform unimodal variants by roughly 3–7 percentage points of accuracy in the ablations.

Threat model

The adversary controls automated social media accounts driven by LLMs capable of generating human-like textual content and adapting behavior patterns in real time to mimic authentic users. The adversary knows detection methods may analyze text semantics and behavior but cannot perfectly erase latent AI-generation features or temporal regularities detectable via compression-based and AIGC detectors. They cannot fully modify all multimodal signals simultaneously without sacrificing functionality.

Methodology — deep read

The threat model assumes adversaries operate LLM-driven social bots designed to post human-like content and simulate authentic behavioral dynamics on social media, aiming to evade typical detection algorithms. The attacker can adapt linguistic style, posting times, and interaction patterns, but cannot completely mask the latent statistical and semantic traces left by AI generation.

Data provenance includes two publicly available datasets curated for LLM-driven bot detection: Fox8-23 (approx. 2280 users, balanced bots and genuine accounts, 368k tweets) and BotSim-24 (approx. 2900 users with controlled social interactions and multi-round behaviors). These datasets were randomly split 60/20/20 for training, validation, and testing. For BotSim-24, random undersampling balanced training but retained original test distribution. Personal profile info, tweet posts, and detailed interaction behavioral data were extracted per user.
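The split-and-balance procedure can be sketched as follows; the function and variable names are illustrative rather than taken from the paper's released code, and the random undersampling applies only to the training portion so the test distribution stays intact:

```python
import random

def split_and_balance(users, labels, seed=42):
    """Sketch of a 60/20/20 split with random undersampling of the
    majority class in the training set only. `users` and `labels` are
    parallel lists; all names here are illustrative."""
    rng = random.Random(seed)
    idx = list(range(len(users)))
    rng.shuffle(idx)
    n = len(idx)
    train = idx[: int(0.6 * n)]
    val = idx[int(0.6 * n): int(0.8 * n)]
    test = idx[int(0.8 * n):]

    # Undersample the majority class within the training split only;
    # validation and test keep their original class distribution.
    bots = [i for i in train if labels[i] == 1]
    humans = [i for i in train if labels[i] == 0]
    k = min(len(bots), len(humans))
    train = rng.sample(bots, k) + rng.sample(humans, k)
    rng.shuffle(train)
    return train, val, test
```

The seed is fixed only to make the sketch deterministic; the paper does not report its seeding strategy.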

The architecture consists of four key modules: Data Preparation, Feature Processing, Feature Fusion, and Detection. Personal data features cover six categories (identity info, profile config, engagement metrics, verification, privacy, language/timezone). Interaction behavior processes users' tweets into chronological sequences of interaction types (Original, Retweet, Reply) encoded as symbolic sequences compressed with zlib; compression length and ratio capture behavioral regularity.
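The zlib-based sequence encoding can be sketched in a few lines of Python; the single-letter symbol mapping is an assumption for illustration, not the paper's exact encoding:

```python
import random
import zlib

# Map each tweet to an interaction-type symbol (illustrative codes).
SYMBOLS = {"original": "O", "retweet": "T", "reply": "R"}

def compression_features(interaction_types):
    """Encode a chronological interaction sequence as symbols, compress
    with zlib, and return (compressed length, compression ratio).
    Highly regular, automated timelines compress much better."""
    seq = "".join(SYMBOLS[t] for t in interaction_types).encode("ascii")
    if not seq:
        return 0, 0.0
    compressed = zlib.compress(seq)
    return len(compressed), len(compressed) / len(seq)

# A periodic, bot-like timeline vs a mixed, irregular one:
rng = random.Random(0)
regular = ["retweet"] * 200
irregular = [rng.choice(["original", "retweet", "reply"]) for _ in range(200)]
_, r_reg = compression_features(regular)
_, r_irr = compression_features(irregular)
# r_reg is far smaller than r_irr, flagging the repetitive account
```
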

Tweets are analyzed by two AIGC detectors—Fast DetectGPT and GLTR—and summary statistics (mean, std, max, min, quantile exceedance proportion) are aggregated to provide probabilistic signals indicative of AI generation.
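A minimal sketch of the user-level aggregation, assuming per-tweet detector scores in [0, 1]; the 0.9 exceedance threshold is an illustrative stand-in, since the exact quantile used in the paper is not reproduced here:

```python
import statistics

def aggregate_detector_scores(scores, threshold=0.9):
    """Collapse per-tweet AIGC-detector probabilities into user-level
    features: mean, std, max, min, and the fraction of tweets whose
    score exceeds a high threshold (illustrative value)."""
    if not scores:
        return {"mean": 0.0, "std": 0.0, "max": 0.0, "min": 0.0, "exceed": 0.0}
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "max": max(scores),
        "min": min(scores),
        "exceed": sum(s > threshold for s in scores) / len(scores),
    }
```

In the full pipeline, one such feature group would be computed per detector (Fast DetectGPT and GLTR) and both groups passed to the behavioral channel.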

The fusion module splits inputs into two channels: a textual channel encodes concatenated profile texts with GPT-2, generating implicit semantic embeddings via mean-pooled hidden states, while a behavioral channel feeds standardized behavioral and AIGC-derived numerical features into an MLP with ReLU and dropout to form AIGC-enhanced behavioral embeddings. The final multi-modal user representation concatenates both vectors.
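The two channels can be sketched in NumPy with random weights standing in for trained parameters; the behavioral sizes (12 input features, a 64-unit hidden layer, a 32-dim embedding) are assumptions, while 768 is GPT-2 small's hidden size:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pool(hidden_states):
    """Textual channel: average GPT-2 token hidden states of shape
    (T, 768) into a single implicit semantic embedding."""
    return hidden_states.mean(axis=0)

def behavioral_mlp(features, w1, b1, w2, b2):
    """Behavioral channel: standardized numeric features through an MLP
    with ReLU (dropout omitted here, as at inference time)."""
    h = np.maximum(features @ w1 + b1, 0.0)  # ReLU
    return h @ w2 + b2

# Random stand-ins for trained weights (hypothetical dimensions).
w1, b1 = rng.normal(size=(12, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 32)), np.zeros(32)

text_emb = mean_pool(rng.normal(size=(50, 768)))                 # (768,)
behav_emb = behavioral_mlp(rng.normal(size=12), w1, b1, w2, b2)  # (32,)
user_repr = np.concatenate([text_emb, behav_emb])                # (800,)
```
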

The detection module is a lightweight two-layer MLP classifier using ReLU activations and dropout, trained with weighted cross-entropy loss to handle class imbalance.
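The class-weighted cross-entropy objective can be written out directly; the particular weight values a practitioner would choose (e.g. inverse class frequency) are not fixed by the summary above:

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Class-weighted binary cross-entropy over predicted bot
    probabilities. `class_weights[c]` upweights the rarer class;
    the values passed in are illustrative."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1 - 1e-12)  # numerical safety
        ll = math.log(p) if y == 1 else math.log(1 - p)
        total += -class_weights[y] * ll
    return total / len(labels)
```

With equal weights this reduces to standard cross-entropy; doubling the bot-class weight doubles the penalty for missed bots.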

Training used up to 10 epochs with batch size 256, early stopping on validation F1-score, and was conducted on an Intel Xeon CPU with NVIDIA T4 GPU. Metrics reported include accuracy, precision, recall, and F1. Baselines span traditional ML, deep learning, graph neural networks, and existing LLM detectors.
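The early-stopping loop keyed on validation F1 might look like this; `train_epoch` and `eval_f1` are caller-supplied callbacks, and the patience value is an assumption not stated in the paper:

```python
def train_with_early_stopping(train_epoch, eval_f1, max_epochs=10, patience=3):
    """Generic early-stopping loop on validation F1, mirroring the
    training setup described above (<=10 epochs, stop when F1 stalls).
    The patience value is illustrative."""
    best_f1, best_epoch, stale = -1.0, -1, 0
    for epoch in range(max_epochs):
        train_epoch(epoch)
        f1 = eval_f1()
        if f1 > best_f1:
            best_f1, best_epoch, stale = f1, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation F1 has stopped improving
    return best_epoch, best_f1
```
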

A concrete example: a user’s profile text fields (username, bio, location) are concatenated, tokenized, and encoded by GPT-2 to produce a semantic embedding. Meanwhile, their tweet timeline is classified by interaction type, compressed, and reduced to features (compression ratio and length), which are combined with AIGC detection scores aggregated over all of their tweets. These numeric behavioral features are standardized and fed through the MLP to yield a behavioral embedding. The two embeddings are concatenated and passed through the classifier to output a bot-vs-human probability; final accuracy is measured on the held-out test split.

The authors released code to enable reproducibility, but the datasets rely on public Twitter data that may be subject to retrieval constraints. Whether pretrained weights are kept frozen during training is not explicitly stated, though the model is described in enough detail for replication.

Technical innovations

  • A dual-channel architecture jointly modeling implicit semantic user representations from profile text with AIGC-enhanced behavioral patterns improves detection of adaptive LLM-driven bots.
  • Novel behavior sequence encoding using interaction type chronologies compressed with zlib provides quantitative measures of behavioral regularity distinctive to bots.
  • Integration of two state-of-the-art AIGC detection models (Fast DetectGPT and GLTR) generates probabilistic AI-generation features aggregated at user-level for complementary signal fusion.
  • A lightweight MLP-based behavioral embedding complemented by GPT-2 contextual embeddings enables effective multimodal feature fusion avoiding loss of heterogeneous data nuances.

Datasets

  • Fox8-23 — ~2,280 users, ~368,000 tweets — public Twitter-based LLM-driven social bot dataset
  • BotSim-24 — ~2,900 users, ~259,000 tweets — bot simulation dataset with multi-round social interactions

Baselines vs proposed

  • Pasricha et al. (Digital DNA): Accuracy = 0.8202 (Fox8-23) vs TRACE-Bot 0.9846
  • BotRuler: Accuracy = 0.5424 (Fox8-23), 0.5000 (BotSim-24) vs TRACE-Bot 0.9846 and 0.9750 respectively
  • Arin et al.: Accuracy = 0.9627 (Fox8-23) vs TRACE-Bot 0.9846
  • BotRGCN2: Accuracy = 0.9737 (Fox8-23) vs TRACE-Bot 0.9846
  • CACL: Accuracy = 0.9700 (BotSim-24) vs TRACE-Bot 0.9750
  • LMBot (LLM-based baseline): Accuracy = 0.8969 (Fox8-23), 0.4850 (BotSim-24) vs TRACE-Bot 0.9846 and 0.9750
  • Ablation without textual channel: Accuracy drops to 0.9561 from 0.9846
  • Ablation without behavioral channel: Accuracy drops to 0.9189 from 0.9846

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2604.02147.

Fig 3: Results of representation learning study of our method on BotSim-24.

Fig 4: Results of label efficiency study of our method.

Fig 5: Results of robustness study of our method on Fox8-23 and BotSim-24.

Fig 6: Results of hyperparameter study of our method on Fox8-23 and BotSim-24.

Fig 7: The case study of our method.

Limitations

  • Performance evaluated only on two public LLM-driven social bot datasets, which may not generalize across platforms or newer bot generations.
  • AIGC detectors used are known to be vulnerable to adversarial prompting; reliance on their output as probabilistic rather than definitive signals is a mitigation but detection remains imperfect.
  • The model assumes sufficient multimodal data availability per user; accounts with sparse data might lead to degraded performance.
  • No explicit adversarial robustness evaluation under targeted bot evasion strategies or distribution shift scenarios.
  • Traceability and interpretability beyond binary classification decisions are not explored extensively.
  • Training details like random seed control or hyperparameter sensitivity analysis are limited.

Open questions / follow-ons

  • How well does TRACE-Bot generalize to other platforms beyond Twitter, or emerging LLM architectures producing less detectable artifacts?
  • Can incorporating graph/network-level social relationships further improve detection beyond user-centric modalities?
  • How robust is TRACE-Bot under adversarial attacks explicitly designed to fool AIGC detectors or alter behavioral signatures?
  • Can interpretability methods provide insights into which features most strongly drive detection decisions for transparency?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, TRACE-Bot demonstrates the critical need to fuse multimodal signals — integrating both implicit semantic cues from user-generated text and behavioral irregularities enhanced by AI-generation detection — to reliably identify next-generation LLM-driven social bots. Single-modality detectors risk high false-positive or false-negative rates against increasingly sophisticated bots.

Practitioners should consider adopting dual-channel architectures leveraging pretrained language models combined with behavioral sequence modeling, augmented by emerging AIGC detection technologies. This approach enables finer-grained discrimination of AI-driven social automation, informing more adaptive CAPTCHAs and behavioral risk scoring that can dynamically respond to evolving bot strategies.

Cite

bibtex
@article{arxiv2604_02147,
  title={TRACE-Bot: Detecting Emerging LLM-Driven Social Bots via Implicit Semantic Representations and AIGC-Enhanced Behavioral Patterns},
  author={Zhongbo Wang and Zhiyu Lin and Zhu Wang and Haizhou Wang},
  journal={arXiv preprint arXiv:2604.02147},
  year={2026},
  url={https://arxiv.org/abs/2604.02147}
}

Read the full paper

Articles are CC BY 4.0 — feel free to quote with attribution