Bots Don't Sit Still: A Longitudinal Study of Bot Behaviour Change, Temporal Drift, and Feature-Structure Evolution

Source: arXiv:2512.17067 · Published 2025-12-18 · By Ohoud Alzahrani, Russell Beale, Bob Hendley

TL;DR

This paper investigates whether promotional social bots on Twitter exhibit systematic behavioural changes over time, challenging the common assumption that bot behaviours and associated features remain stationary. Using a large longitudinal dataset of 2,615 verified promotional bot accounts and 2.8 million tweets spanning 2009 to 2020, the authors construct yearly time series of ten widely used behavioural meta-features, such as tweeting volume, URL usage, hashtags, sentiment, emojis, and language diversity. Stationarity tests (Augmented Dickey–Fuller and KPSS) combined with trend estimation show that all ten meta-features are non-stationary: nine increase over time, reflecting growing activity and content richness, while language diversity decreases slightly, indicating a focus on fewer languages.

The study further stratifies bots into three generations based on activation period and three age classes by lifespan, finding distinct behavioural patterns. Second-generation bots (2013–2016) are the most active and make the heaviest use of URLs, while short-lived bots tend to show intense, repetitive posting with heavy hashtag and URL use. Long-lived bots post less overall but exhibit greater language diversity and more nuanced emoji use. Finally, the authors analyze 153 pairwise relationships among 18 binary behavioural features (spanning actions, content cues, sentiment, and media) and find that nearly all pairs are dependent, with correlations that change in strength and direction across generations. Later-generation bots display more structured, coordinated feature combinations.

Overall, the results provide clear empirical evidence that promotional social bots adapt their behavioural signatures over time at both individual feature and inter-feature dependency levels. This undermines static assumptions in bot detection models trained on historical data and motivates periodic updates and temporal awareness in detection systems.

Key findings

All ten behavioural meta-features related to tweeting activity, content, and expression are non-stationary: over 2009–2020 the Augmented Dickey–Fuller test fails to reject its unit-root null for every series (p > 0.05).
KPSS tests indicate that nine meta-features follow deterministic trends, while retweeting exhibits a stochastic trend. OLS slopes are upward for every meta-feature except language diversity (e.g., URLs at +22,413/year, media at +17,889/year, tweeting at +17,084/year).
Language diversity declines slightly over time (slope −41/year), suggesting narrowing linguistic focus.
Second-generation bots (2013–2016) show the highest activity levels and the highest share of tweets with URLs (56.8% of posts include URLs).
Short-lived bots (1–4 years lifespan) engage in intense repetitive output with heavy hashtag and URL use, whereas long-lived bots post less but use more languages and emojis more flexibly.
Across the 153 behavioural feature pairs, χ² tests show that almost all are dependent, rejecting the assumption that features are independent.
Spearman correlation analyses reveal dynamic changes in feature pair relationships across generations, including polarity flips and strengthened associations (e.g., multiple hashtags with media, sentiment with URLs).
Later bot generations demonstrate more structured and coordinated combinations of behavioural cues, indicating evolving strategies.
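The figure of 153 pairs is simply the number of unordered pairs that can be drawn from the 18 binary features, so the 152/153 dependent pairs reported for the χ² tests leave exactly one pair consistent with independence:

```latex
\binom{18}{2} = \frac{18 \times 17}{2} = 153
```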

Threat model

The adversary is a bot operator deploying promotional social bots on Twitter who actively adapts behaviour over time to evade detection and optimize promotion. They can vary posting intensity, use diverse content features (URLs, hashtags, emojis), and alter how behaviours co-occur. They cannot fully bypass platform policies or retroactively prevent data collection, but accounts that are deleted (by operators) or suspended (by the platform) drop out of the record selectively, which biases the observed data.

Methodology — deep read

Threat Model & Assumptions: The adversary is a bot operator deploying promotional Twitter bots aiming to maximize content dissemination and evade detection. The study assumes bots adapt behaviourally over time due to platform policies and detection pressures. The adversary’s knowledge includes prior detection techniques, but the study does not explicitly model active adversarial attacks.
Data: The dataset aggregates ground-truth promotional bot accounts from several public bot corpora spanning 2006–2021, merged into 2,615 unique accounts with 2.8 million tweets collected with the Twint scraper (which avoids Twitter API rate limits). Only the full calendar years 2009–2020 are analyzed. Bots are stratified into three generations by the year they are first observed (2009–2012, 2013–2016, 2017–2020) and into three lifespan groups (short-lived: 1–4 years, mid-lived: 5–8 years, long-lived: 9–12 years).
Architecture / Algorithm: The analysis extracts ten content-focused behavioural meta-features per account per year: counts of tweeting, retweeting, replying, tweets containing URLs, duplicated text (exact repetition of cleaned text), hashtags, tweets with positive/negative/neutral sentiment from lexicon-based analysis, number of distinct languages, emoji-containing tweets, and media (photos/videos). Summed over all bots, these counts form one yearly time series per meta-feature.
Training Regime: Not applicable - this is empirical observational analysis rather than model training.
Evaluation Protocol: Stationarity was assessed with the Augmented Dickey–Fuller (ADF) test, whose null hypothesis is a unit root (non-stationarity), and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, whose null is level or trend stationarity. Lag length was selected per series. Linear trends were estimated by ordinary least squares regression over the 12 yearly points to measure slope and direction. Distributions of the behavioural meta-features were compared across generations and age classes. Pairwise relationships among 18 binary behavioural features were tested with chi-square tests of independence, and Spearman rank correlations characterized how their strength and sign change across bot generations.
Reproducibility: Code and exact dataset splits are not released. The constituent bot datasets are public, but the data were collected with Twint, and the scrapes may be hard to reproduce exactly because tweets and accounts have since been deleted. Preprocessing details (e.g., language codes, sentiment thresholds) are provided, enabling approximate replication, and the year-level aggregation and well-known statistical tests make the results straightforward to verify.
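A minimal Python sketch of the Data and Architecture / Algorithm steps, under stated assumptions. Each tweet is a hypothetical dict with `account`, `year`, `text`, `is_retweet`, `is_reply`, `has_media`, `sentiment` and `lang` fields, with sentiment labels assumed to come from an external lexicon tool. The URL, hashtag, emoji and text-cleaning regexes, the inclusive lifespan count, counting every post as "tweeting", and summing per-account distinct-language counts into the yearly series are illustrative choices, not the authors' documented preprocessing. The helper names `generation`, `age_class` and `yearly_meta_features` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical cleaning/detection rules; the paper's exact preprocessing may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # common emoji blocks only


def generation(first_year):
    """Generation 1, 2 or 3 from the first calendar year an account is observed."""
    for gen, (lo, hi) in enumerate([(2009, 2012), (2013, 2016), (2017, 2020)], start=1):
        if lo <= first_year <= hi:
            return gen
    raise ValueError(f"{first_year} is outside 2009-2020")


def age_class(first_year, last_year):
    """Lifespan class; assumes lifespan counts active calendar years inclusively."""
    lifespan = last_year - first_year + 1
    for label, (lo, hi) in [("short-lived", (1, 4)), ("mid-lived", (5, 8)), ("long-lived", (9, 12))]:
        if lo <= lifespan <= hi:
            return label
    raise ValueError(f"lifespan {lifespan} is outside 1-12 years")


def clean(text):
    """Normalise text (drop URLs and mentions, lowercase, squeeze spaces) for duplicate matching."""
    text = MENTION_RE.sub("", URL_RE.sub("", text))
    return " ".join(text.lower().split())


def yearly_meta_features(tweets):
    """Count features per (account, year), then sum over accounts into one series per feature."""
    cells = defaultdict(lambda: {"counts": defaultdict(int), "langs": set(), "seen": set()})
    for t in tweets:
        cell = cells[(t["account"], t["year"])]
        c, text = cell["counts"], t["text"]
        c["tweeting"] += 1  # assumption: every post counts towards tweeting volume
        c["retweeting"] += t["is_retweet"]
        c["replying"] += t["is_reply"]
        c["urls"] += bool(URL_RE.search(text))
        c["hashtags"] += bool(HASHTAG_RE.search(text))  # tweets with at least one hashtag
        c["emojis"] += bool(EMOJI_RE.search(text))
        c["media"] += t["has_media"]
        c["sentiment_" + t["sentiment"]] += 1  # label assumed to come from a lexicon tool
        key = clean(text)
        c["duplicated_text"] += key in cell["seen"]  # repeats an earlier cleaned text
        cell["seen"].add(key)
        cell["langs"].add(t["lang"])
    series = defaultdict(lambda: defaultdict(int))
    for (_, year), cell in cells.items():
        for name, value in cell["counts"].items():
            series[year][name] += value
        series[year]["languages"] += len(cell["langs"])  # distinct languages per account-year
    return {year: dict(feats) for year, feats in series.items()}


# Hypothetical example records (not from the paper's dataset)
example = [
    {"account": "a1", "year": 2015, "text": "Big sale! https://x.co/1 #deal", "is_retweet": False,
     "is_reply": False, "has_media": True, "sentiment": "pos", "lang": "en"},
    {"account": "a1", "year": 2015, "text": "big   sale! https://x.co/2 #deal", "is_retweet": False,
     "is_reply": False, "has_media": True, "sentiment": "pos", "lang": "es"},
    {"account": "a2", "year": 2015, "text": "@bob thanks \U0001F600", "is_retweet": False,
     "is_reply": True, "has_media": False, "sentiment": "pos", "lang": "en"},
]
print(yearly_meta_features(example)[2015])
print(generation(2014), age_class(2013, 2015))
```

Counting per account-year before summing mirrors the "per account per year" extraction described above; the per-year dictionaries, read in calendar order, are the series that the stationarity tests consume.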

Example end-to-end: For the meta-feature URLs, yearly counts of tweets containing URLs were aggregated over all bots per calendar year, then the ADF test was applied to this 12-point time series (2009–2020). The null hypothesis of a unit root was not rejected (p=0.794), indicating non-stationarity. KPSS tests suggested a deterministic trend, so OLS regression was applied, yielding a slope of approximately +22,413 URL-containing tweets per year. Comparing this pattern across bot generations and age classes showed that the second generation had the highest URL usage intensity, illustrating dynamic behavioural change in link sharing.
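The trend-estimation and KPSS steps of this example can be sketched in plain Python. This is an illustrative reimplementation, not the authors' code: `ols_trend` and `kpss_trend_stat` are hypothetical helpers, the lag count is passed in rather than selected automatically, and 0.146 is the asymptotic 5% critical value for the trend-stationarity case, which is only a rough guide for a 12-point series. The toy series is made up. The ADF step is omitted because its p-values rely on MacKinnon's response-surface tables; in practice a library such as statsmodels (`statsmodels.tsa.stattools.adfuller` and `kpss`) would be used.

```python
def ols_trend(y):
    """Least-squares fit of y_t = a + b*t for t = 0..T-1; returns (a, b)."""
    T = len(y)
    t_mean, y_mean = (T - 1) / 2, sum(y) / T
    sxx = sum((t - t_mean) ** 2 for t in range(T))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def kpss_trend_stat(y, lags):
    """KPSS statistic for the null of trend stationarity (Bartlett-kernel long-run variance)."""
    a, b = ols_trend(y)
    resid = [v - (a + b * t) for t, v in enumerate(y)]  # detrended series
    T = len(resid)
    partial, sum_sq_partial = 0.0, 0.0
    for e in resid:  # partial sums S_t of the residuals
        partial += e
        sum_sq_partial += partial ** 2
    lrv = sum(e * e for e in resid) / T
    for s in range(1, lags + 1):
        weight = 1 - s / (lags + 1)  # Bartlett weight
        lrv += 2 * weight * sum(resid[t] * resid[t - s] for t in range(s, T)) / T
    if lrv <= 0:
        raise ValueError("residuals carry no variance; the statistic is undefined")
    return sum_sq_partial / (T * T * lrv)


# Asymptotic 5% critical value for the trend case; rough for only 12 points.
KPSS_TREND_5PCT = 0.146

# Toy 12-point yearly series (illustrative numbers, not the paper's data)
toy = [100 + 20 * t + (5 if t % 2 else -5) for t in range(12)]
intercept, slope = ols_trend(toy)
stat = kpss_trend_stat(toy, lags=2)
print(f"slope = {slope:.1f}/year, KPSS = {stat:.3f}, reject trend stationarity: {stat > KPSS_TREND_5PCT}")
```

A statistic above the critical value rejects trend stationarity (the retweeting case); a statistic below it is consistent with a deterministic trend, in which case the OLS slope summarises the direction and size of that trend.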

Technical innovations

Systematic longitudinal analysis of non-stationarity in core bot behavioural meta-features over a 12-year period using rigorous time series stationarity tests (ADF and KPSS).
Introduction of bot generation and lifespan stratification to reveal systematic behavioural evolution and heterogeneity in promotional bots.
Comprehensive pairwise dependency analysis among 18 binary behavioural features with χ² tests and dynamic Spearman correlation assessment showing shifting strength and polarity of feature relationships across bot generations.
Empirical demonstration that behavioural feature distributions and co-occurrence structures evolve over time, challenging static assumptions behind conventional bot detection models trained on historical data.
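A sketch of the pairwise independence test for two binary features, written in plain Python for illustration. It builds the 2×2 contingency table and computes Pearson's χ² statistic with one degree of freedom and no continuity correction; `chi2_independence_2x2` and the toy flags are hypothetical, and the paper's exact test settings are not given here. SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, so its values would differ slightly.

```python
import math


def chi2_independence_2x2(x, y):
    """Pearson chi-square test of independence for two binary (0/1) feature vectors.

    Returns (statistic, p_value) with 1 degree of freedom and no continuity correction.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    table = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        table[a][b] += 1
    n = len(x)
    rows = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in rows or 0 in cols:
        raise ValueError("a feature is constant; independence cannot be tested")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical per-tweet flags: does the tweet contain a URL / multiple hashtags?
has_url = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
multi_hashtag = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
stat, p = chi2_independence_2x2(has_url, multi_hashtag)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

Running this for every one of the 153 feature pairs, per generation, yields the kind of dependency matrix summarised in Fig 6.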

Datasets

Promotional Bots Dataset — 2,615 accounts, 2.8 million tweets — compiled from multiple public bot corpora (MIB project, Botometer, Cresci datasets, etc.)
Traditional Spambots #1 — 565 accounts, 350,512 tweets
Traditional Spambots #3 — 104 accounts, 220,582 tweets
Traditional Spambots #4 — 227 accounts, 96,109 tweets
Social Spambots #2 — 694 accounts, 104,813 tweets
Social Spambots #3 — 151 accounts, 1,084,711 tweets
Caverlee 2011 — 205 accounts, 535,155 tweets
Gilani 2017 — 33 accounts, 90,138 tweets
Varol 2017 — 37 accounts, 87,874 tweets
Cresci Stock 2018 — 39 accounts, 72,804 tweets
Fake Followers — 560 accounts, 155,974 tweets
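As a quick consistency check, the per-corpus rows above can be summed in Python; the figures are copied from the list, and the script only adds them up.

```python
# (accounts, tweets) per constituent corpus, copied from the list above
corpora = {
    "Traditional Spambots #1": (565, 350_512),
    "Traditional Spambots #3": (104, 220_582),
    "Traditional Spambots #4": (227, 96_109),
    "Social Spambots #2": (694, 104_813),
    "Social Spambots #3": (151, 1_084_711),
    "Caverlee 2011": (205, 535_155),
    "Gilani 2017": (33, 90_138),
    "Varol 2017": (37, 87_874),
    "Cresci Stock 2018": (39, 72_804),
    "Fake Followers": (560, 155_974),
}
total_accounts = sum(accounts for accounts, _ in corpora.values())
total_tweets = sum(tweets for _, tweets in corpora.values())
print(f"{total_accounts:,} accounts, {total_tweets:,} tweets")  # 2,615 accounts, 2,798,672 tweets
```

The account total matches the 2,615 unique accounts exactly, and the tweet total of 2,798,672 is the "2.8 million tweets" quoted for the merged dataset.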

Baselines vs proposed

Stationarity assumption baseline: for all ten meta-features the ADF test fails to reject its unit-root null (p > 0.05), so the assumption of stationary behaviour is not supported; the proposed analysis instead reveals systematic non-stationarity, with deterministic trends for nine of the ten features.
Across three bot generations: first generation tweeting volume < third generation < second generation (most active with highest URL and hashtag use).
Retweeting shows a stochastic trend: the KPSS test rejects trend stationarity for it (p = 0.010), whereas for the other features it does not reject at the 5% level (p ≈ 0.09–0.10), consistent with deterministic trends.
Pairwise independence is rejected by χ² tests for 152 of 153 behavioural feature pairs, showing strong inter-feature dependence; the Spearman analyses show that these dependencies evolve over time.
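The generation-by-generation Spearman comparison can be sketched as follows. This is an assumed, simplified version: `spearman`, `transitions` and the toy per-tweet flags are hypothetical, ties are handled by averaging ranks, and a transition between consecutive generations is reported as a polarity flip when the two coefficients have opposite signs, in the spirit of the local transitions T1 and T2 shown in Fig 7.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("a constant feature has no rank correlation")
    return cov / (sx * sy)


def transitions(rho_by_generation):
    """Change in rho between consecutive generations and whether its sign flips."""
    gens = sorted(rho_by_generation)
    return [
        (g1, g2, rho_by_generation[g2] - rho_by_generation[g1],
         rho_by_generation[g1] * rho_by_generation[g2] < 0)
        for g1, g2 in zip(gens, gens[1:])
    ]


# Hypothetical per-tweet flags (sentiment present, URL present) for each generation
toy = {
    1: ([1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0]),
    2: ([1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1]),
    3: ([1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 1, 1]),
}
rhos = {g: spearman(x, y) for g, (x, y) in toy.items()}
for g1, g2, delta, flipped in transitions(rhos):
    print(f"G{g1}->G{g2}: change {delta:+.2f}, polarity flip: {flipped}")
```

For binary features, Spearman's ρ on tie-averaged ranks equals the φ coefficient of the 2×2 table, so its sign matches the direction of association behind the corresponding χ² result.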

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2512.17067.

Fig 1: Yearly time series (2009–2020) of the ten behavioural meta-features for promotional bots: tweeting,

Fig 2: Behavioural meta-feature distributions for three generations of promotional bots.

Fig 3: Further behavioural meta-feature distributions for three generations, showing hashtags, text, media

Fig 4: Behavioural meta-feature distributions for three age classes of promotional bots: short-lived (1–4

Fig 5: Analytical framework for characterising behaviour dynamics in promotional Twitter bots. Step 1

Fig 6: Schematic χ² dependency matrix across all feature pairs. Almost all pairs of behavioural features

Fig 7: Local Transitions T1 and T2 express changes in correlations between bots’ features over three

Fig 8: Schematic illustration of global evolution patterns for pairwise feature relationships. Each cell

Limitations

The dataset, while longitudinal and relatively large, is restricted to promotional bots on Twitter, limiting generalizability to other bot types or platforms.
Data collection via Twint scraping could introduce sampling bias and is vulnerable to tweet deletions or account suspensions, potentially affecting time series completeness and representativeness.
Stationarity tests are applied to yearly aggregated counts, which may obscure finer-grained or intra-year temporal dynamics, and with only 12 observations per series the tests have limited statistical power.
The study does not incorporate active adversarial modelling or explore causation behind behavioural changes beyond descriptive trends.
Although the paper characterizes feature evolution in detail, the effect of that evolution on bot detection model performance over time is only implied, not directly measured.
Absence of released code or exact data splits limits full reproducibility.

Open questions / follow-ons

How do these longitudinal behavioural changes quantitatively impact the performance degradation of existing bot detection models trained on historical features?
Can temporal adaptation patterns identified here be incorporated into dynamic or continual learning detection models to improve robustness?
What behavioural dynamics exist in non-promotional bot types (e.g., political bots, fake followers) over similar or longer timeframes?
How does coordination and network structural evolution co-occur with feature-level behavioural changes longitudinally?

Why it matters for bot defense

This paper highlights that bots evolve their usage patterns of behavioural cues such as URLs, hashtags, sentiment, and media over time, challenging the assumption of static feature distributions underlying many detection approaches. For bot-defense practitioners designing or maintaining CAPTCHA or similar bot interaction systems, this underscores the need to continuously update behavioural feature sets and models to keep pace with bot adaptations. Static classifiers that do not account for temporal drift may suffer degradation in effectiveness, particularly as bots learn to mimic human-like behaviours via richer content and more structured feature combinations.

Additionally, the finding that feature interdependencies shift and strengthen with successive bot generations implies that bot detection could benefit from modeling behavioural features jointly rather than independently. CAPTCHA designers might consider incorporating temporal monitoring or dynamic feature analysis as part of broader bot defense layers, improving resilience against adaptive social bots that modify their behavioural signatures to evade static defenses.

Cite

bibtex

@article{arxiv2512_17067,
  title={Bots Don't Sit Still: A Longitudinal Study of Bot Behaviour Change, Temporal Drift, and Feature-Structure Evolution},
  author={Ohoud Alzahrani and Russell Beale and Bob Hendley},
  journal={arXiv preprint arXiv:2512.17067},
  year={2025},
  url={https://arxiv.org/abs/2512.17067}
}
