Amplification to Synthesis: A Comparative Analysis of Cognitive Operations Before and After Generative AI

Source: arXiv:2605.13785 · Published 2026-05-13 · By Liz Cho, Dongwook Yoon

TL;DR

This study analyzes how cognitive operations (information campaigns that aim to influence public perception) changed structurally between the 2016 and 2024 U.S. presidential elections, a period that spans the rise of generative AI. Prior work identified bot-driven retweet amplification as a core tactic in 2016 influence operations. Here, the authors compare publicly available Twitter (X) datasets from the two cycles, applying post-type distribution, semantic clustering, temporal synchrony, and lexical similarity analyses to over 133,000 posts collected around key debate events. The 2024 data shows a pronounced shift away from retweet amplification and near-duplicate messaging toward lexically diverse original posts. Temporal coordination moved from broad, cross-topic synchrony to focused within-narrative bursts. These patterns are consistent with generative AI’s ability to produce varied, narrative-specific content at scale. Although the study does not establish that generative AI caused the shift, it documents empirical operational signatures that distinguish the post-generative-AI environment from earlier bot-dominated models. The authors present this baseline as a starting point for detection and mitigation frameworks calibrated to AI-assisted cognitive threats in geopolitical contexts.

Key findings

Original posts rose from 59.08% in 2016 to 93.19% in 2024, with retweets dropping from 39.98% to 0.01%.
Lexical overlap within semantic clusters collapsed from mean Jaccard similarity 0.99 (2016) to 0.27 (2024), indicating higher linguistic diversity in 2024.
Cross-semantic temporal synchrony at equal sample sizes dropped sharply: 91% of 2016 posts showed synchrony vs 8.3% in 2024.
Within-cluster temporal synchrony reversed: 13.3% of 2016 posts were synchronized within narrative clusters vs 62.5% in 2024, reflecting narrative-focused coordination.
The minimum cluster size threshold was raised from 5 (2016) to 60 (2024) to account for the greater data volume and semantic coherence requirements.
Quote tweets increased from 0.94% (2016) to 6.8% (2024), indicating more original commentary alongside references.
The 2016 coordination pattern aligned with volume amplification via bot retweet networks, while 2024’s pattern suggested AI-driven synthesis and targeted narrative deployment.

Threat model

The adversary is a coordinated, potentially state-linked group conducting influence operations on social media to manipulate public perception during elections or geopolitical events. The adversary can generate and deploy large volumes of text, potentially using bots and generative AI to fabricate original, narrative-specific content. However, they cannot directly subvert the platform's internal monitoring or access private user data, and they rely on publicly accessible platform interfaces to propagate their messaging.

Methodology — deep read

The study starts from the threat model above: cognitive operations are coordinated campaigns by (likely state-linked) actors who aim to influence public opinion through social media manipulation. The adversary may use automated bots and, post-2022, generative AI tools to produce and deploy content, but cannot directly compromise platform internal data or private communications.

Data comprises two Twitter (X) datasets: a 2016 corpus of 1,063 posts extracted around a presidential debate from the FiveThirtyEight Russian Troll Tweets repository, and a 2024 corpus of 132,923 posts collected around an analogous debate event by a USC research group. The time windows chosen represent high-activity events for influence operations.

Preprocessing for the lexical analysis filtered out non-English tweets, removed URLs and mentions, and excluded very short posts (<20 characters). This yielded smaller datasets for the lexical coordination analysis: 980 posts (2016) and 118,896 posts (2024).
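The preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the regular expressions, the `lang` field on each record, and the choice to apply the 20-character cutoff after cleaning are all assumptions.

```python
import re

# Assumed patterns; the paper summary does not specify how URLs/mentions were matched.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
MIN_CHARS = 20  # posts shorter than this are excluded, per the summary


def clean_post(text: str) -> str:
    """Strip URLs and @mentions, then collapse leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


def preprocess(posts, is_english=lambda post: post.get("lang") == "en"):
    """Return cleaned texts for English posts with at least MIN_CHARS characters.

    `posts` is a list of dicts with hypothetical keys "text" and "lang".
    The length check runs on the cleaned text (an assumption).
    """
    kept = []
    for post in posts:
        if not is_english(post):
            continue
        cleaned = clean_post(post["text"])
        if len(cleaned) >= MIN_CHARS:
            kept.append(cleaned)
    return kept
```

A different language detector can be plugged in through the `is_english` argument if the dataset carries no language tag.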

Text was embedded using Sentence-BERT models: a multilingual model for the temporal synchrony analysis and an English MiniLM model for the lexical similarity analysis. Semantic clustering applied a community detection algorithm to tweet-embedding graphs with a cosine similarity threshold of 0.82.

Cluster minimum sizes were calibrated manually: 5 for 2016 and 60 for 2024 to balance capturing meaningful coordinated narratives and excluding noise.
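To make the roles of the 0.82 threshold and the minimum cluster size concrete, here is a small sketch in plain Python. It links posts whose embeddings have cosine similarity at or above the threshold and takes connected components of that graph. This is a simplified stand-in, not the community detection algorithm the authors used, and the pairwise loop is only practical for small samples. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def threshold_clusters(embeddings, threshold=0.82, min_size=5):
    """Group indices linked by cosine >= threshold; drop groups below min_size.

    Connected components via union-find. O(n^2) comparisons, so this is for
    illustration on small inputs only.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [members for members in groups.values() if len(members) >= min_size]
    return sorted(clusters, key=len, reverse=True)
```

Raising `min_size` (5 for 2016, 60 for 2024 in the study) discards small groups, which is how the size threshold trades recall of minor narratives against noise.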

Posts were classified as original posts, retweets, or quote tweets, and the percentage of each type was compared across datasets.
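A post-type tally of this kind might look like the sketch below. The `is_retweet` and `is_quote` flags are hypothetical field names; the two source datasets use different schemas, and the summary does not say how each type was identified.

```python
from collections import Counter


def classify_post(post: dict) -> str:
    """Label a post using hypothetical boolean flags on the record."""
    if post.get("is_retweet"):
        return "retweet"
    if post.get("is_quote"):
        return "quote"
    return "original"


def post_type_distribution(posts):
    """Return each post type's share of the corpus as a percentage."""
    counts = Counter(classify_post(p) for p in posts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: round(100 * n / total, 2) for kind, n in counts.items()}
```

Running this once per corpus gives the two distributions that the study compares side by side.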

Temporal synchrony was analyzed at two levels: within semantic clusters and across different clusters (cross-semantic). To control for the difference in volume, 1,063 posts were randomly sampled from the much larger 2024 corpus to match the 2016 corpus size for the temporal analysis.
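The summary does not give the exact definition of a synchronized posting epoch, so the following is only one plausible reading: posts are bucketed into fixed time windows, and a post counts as synchronized when its window holds at least `min_posts` posts. The 60-second window, the `min_posts` default, the record keys `ts` and `cluster`, and the sampling seed are all assumptions.

```python
import random
from collections import Counter


def synchrony_share(timestamps, window_seconds=60, min_posts=2):
    """Percentage of posts that fall in a time bucket holding >= min_posts posts."""
    if not timestamps:
        return 0.0
    buckets = Counter(int(ts // window_seconds) for ts in timestamps)
    synced = sum(n for n in buckets.values() if n >= min_posts)
    return 100 * synced / len(timestamps)


def downsample(posts, size=1063, seed=0):
    """Draw a reproducible random sample to equalize corpus sizes."""
    if len(posts) <= size:
        return list(posts)
    return random.Random(seed).sample(posts, size)


def within_cluster_synchrony(posts, **kwargs):
    """Synchrony share computed separately for each cluster label.

    `posts` is a list of dicts with hypothetical keys "cluster" and "ts"
    (timestamp in seconds).
    """
    by_cluster = {}
    for post in posts:
        by_cluster.setdefault(post["cluster"], []).append(post["ts"])
    return {c: synchrony_share(ts, **kwargs) for c, ts in by_cluster.items()}
```

Calling `synchrony_share` on all timestamps while ignoring cluster labels gives a cluster-agnostic figure, which is only a rough analogue of the cross-semantic measure described above.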

Lexical similarity scores were computed as pairwise Jaccard coefficients within clusters to quantify linguistic uniformity.
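For two token sets A and B, the Jaccard coefficient is |A ∩ B| / |A ∪ B|. A minimal sketch of the within-cluster mean follows; lowercasing and whitespace tokenization are assumptions, since the summary does not describe the tokenizer.

```python
from itertools import combinations


def tokens(text):
    """Lowercased whitespace tokens as a set (assumed tokenization)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard coefficient of two sets; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(texts):
    """Mean Jaccard coefficient over all unordered pairs of texts in a cluster.

    Returns None when the cluster has fewer than two texts.
    """
    sets = [tokens(t) for t in texts]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return None
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A cluster of verbatim copies scores 1.0, which is the regime the 2016 mean of 0.99 points to, while paraphrased posts on the same narrative score much lower.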

Finally, detailed statistics and cluster-level synchrony counts were tabulated to contrast coordination styles.

The study involves no model training or development; it focuses on empirical data-analysis workflows for detecting shifts in coordination and linguistic variation.

As an example of the temporal synchrony analysis, Cluster #3 in the 2024 data contained 616 posts, 62.5% of which fell in synchronized posting epochs (with up to 26 posts co-occurring). In 2016 Cluster #1, by contrast, synchrony was sparse (13.3% of posts). This contrast suggests targeted narrative amplification rather than broad volume bursts.

No code release or frozen models were mentioned. Datasets are publicly sourced but collected by different teams, which introduces sampling bias risk. The methodology is reproducible in principle provided access to these Twitter datasets and standard embedding/clustering tools.

Technical innovations

Use of combined post-type distribution, temporal synchrony, semantic clustering, and lexical overlap analytics to fingerprint generative AI-influenced cognitive operations.
Differentiation of temporal synchrony at semantic cluster (within-narrative) vs cross-cluster (cross-semantic) scales to distinguish coordination styles.
Calibration of cluster minimum size thresholds across datasets using manual semantic coherence inspection to capture meaningful coordinated activity.
Empirical demonstration that generative AI usage may shift influence operations from retweet amplification toward lexically diverse, narrative-focused content generation and coordination.

Datasets

FiveThirtyEight Russian Troll Tweets (2016) — 1,063 posts — public repository
USC 2024 U.S. Election-related Tweets — 132,923 posts — public GitHub repository

Baselines vs proposed

2016 retweets percentage: 39.98% vs 2024 retweets: 0.01%
2016 original posts percentage: 59.08% vs 2024 original posts: 93.19%
2016 lexical Jaccard similarity mean: 0.99 vs 2024 lexical Jaccard similarity mean: 0.27
Cross-semantic temporal synchrony posts involved: 2016 = 91.0% vs 2024 = 8.3%
Within-cluster temporal synchrony posts involved: 2016 = 13.3% vs 2024 = 62.5%

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.13785.

Fig 1: Percentage Distribution of Post Typologies (2016 vs. 2024)

Fig 2: Within-Cluster Temporal Synchrony (2016 vs 2024)

Limitations

Differences in dataset collection protocols and keyword filters between 2016 and 2024 may introduce sampling bias.
Only a single 7-hour window per election cycle was analyzed, limiting generalizability across time or events.
Platform ecology and organic user behavior changes over eight years may confound attribution of observed shifts solely to generative AI influence.
Analysis restricted to English textual content on Twitter/X, excluding multimodal AI-generated media and other social platforms with different affordances.
No direct evidence links the observed coordination shifts causally to generative AI usage; findings are exploratory and indicative rather than confirmatory.
Manual parameter selection for cluster sizes introduces subjectivity impacting replicability.

Open questions / follow-ons

How do coordinated influence operations employing generative AI manifest across multiple platforms with different media modalities (e.g., video, images)?
Can real-time detection models exploit combined temporal synchrony and lexical diversity patterns to distinguish AI-assisted operations from organic viral activity?
What impact does generative AI-driven linguistic diversity have on audience perception and susceptibility to misinformation?
How do generative AI-powered influence operations adapt temporally in response to real-world events with speed and narrative tailoring?

Why it matters for bot defense

This paper highlights the operational shift in cognitive influence campaigns from bot-driven mass amplification via retweets to sophisticated, generative AI-assisted content synthesis producing lexically diverse, narrative-focused messages. For bot-defense and CAPTCHA practitioners, the findings suggest that detection strategies relying on retweet or burst activity patterns are increasingly insufficient. Instead, defenses must incorporate semantic and linguistic analyses capable of identifying coordinated, AI-generated original content exhibiting high semantic coherence but low lexical overlap.

Temporal synchrony detection should differentiate between diffuse cross-topic bursts (common in legacy botnets) and concentrated within-narrative coordination patterns. The study emphasizes the need for detection frameworks that track narrative-specific, synthetic content generation signatures at scale. Ultimately, this encourages CAPTCHAs and bot defense methods to evolve beyond simple behavioral heuristics to include nuanced content and coordination pattern analysis, aligning mitigation strategies with the distinct footprints of generative AI usage in adversarial information operations.
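As a purely illustrative sketch of how these signals could be combined, the function below labels a coordination profile from three aggregate measures. The cutoffs (0.5 for mean Jaccard, 50% for synchrony shares) and the label names are hypothetical and untuned; they are not proposed in the paper and would need validation against organic viral activity before any practical use.

```python
def coordination_signature(mean_jaccard, within_sync_pct, cross_sync_pct,
                           jaccard_cut=0.5, sync_cut=50.0):
    """Label a coordination profile using illustrative, untuned cutoffs.

    mean_jaccard:    mean pairwise Jaccard similarity within clusters (0..1)
    within_sync_pct: share of posts synchronized within a narrative cluster (%)
    cross_sync_pct:  share of posts synchronized across clusters (%)
    """
    near_duplicate = mean_jaccard >= jaccard_cut
    narrative_bursts = within_sync_pct >= sync_cut
    broad_bursts = cross_sync_pct >= sync_cut
    if near_duplicate and broad_bursts and not narrative_bursts:
        return "amplification-like"
    if not near_duplicate and narrative_bursts and not broad_bursts:
        return "synthesis-like"
    return "unclear"
```

With the aggregate figures reported above, `coordination_signature(0.99, 13.3, 91.0)` falls in the amplification-like branch and `coordination_signature(0.27, 62.5, 8.3)` in the synthesis-like branch. That only restates the study's contrast; it is not evidence that the cutoffs generalize.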

Cite

bibtex

@article{arxiv2605_13785,
  title={Amplification to Synthesis: A Comparative Analysis of Cognitive Operations Before and After Generative AI},
  author={Liz Cho and Dongwook Yoon},
  journal={arXiv preprint arXiv:2605.13785},
  year={2026},
  url={https://arxiv.org/abs/2605.13785}
}
