Characterizing AI-Assisted Bot Traffic in Darknet Data: Implications for ICS and IIoT Security

Source: arXiv:2605.14209 · Published 2026-05-14 · By Alex Carbajal, Caleb Faultersack, Jonahtan Vasquez, Shereen Ismail, Asma Jodeiri Akbarfam

TL;DR

This paper examines automated and AI-assisted bot traffic targeting Industrial Control Systems (ICS) and Industrial Internet of Things (IIoT) environments through a large-scale longitudinal analysis of darknet traffic. The authors analyze 192 million passive darknet packets from the Merit ORION Network Telescope, sampled at four times of day on the 15th of each month in 2021 and 2025, focusing on shifts in scanning behavior, traffic diversity, and targeted industrial protocol ports. They find that the fraction of traffic targeting ICS-relevant ports nearly doubled (from 0.82% to 1.51%) and report botnet evasion tactics such as deliberate micro-pacing (1 ms to 100 ms inter-packet delays) that flatten volumetric traffic patterns.

Key findings

Traffic targeting ICS ports nearly doubled from 0.8156% (782,953 packets) in 2021 to 1.5064% (1,446,160 packets) in 2025 out of 96 million packets per year.
Global Shannon entropy increased for source IPs (wider attacker distribution) and decreased for destination ports (more focused targeting) from 2021 to 2025.
Inter-arrival time (IAT) burstiness showed a distinct and sharp spike in 1ms to 100ms bins in 2025, indicating micro-pacing behavior to smooth traffic bursts.
Geographic source distribution shifted sharply: packet volume from Russia dropped by 88.3% and from Iran by 97.4%, while volume from the US increased by 77.1%, from Bulgaria by 1166.5%, and from Seychelles by 2058.1%.
Scanning strategy analysis found a mix of sequential and randomized probing consistent with structured automated reconnaissance.
A simulated anomaly-based IDS using 2021 volumetric baselines detected only 2.53% of 2025 bot traffic (97.47% evasion rate) due to pacing tactics.
Lowering the IDS threshold to detect 90% of bot traffic produced a 68.10% false-positive rate on baseline normal traffic.
The observed metrics indicate a paradigm shift to distributed, AI-assisted, and deliberately paced botnet scanning that challenges traditional anomaly detection models.
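The headline percentages above can be checked directly from the reported packet counts. The sketch below only redoes that arithmetic; the function names are illustrative and not taken from the paper's code.

```python
# Packet counts as reported in the key findings above.
TOTAL_PER_YEAR = 96_000_000
ICS_PACKETS = {2021: 782_953, 2025: 1_446_160}


def ics_share(year: int) -> float:
    """Percentage of a year's sampled packets aimed at ICS-relevant ports."""
    return 100.0 * ICS_PACKETS[year] / TOTAL_PER_YEAR


def change_to_factor(percent_change: float) -> float:
    """Convert a reported percentage change into a multiplicative factor."""
    return 1.0 + percent_change / 100.0


if __name__ == "__main__":
    for year in (2021, 2025):
        print(f"{year}: {ics_share(year):.4f}% of sampled packets")
    growth = ICS_PACKETS[2025] / ICS_PACKETS[2021]
    print(f"ICS packet count grew by a factor of {growth:.2f}")
    # A +1166.5% change (Bulgaria) means roughly 12.7 times the 2021 volume.
    print(f"Bulgaria factor: {change_to_factor(1166.5):.2f}")
```

The growth factor comes out at about 1.85, which is what "nearly doubled" refers to.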

Threat model

Adversaries are distributed, possibly AI-assisted botnets conducting adaptive reconnaissance by scanning internet-exposed ICS and OT services. They can pace their traffic to evade volume-based anomaly detection and leverage geographically distributed scanning infrastructure. They are assumed to be unable to fully mask their geographic source or to completely mimic benign user traffic patterns.

Methodology — deep read

Threat Model and Assumptions: The adversary consists of automated and AI-assisted scanning botnets conducting reconnaissance against internet-exposed ICS/OT infrastructure. They are capable of distributed scanning with adaptive pacing but cannot fully evade geographic attribution or completely mimic benign traffic.
Data Collection: The dataset consists of 192 million passive packets from the Merit ORION Network Telescope, spanning 2021 (baseline) and 2025. Data was sampled on the 15th day of each month at four UTC times (00:00, 06:00, 12:00, 18:00), with 2 million packets per sample, totaling 96 million packets per year. The telescope monitors approximately 500,000 unallocated IPv4 addresses and captures unsolicited background radiation, including scanning and backscatter.
Data Preprocessing: Packets were parsed with dpkt, removing malformed and non-IP packets. Temporal consistency was ensured by normalizing timestamps to UTC. GeoLite2 databases aligned temporally with each collection year were used for accurate IP geolocation, mitigating reallocation errors.
Feature Extraction: For each valid IP packet, source IP, destination port, transport protocol, and timestamp were extracted. ICS critical ports (17 total, including Modbus TCP/502, DNP3 TCP/20000, EtherNet/IP TCP/44818/2222, IEC 104 TCP/2404, S7/ISO-TSAP TCP/102, OPC UA TCP/4840, BACnet UDP/47808) were flagged.
Statistical Analysis Pipeline:
- Global Shannon entropy computed separately for source IP distributions and destination port distributions to characterize diversity and targeting specificity.
- Inter-Arrival Time (IAT) distribution aggregated into logarithmic bins spanning 0.001ms to 1000ms to detect burstiness and micro-pacing.
- Scanning strategy gaps computed as numerical differences between sequential targeted destination IPs to distinguish ordered sweeps from random probing.
- Geographic attribution compiled source country distributions with percentage changes between years.
IDS Simulation: A volumetric anomaly-based IDS was simulated using the 2021 baseline packet-rate time series aggregated at 1-second intervals. The detection threshold was set at the mean plus 3 standard deviations (99.7% confidence, 57,102 pkts/sec). The 2025 bot traffic was scored against this baseline to measure detection and evasion rates. The threshold was then lowered until it detected 90% of the 2025 bot traffic, and the resulting false-positive rate was computed on the 2021 baseline.
Reproducibility: All scripts and analysis outputs are publicly available on GitHub, enabling replication. Packet capture data cannot be fully publicly released due to sensitivity but processed metrics and code are accessible.
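The feature-flagging and statistical steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the port set contains only the eight ports named in this summary (the paper flags 17), the entropy uses log base 2 as an assumption, and all function names are hypothetical.

```python
import ipaddress
import math
from collections import Counter

# Partial list: only the ICS ports named in this summary, keyed by
# (transport, port) as the summary gives them.
ICS_PORTS = {
    ("tcp", 502), ("tcp", 20000), ("tcp", 44818), ("tcp", 2222),
    ("tcp", 2404), ("tcp", 102), ("tcp", 4840), ("udp", 47808),
}


def is_ics_target(transport: str, dst_port: int) -> bool:
    """True if the packet targets one of the ICS ports listed above."""
    return (transport.lower(), dst_port) in ICS_PORTS


def shannon_entropy(values) -> float:
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def destination_gaps(dst_ips):
    """Numerical differences between consecutively targeted IPv4 addresses."""
    nums = [int(ipaddress.IPv4Address(ip)) for ip in dst_ips]
    return [b - a for a, b in zip(nums, nums[1:])]


if __name__ == "__main__":
    print(is_ics_target("TCP", 502))
    print(shannon_entropy(["a", "b", "c", "d"]))
    print(destination_gaps(["10.0.0.1", "10.0.0.2", "10.0.0.4"]))
```

Under this reading, higher source-IP entropy means packets are spread over more senders, lower destination-port entropy means probing is concentrated on fewer ports, and runs of gap 1 suggest an ordered sweep while widely varying gaps suggest randomized probing.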

Example Workflow: A sample 1-second interval capture from 2025 is parsed for IP and port data, IATs calculated and binned. The burstiness histogram reveals a peak in the 1-100ms bin indicative of pacing. Applying the 2021 volumetric threshold to this interval shows the packet rate falls below the alert threshold, demonstrating evasion by paced scanning. This process is repeated over millions of packets to produce aggregated statistics and simulation results.
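The workflow above amounts to two checks on a list of packet timestamps: bin the inter-arrival times logarithmically, then compare per-second packet counts with the 2021 threshold. The sketch below assumes decade-wide bins between 0.001 ms and 1000 ms (the summary gives only the range, not the bin edges) and uses hypothetical function names.

```python
import bisect
from collections import Counter

# Assumed decade bin edges in milliseconds, covering 0.001 ms to 1000 ms.
EDGES_MS = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# Alert threshold reported for the 2021 baseline (packets per second).
THRESHOLD_PPS = 57_102


def inter_arrival_ms(timestamps):
    """Inter-arrival times in milliseconds from timestamps in seconds."""
    ts = sorted(timestamps)
    return [(b - a) * 1000.0 for a, b in zip(ts, ts[1:])]


def bin_iats(iats_ms):
    """Count IATs per bin [EDGES_MS[i], EDGES_MS[i+1]); out-of-range values are skipped."""
    counts = [0] * (len(EDGES_MS) - 1)
    for iat in iats_ms:
        if iat < EDGES_MS[0] or iat >= EDGES_MS[-1]:
            continue
        counts[bisect.bisect_right(EDGES_MS, iat) - 1] += 1
    return counts


def alerting_seconds(timestamps, threshold=THRESHOLD_PPS):
    """Seconds whose packet count exceeds the volumetric threshold."""
    per_second = Counter(int(ts) for ts in timestamps)
    return [sec for sec, n in sorted(per_second.items()) if n > threshold]


if __name__ == "__main__":
    ts = [0.0, 0.005, 0.055, 0.0555]  # toy timestamps, not real capture data
    print(bin_iats(inter_arrival_ms(ts)))
    print(alerting_seconds(ts))
```

In this layout the 1 ms to 100 ms range covers the fourth and fifth bins, so a paced scanner would show mass there while its per-second count stays under the alert threshold.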

Technical innovations

Longitudinal characterization of darknet ICS-targeted traffic revealing near doubling of ICS port scans over four years, distinct from prior static snapshots.
Use of global Shannon entropy metrics to quantify divergence of source IP diversity and destination port concentration in evolving AI-assisted bot traffic.
Novel identification of deliberate micro-pacing (1ms–100ms inter-packet delays) burstiness signatures in large-scale darknet traffic as an evasion strategy.
Simulated IDS evaluation demonstrating the ineffectiveness of static volumetric thresholds against dynamically paced modern botnets targeting critical infrastructure.

Datasets

Merit ORION Network Telescope darknet data — 192 million packets — Passive captures of ~500,000 unallocated IPv4 addresses collected 2021 and 2025.
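The dataset size follows from the sampling schedule described in the methodology: 12 months, 4 samples on the 15th of each month, 2 million packets per sample. A short sketch that enumerates the sample start times and confirms the totals (function names are illustrative):

```python
from datetime import datetime, timezone

SAMPLE_HOURS_UTC = (0, 6, 12, 18)
PACKETS_PER_SAMPLE = 2_000_000


def sample_windows(year: int):
    """Start times of every sample: the 15th of each month at four UTC hours."""
    return [
        datetime(year, month, 15, hour, tzinfo=timezone.utc)
        for month in range(1, 13)
        for hour in SAMPLE_HOURS_UTC
    ]


def packets_for(years) -> int:
    """Total sampled packets across the given collection years."""
    return sum(len(sample_windows(y)) * PACKETS_PER_SAMPLE for y in years)


if __name__ == "__main__":
    print(len(sample_windows(2021)))   # samples per year
    print(packets_for([2021]))         # packets per year
    print(packets_for([2021, 2025]))   # packets in the full dataset
```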

Baselines vs proposed

Volumetric anomaly-based IDS threshold (Mean + 3 SD) on 2021 baseline: Detection rate on 2025 bot traffic = 2.53%, Evasion rate = 97.47%.
High-sensitivity threshold lowered to detect 90% of 2025 bot traffic: False positive rate on baseline normal traffic = 68.10%.
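The trade-off between these two rows can be reproduced in miniature. The sketch below is an assumption-laden illustration rather than the paper's simulation: it uses the population standard deviation, alerts when a rate is greater than or equal to the threshold, and runs on made-up toy rates, so its outputs are not the paper's figures.

```python
import math
import statistics


def sigma_threshold(baseline_rates, k: float = 3.0) -> float:
    """Mean plus k population standard deviations of the baseline rates."""
    return statistics.fmean(baseline_rates) + k * statistics.pstdev(baseline_rates)


def alert_fraction(rates, threshold: float) -> float:
    """Fraction of intervals whose rate is at or above the threshold."""
    return sum(r >= threshold for r in rates) / len(rates)


def threshold_for_detection(bot_rates, target: float = 0.90) -> float:
    """Highest threshold that still flags at least `target` of the bot intervals."""
    ranked = sorted(bot_rates, reverse=True)
    k = math.ceil(target * len(ranked))
    return ranked[k - 1]


if __name__ == "__main__":
    # Toy per-second packet rates, invented for illustration only.
    baseline_2021 = [10, 12, 11, 13, 9, 10, 12, 11, 10, 12]
    paced_2025 = [8, 9, 10, 11, 12, 9, 10, 11, 10, 15]

    strict = sigma_threshold(baseline_2021)
    print("detection at mean + 3 SD:", alert_fraction(paced_2025, strict))

    lowered = threshold_for_detection(paced_2025, 0.90)
    print("detection at lowered threshold:", alert_fraction(paced_2025, lowered))
    print("false positives on baseline:", alert_fraction(baseline_2021, lowered))
```

When paced traffic sits inside the baseline's normal range, any threshold low enough to catch most of it also flags most of the baseline, which is the mechanism behind the high false-positive rate reported above.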

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.14209.

Fig 1: System Architecture for Darknet Traffic Analysis Pipeline

Fig 2: ICS Port Targeting Volume and Identified Scanning Patterns

Fig 3: Cross-year Shannon Entropy Comparison

Fig 4: Inter-Arrival Time (IAT) Distribution and Traffic Burstiness

Fig 5: Cross-Year Packet Volume Shifts Across Top Targeted Industrial

Fig 6: IDS Anomaly Simulation: Volumetric Threshold and False Positives

Fig 7 (page 5).

Fig 8 (page 5).

Limitations

Dataset only includes passive darknet telescope captures, lacking direct insight into post-connection exploitation or payload-level activity.
Analysis is limited to periodic sampling (four times of day on the 15th of each month), potentially missing daily or weekly variations in scanning patterns.
Volumetric anomaly IDS simulation uses fixed thresholds without testing more sophisticated detection models, limiting generalizability.
Geographic attribution dependent on MaxMind GeoLite2 databases, which may contain inaccuracies or be incomplete for some IPs.
No adversarial evaluation against botnets specifically designed to evade entropy or burstiness-based detection.
Lack of integration of concurrent high-interaction ICS honeypot data to correlate scanning with actual exploitation.

Open questions / follow-ons

How can machine learning models leverage burstiness and inter-arrival time patterns to more reliably detect AI-assisted paced scanning without high false positives?
What are the signatures linking specific geographic scanning shifts and protocol targeting to known botnet families or APT groups?
How does the scanning behavior correlate with subsequent exploitation or attack phases observed in high-interaction ICS honeypots?
Can real-time streaming telemetry and detection systems effectively incorporate dynamically scaling gap and entropy metrics for operational OT environments?

Why it matters for bot defense

For practitioners in bot defense and CAPTCHA system design, these findings highlight the challenge posed by AI-assisted botnets that deliberately pace their traffic to evade the volume- or rate-based detection common in intrusion detection systems. Static thresholds yield either excessive false positives or high evasion rates, which points to the need for more nuanced detection signals such as burstiness and temporal pattern analysis. CAPTCHA and other bot-defense mechanisms integrated into OT networks should consider timing-based heuristics and entropy-based features to identify distributed automated reconnaissance more accurately. The shift in geographic concentration also suggests that defenders might prioritize monitoring and tailored defenses for emerging regional scanning clusters. Finally, the work indicates that anomaly detectors relying solely on volumetric or signature patterns need significant redesign and validation against adaptive, AI-driven probing to maintain situational awareness in critical infrastructure.

Cite

bibtex

@article{arxiv2605_14209,
  title={Characterizing AI-Assisted Bot Traffic in Darknet Data: Implications for ICS and IIoT Security},
  author={Alex Carbajal and Caleb Faultersack and Jonahtan Vasquez and Shereen Ismail and Asma Jodeiri Akbarfam},
  journal={arXiv preprint arXiv:2605.14209},
  year={2026},
  url={https://arxiv.org/abs/2605.14209}
}
