MIDS: Detecting Stealthy Masquerade and Tampering Attacks on CAN Bus via Bidirectional Mamba

Source: arXiv:2606.18599 · Published 2026-06-17 · By Qiqi Liu, Runhan Song, Lei Cui, Heng Zhang, Yuyan Sun, Limin Sun

TL;DR

This paper addresses a gap in Controller Area Network (CAN) intrusion detection: stealthy masquerade and tampering attacks that substitute legitimate frames in situ, preserving traffic periodicity and thus evading traditional frequency-based anomaly detectors. Existing intrusion detection systems (IDSs) mainly target injection attacks that disrupt traffic timing patterns; the masquerade threat model is subtler and calls for semantic-level detection. The authors introduce MIDS (Mamba Intrusion Detection System), a dual-stream deep learning framework that embeds CAN IDs and payloads independently, then models their joint semantic and temporal relationships with a bidirectional selective state-space model (Mamba). The design aims to capture long-range semantic drift that timing-based methods miss.

To evaluate MIDS, the authors collected a large-scale real-world Tesla Model 3 CAN dataset (over 100 million frames across standby, low-speed, and high-speed driving regimes) and synthesized 54 masquerade attack variants covering ID-only, data-only, and combined modifications at different intensities. MIDS achieved an F1 score of 96.94% on this dataset, surpassing the strongest baseline by more than 8 percentage points while maintaining a 1.147 ms inference latency, which the authors consider suitable for real-time onboard use. Further evaluation across four public CAN benchmark datasets spanning both masquerade and injection attack types showed F1 scores between 93.70% and 99.61%, outperforming eight baselines by up to 13.94 percentage points under a unified 5-fold split protocol. The study advances CAN IDS research by addressing a realistic, high-privilege internal adversary with a computationally efficient, semantically aware architecture.

Key findings

MIDS attains an F1 score of 96.94% on a novel Tesla Model 3 dataset with 54 synthesized masquerade attack variants, exceeding the best reproducible baseline by over 8 percentage points.
The dual-stream embedding of identifiers and data payloads fed into a bidirectional selective state-space model (Mamba) effectively captures subtle semantic and temporal inconsistencies invisible to traditional traffic-statistic defenses.
MIDS achieves a single-window inference latency of 1.147 ms, demonstrating feasibility for real-time deployment in resource-constrained automotive gateways.
On four public CAN benchmark datasets (ROAD, CrySyS, OTIDS, CT&T), covering masquerade and injection attacks, MIDS achieves F1 scores between 93.70% and 99.61%, outperforming eight reproduced baselines by up to 13.94 percentage points under a standard 5-fold cross-validation protocol.
Experiments over the attack interval I, which controls how sparsely frames are tampered (e.g., at I = 100 only 1 in 100 frames is modified), indicate that MIDS remains sensitive to highly covert masquerade scenarios that preserve per-ID periodicity.
A bidirectional Mamba architecture with asymmetric forward (dstate = 16) and backward (dstate = 8) selective state-space blocks captures temporal context better than unidirectional or uniform-parameter setups.
Weighted fusion of forward and backward Mamba hidden states enables adaptive prioritization of causal versus anticipatory anomaly signals.
The comprehensive Tesla Model 3 dataset addresses limitations of prior benchmarks by including realistic masquerade and combined tampering attacks in a modern EV environment.
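
To put the 1.147 ms figure in context, the short calculation below compares it with the minimum time a bus needs to deliver one 100-frame window. This is a back-of-envelope sketch, not a result from the paper: the 500 kbit/s bitrate, the absence of stuff bits, and non-overlapping windows are all assumptions, and the hardware on which the latency was measured is not stated in this summary.

```python
# Back-of-envelope latency budget. Illustrative assumptions, not paper results.
WINDOW_FRAMES = 100      # window length reported in the summary
LATENCY_MS = 1.147       # single-window inference latency reported in the summary
BITRATE_BPS = 500_000    # ASSUMPTION: 500 kbit/s classical CAN bus
FRAME_BITS = 111         # 11-bit-ID data frame with 8 data bytes: 108 bits
                         # plus 3-bit intermission, ignoring stuff bits


def window_fill_time_ms(frames=WINDOW_FRAMES, bitrate=BITRATE_BPS, frame_bits=FRAME_BITS):
    """Shortest time a fully loaded bus needs to deliver `frames` frames."""
    return frames * frame_bits / bitrate * 1000.0


def latency_share(latency_ms=LATENCY_MS):
    """Fraction of the window fill time consumed by one inference call."""
    return latency_ms / window_fill_time_ms()


if __name__ == "__main__":
    print(f"window fill time: {window_fill_time_ms():.2f} ms")
    print(f"inference share:  {latency_share():.1%}")
```

Under these assumptions a window cannot fill faster than about 22 ms, so one inference call would use roughly 5% of that budget. Real bus load is normally below 100%, which would only widen the margin.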

Threat model

The adversary is a high-privilege internal attacker who has compromised the vehicle’s gateway or ECU software, enabling in-situ substitutions of CAN frames without altering bus timing or increasing traffic volume. The attacker can modify identifier and/or data fields of legitimate frames broadcast at original timeslots, leveraging hardware CRC generation to produce protocol-compliant tampered frames accepted by receivers. The adversary cannot inject new frames or disrupt traffic periodicity without detection and is limited to software-layer manipulation within a compromised ECU or gateway.
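
The following minimal sketch illustrates why in-situ payload tampering is invisible to a timing-based detector. It is not the authors' attack generator: the tuple layout, the helper names `tamper_in_place` and `inter_arrival_times`, the toy periods, and the reading of the interval I as "every I-th frame of the target ID" are assumptions made for illustration. Only data tampering is shown.

```python
from collections import defaultdict


def tamper_in_place(frames, target_id, interval, new_payload):
    """Replace the payload of every `interval`-th frame carrying `target_id`.

    `frames` is a list of (timestamp, can_id, payload) tuples. Timestamps and
    frame count are left untouched, mirroring in-situ substitution. Returns
    the tampered trace and a parallel list of 0/1 ground-truth labels.
    """
    out, labels, seen = [], [], 0
    for ts, can_id, payload in frames:
        hit = False
        if can_id == target_id:
            seen += 1
            hit = seen % interval == 0
        out.append((ts, can_id, new_payload if hit else payload))
        labels.append(int(hit))
    return out, labels


def inter_arrival_times(frames):
    """Per-ID gaps between consecutive frames, the signal a frequency-based IDS inspects."""
    last, gaps = {}, defaultdict(list)
    for ts, can_id, _ in frames:
        if can_id in last:
            gaps[can_id].append(round(ts - last[can_id], 6))
        last[can_id] = ts
    return dict(gaps)


# Toy trace with made-up periods: ID 0x102 every 10 ms, ID 0x132 every 20 ms.
benign = sorted(
    [(i * 0.010, 0x102, bytes(8)) for i in range(20)]
    + [(i * 0.020 + 0.001, 0x132, bytes(8)) for i in range(10)]
)
attacked, labels = tamper_in_place(benign, 0x102, interval=5, new_payload=b"\xff" * 8)

# Timing statistics are identical before and after the attack.
print(inter_arrival_times(benign) == inter_arrival_times(attacked))
```

Because the per-ID gaps are unchanged, any detector that looks only at frame timing or frame counts has nothing to flag; the difference lives entirely in the payload content.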

Methodology — deep read

The paper is centered on detecting masquerade and tampering attacks on in-vehicle CAN buses—where an adversary with internal software control substitutes legitimate messages in place without altering per-ID timing statistics.

Threat Model & Assumptions: An attacker is assumed to have compromised the central gateway or a high-priority ECU software stack, gaining the ability to intercept and modify CAN frames in situ. Modifications include altering the 11-bit Identifier field (masquerade), the 64-bit data payload field (data tampering), or both simultaneously (combined tampering). The attacker broadcasts tampered frames at the exact original transmission time, preserving periodicity and CRC validity due to hardware-supported checksum generation, effectively bypassing common integrity and timing checks. The adversary cannot increase bus load or inject additional frames without detection.
Data: The primary dataset is over 100 million CAN frames collected from a physical Tesla Model 3 vehicle under three operational regimes: standby, low-speed driving, and high-speed driving. From this benign data, the authors synthesize 54 masquerade attack variants manipulating IDs 0x102 and 0x132 with sparsity intervals I in {2,5,10,25,50,100}, simulating highly stealthy, sparse tampering. The dataset includes ID-only, data-only, and combined tampering. Public datasets ROAD, CrySyS, OTIDS, and CT&T are also used for cross-benchmark evaluation, featuring both masquerade and injection attacks. Data is split with a block-shuffled 5-fold cross-validation protocol to avoid spurious scenario-attack correlations.
Architecture / Algorithm: MIDS uses a dual-stream input pipeline processing sequences of 100 CAN frames. The ID stream maps discrete CAN IDs via a trainable embedding matrix to a continuous latent space capturing functional proximity. The Data stream treats 64-bit payloads as numerical time series, processed by a 1D CNN extracting local semantic features (e.g., signal gradients). These streams are concatenated per frame to form a fused representation.
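
The dual-stream input described above can be sketched in plain Python. This is a toy under stated assumptions rather than the paper's implementation: the embedding width, the random (untrained) embedding table, the fixed central-difference kernel standing in for learned CNN weights, and the choice to convolve each payload byte along the frame axis are all hypothetical.

```python
import random

EMBED_DIM = 4              # hypothetical embedding width
KERNEL = (-1.0, 0.0, 1.0)  # fixed central-difference kernel, a stand-in for learned weights


def build_id_embedding(can_ids, dim=EMBED_DIM, seed=0):
    """One vector per distinct CAN ID (random here; trainable in a real model)."""
    rng = random.Random(seed)
    return {cid: [rng.uniform(-1, 1) for _ in range(dim)] for cid in sorted(set(can_ids))}


def conv1d_time(series, kernel=KERNEL):
    """Edge-padded 1D cross-correlation along the frame axis (output length = input length)."""
    pad = len(kernel) // 2
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    return [sum(k * padded[t + j] for j, k in enumerate(kernel)) for t in range(len(series))]


def fuse_window(can_ids, payloads, table):
    """Concatenate the ID embedding and per-byte conv features for each frame."""
    # Scale each payload byte to [0, 1]; each of the 8 byte positions is one channel.
    channels = [[p[b] / 255.0 for p in payloads] for b in range(8)]
    conv = [conv1d_time(ch) for ch in channels]
    return [table[cid] + [conv[b][t] for b in range(8)] for t, cid in enumerate(can_ids)]


# A 100-frame window alternating between two IDs, with a slowly ramping payload.
ids = [0x102, 0x132] * 50
payloads = [bytes([t % 256] * 8) for t in range(100)]
table = build_id_embedding(ids)
fused = fuse_window(ids, payloads, table)
print(len(fused), len(fused[0]))  # frames in window, features per frame
```

The difference kernel responds to how a byte changes from frame to frame, which is one simple reading of "signal gradients"; a trained CNN would learn its own filters.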

This fused sequence is fed into a bidirectional Mamba module—a selective state-space model (SSM) with data-dependent transition parameters enabling context-aware information filtering. The forward Mamba block (higher capacity: latent state size 16, convolution depth 4) models causal temporal dependencies, capturing trajectory evolution. The smaller backward block (state size 8, conv depth 2) models anticipatory inconsistencies by processing the reversed sequence, highlighting local semantic contradictions within future context.
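
To make the idea of input-dependent transitions concrete, here is a heavily simplified, single-channel selective scan. It is a sketch of the general technique, not of the Mamba library or the authors' model: the scalar weights `w_delta`, `w_b`, `w_c`, the fixed decay rates, and the omission of the local convolution (the "conv depth") and gating are simplifications introduced here.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def selective_scan(xs, d_state, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Toy single-channel selective SSM.

    The step size delta_t, input projection B_t and readout C_t all depend on
    the current input x_t, which is the 'selective' part. Decay rates
    A_n = -(n + 1) are fixed. Returns one output per time step.
    """
    h = [0.0] * d_state
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)             # input-dependent step size
        b, c = w_b * x, w_c * x                   # input-dependent B_t and C_t
        for n in range(d_state):
            a_bar = math.exp(-delta * (n + 1))    # discretised transition exp(delta * A_n)
            h[n] = a_bar * h[n] + delta * b * x   # state update
        ys.append(c * sum(h))                     # readout
    return ys


def bidirectional_scan(xs, d_fwd=16, d_bwd=8):
    """Larger forward scan plus a smaller scan over the reversed sequence."""
    fwd = selective_scan(xs, d_fwd)
    bwd = selective_scan(xs[::-1], d_bwd)[::-1]   # re-align to original time order
    return fwd, bwd


fwd, bwd = bidirectional_scan([0.5, 2.0, 0.5, 0.5])
print(len(fwd), len(bwd))
```

Each step costs a fixed amount of work, so the scan is linear in sequence length. The forward output at a given frame depends only on earlier frames, while the backward output depends only on later ones, which is what lets the second branch check a frame against its future context.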

The forward and backward hidden states are weighted by learnable scalar parameters and summed to form a global representation passed through a fully connected layer and softmax classifier predicting one of four classes: Normal, Masquerade, Data Tampering, or Combined Attack.
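
The fusion and classification step can be sketched as follows. The names `alpha` and `beta` for the learnable scalars, the example weights, and the assumption that each branch has already been pooled to a single vector are hypothetical; the summary does not say how the sequence is pooled.

```python
import math

CLASSES = ("Normal", "Masquerade", "Data Tampering", "Combined Attack")


def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(h_fwd, h_bwd, alpha, beta):
    """Weighted sum of forward and backward states with scalar weights alpha, beta."""
    return [alpha * f + beta * b for f, b in zip(h_fwd, h_bwd)]


def classify(h_fwd, h_bwd, weights, bias, alpha=0.5, beta=0.5):
    """Fully connected layer followed by softmax over the fused representation."""
    g = fuse(h_fwd, h_bwd, alpha, beta)
    logits = [sum(w * x for w, x in zip(row, g)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return CLASSES[probs.index(max(probs))], probs


# Made-up 2-dimensional states and a 4x2 weight matrix, for illustration only.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B = [0.0, 0.0, 0.0, 0.0]
label_fwd, _ = classify([1.0, 0.0], [0.0, 1.0], W, B, alpha=0.9, beta=0.1)
label_bwd, _ = classify([1.0, 0.0], [0.0, 1.0], W, B, alpha=0.1, beta=0.9)
print(label_fwd, label_bwd)
```

With identical branch outputs, shifting weight from `alpha` to `beta` changes the predicted class in this toy, which illustrates how learned scalars can let the model favour causal or anticipatory evidence.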

Training Regime: Training uses 5-fold block-shuffled cross-validation with sequences of 100 frames and large batches. The optimizer, number of epochs, learning rate, and random seeds are not specified in the text provided. The SSM backbone has computational cost linear in sequence length, which the authors rely on for efficient training and for inference on embedded hardware. Ablation studies validate design choices such as the asymmetric Mamba branch sizes.
Evaluation Protocol: Metrics include precision, recall, and F1 scores, with emphasis on F1 for balanced evaluation across attack types and Normal traffic. Baselines comprise eight state-of-the-art learned and statistical CAN intrusion detectors reproduced under a unified testing pipeline. The authors demonstrate robustness across multiple datasets including diverse vehicle types and attack modes. They analyze MIDS’s sensitivity to attack sparsity (interval I), ablate model components (unidirectional vs bidirectional, capacity, fusion), and measure inference latency to argue real-time applicability.
Reproducibility: The authors released the full Tesla Model 3 dataset along with the MIDS source code on GitHub and Google Drive, ensuring transparency and enabling future benchmarking and extension. Due to the sensitive nature of vehicle data, some public datasets used remain under controlled distribution. The synthetic masquerade attacks are generated offline by substituting frames within benign traffic captures, ensuring exact ground truth labels.
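
One plausible reading of the block-shuffled 5-fold protocol mentioned under Data and Training Regime is sketched below. The exact procedure is not given in this summary, so the block size, the round-robin fold assignment, and the function names are assumptions.

```python
import random


def block_shuffled_folds(n_windows, block_size, k=5, seed=0):
    """Assign contiguous blocks of windows to k folds after shuffling block order.

    Keeping each block intact keeps neighbouring, highly correlated windows in
    the same fold; shuffling blocks spreads each recording segment across folds.
    """
    blocks = [list(range(s, min(s + block_size, n_windows)))
              for s in range(0, n_windows, block_size)]
    random.Random(seed).shuffle(blocks)
    folds = [[] for _ in range(k)]
    for i, block in enumerate(blocks):
        folds[i % k].extend(block)       # round-robin assignment of shuffled blocks
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    for i, test in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test


folds = block_shuffled_folds(n_windows=1000, block_size=50)
for train, test in train_test_splits(folds):
    print(len(train), len(test))
```

Splitting by block rather than by individual window avoids placing near-duplicate adjacent windows on both sides of the train/test boundary, which would otherwise inflate scores.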

Technical innovations

Introduction of a dual-stream embedding capturing the coevolution of CAN identifiers and data payloads to model their joint temporal semantics.
Application of the Mamba selective state-space model in a bidirectional architecture enabling linear-time modeling of long-range dependencies with dynamic, input-dependent transition parameters.
Asymmetric forward and backward Mamba block configurations optimized for causal trajectory and anticipatory inconsistency detection.
Weighted fusion of bidirectional hidden states allowing adaptive emphasis on causal versus anticipatory temporal cues during classification.

Datasets

Tesla Model 3 — 108,053,935 frames — collected from a physical vehicle, includes synthesized masquerade and tampering attacks
ROAD CAN Dataset — 1.1 million frames — real and simulated, multiple attack types
CrySyS CAN Dataset — 138,362,148 frames — real and simulated injection and masquerade
OTIDS (HCRL CAN) — 4,613,909 frames — real, injection-only attacks on KIA SOUL
CT&T — 193,241,081 frames — injection attacks on multiple Chevrolet vehicles

Baselines vs proposed

Strongest reproducible baseline on Tesla Model 3 dataset: F1 = 88.5% vs MIDS: F1 = 96.94%
Best baseline on public datasets (ROAD, CrySyS, OTIDS, CT&T): F1 up to 85.76% vs MIDS: F1 up to 99.61%
MIDS outperforms eight reproduced IDS baselines by up to 13.94 percentage points in unified 5-fold evaluation
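
For reference, the F1 values compared above are the harmonic mean of precision and recall. The sketch below computes per-class and macro-averaged F1 on made-up labels; whether the paper reports macro, weighted, or per-class F1 is not stated in this summary, so the macro average is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


# Made-up labels: "N" = Normal, "A" = attack.
y_true = ["N", "N", "A", "A"]
y_pred = ["N", "A", "A", "A"]
print(precision_recall_f1(y_true, y_pred, "A"))
print(macro_f1(y_true, y_pred))
```

F1 penalizes a detector that buys recall with false alarms, which matters here because tampered frames are rare relative to normal traffic at large attack intervals.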

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.18599.

Fig 1: Overview of tampering attacks threat model. Attackers can exploit vulnerabilities in a weak gateway to initiate the entire tampering attack process.

Fig 2: CAN frame structure

Fig 3: Model architecture of MIDS. The Forward and Backward SSM blocks adopt an asymmetric configuration (dstate = 16, dconv = 4 for the forward

Fig 4: Dataset design

Fig 5: (a) An overview of the test bed, where a monitor positioned in front of the car simulates various driving scenarios. The physical vehicle, a Tesla

Fig 6: Overall model performance and comparisons

Limitations

Attack traces are synthesized offline by frame substitution rather than generated via live bus injection, so real-time hardware effects are not evaluated.
Details on training hyperparameters and hardware used are not fully disclosed, limiting exact reproducibility.
Evaluation primarily focuses on masquerade and injection attacks; other complex threat models like bus-off or multi-vector coordinated attacks are not studied.
The Tesla dataset, while extensive, covers a single vehicle model, so generalization to other EV platforms remains untested without further data.
The effectiveness against adaptive adversaries aware of MIDS’s detection features is not analyzed via adversarial attack experiments.

Open questions / follow-ons

How resilient is MIDS to adaptive adversaries who optimize tampering patterns to evade semantic-level detection?
Can the Mamba architecture be efficiently extended to multi-bus or multiplexed automotive networks with interacting protocol layers?
What are the trade-offs when integrating cryptographic message authentication with learned IDSs like MIDS in constrained vehicular environments?
How does MIDS perform under concept drift caused by firmware updates or hardware aging impacting benign CAN traffic distributions?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this work illustrates the value of joint multi-modal feature embeddings combined with bidirectional temporal modeling to detect subtle semantic anomalies that evade simplistic frequency or pattern-based detectors. The masquerade attacks on CAN traffic are analogous to stealthy impersonation or tampering threats in web authentication contexts, where adversaries preserve normal timing and behavioral profiles to avoid detection. The use of selective state-space models (Mamba) with lightweight, adaptive filtering offers a promising paradigm for deploying real-time defenses in constrained edge environments while maintaining high detection fidelity. Bot-defense engineers can draw inspiration from the dual-stream fusion strategy that enforces cross-field consistency checks beyond independent feature distributions, which could be adapted to multi-channel behavioral signals in authentication flows.

Cite

bibtex

@article{arxiv2606_18599,
  title={MIDS: Detecting Stealthy Masquerade and Tampering Attacks on CAN Bus via Bidirectional Mamba},
  author={Qiqi Liu and Runhan Song and Lei Cui and Heng Zhang and Yuyan Sun and Limin Sun},
  journal={arXiv preprint arXiv:2606.18599},
  year={2026},
  url={https://arxiv.org/abs/2606.18599}
}
