Backdoor Threats in Variational Quantum Circuits: Taxonomy, Attacks, and Defenses

Source: arXiv:2605.13796 · Published 2026-05-13 · By Lei Jiang, Fan Chen

TL;DR

This paper surveys backdoor attacks on variational quantum circuits (VQCs), the key components of variational quantum algorithms (VQAs) for near-term quantum computing. The authors formalize terminology and threat models for backdoors that stay dormant under normal conditions but, once triggered, cause malicious behavior such as incorrect outputs or distorted objective values. They classify attacks into three categories: classical neural-network-style data poisoning, compiler-level injection, and quantum-native methods that exploit parameter transfer and noise conditions. Defenses designed for classical-style backdoors fail against quantum-specific threats, which can be noise-aware and can exploit error mitigation. In particular, quantum-native backdoors embed stealthy triggers in parameter landscapes or exploit error mitigation (e.g., zero-noise extrapolation) to evade detection. The authors conclude that effective defense requires quantum-aware, multi-layer approaches spanning data, compilation, circuit parameters, and runtime environments in hybrid quantum-classical workflows.

Key findings

Data-poisoning backdoors require 5-10% poisoning rates to achieve >97% attack success rate (ASR) on quantum neural network (QNN) classifiers but are fragile under compilation and noise [8,18,19].
Compiler-level backdoors injected during circuit transpilation or synthesis can achieve 100% ASR without dataset access and remain stealthy pre-compilation, but are limited to QNN classification tasks [1,3,5].
Quantum-native backdoors embed triggers in parameter initialization to manipulate objective values in optimization tasks (VQE, QAOA) and persist under compilation and noise [6].
Noise-triggered backdoors exploit zero-noise extrapolation (ZNE) error mitigation to bias objective estimates, amplifying error by 1.68× to 11.7× under targeted noise profiles [6].
Existing detection methods (the output-based QSentry and the structure-based TrojanNet) detect classical-style backdoors with F1 scores up to 93% but fail against quantum-native attacks that preserve output distributions and circuit topology [7,16].
Backdoor attacks can activate under environment-specific conditions such as hardware noise, qubit connectivity, or compilation configurations, enhancing stealth and robustness.
Backdoor-based watermarking schemes embed ownership proof in VQCs that is robust to recompilation and does not degrade primary task accuracy.
Defense mechanisms must move beyond input/output analysis to parameter-space auditing and runtime behavior monitoring under diverse quantum hardware settings.
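
The poisoning rates quoted in the first finding can be made concrete with a toy example. The sketch below is a minimal illustration only, not the procedure of the cited works: it assumes a purely classical feature matrix, an additive trigger pattern, and a single target label. The function name `poison_dataset`, the trigger vector, and the data are all hypothetical.

```python
import numpy as np

def poison_dataset(features, labels, poison_rate, trigger, target_label, seed=0):
    """Return poisoned copies of (features, labels) plus the poisoned indices.

    A fraction `poison_rate` of the samples gets `trigger` added to its
    features and its label overwritten with `target_label`.
    Hypothetical helper for illustration; not taken from the surveyed papers.
    """
    rng = np.random.default_rng(seed)
    features = np.array(features, dtype=float, copy=True)
    labels = np.array(labels, copy=True)
    n_poison = int(round(poison_rate * len(labels)))
    idx = rng.choice(len(labels), size=n_poison, replace=False)
    features[idx] += trigger      # stamp the trigger pattern onto the chosen samples
    labels[idx] = target_label    # relabel them to the attacker's target class
    return features, labels, idx

# Toy data: 200 samples with 4 features and binary labels.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 4))
y = data_rng.integers(0, 2, size=200)
trigger = np.array([0.0, 0.0, 0.0, 3.0])  # hypothetical trigger on the last feature

X_p, y_p, poisoned_idx = poison_dataset(
    X, y, poison_rate=0.05, trigger=trigger, target_label=1
)
print(len(poisoned_idx))  # 5% of 200 samples -> 10
```

At a 5% rate only 10 of the 200 toy samples are modified, which is why such attacks can leave clean accuracy largely intact.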

Threat model

The adversary aims to embed backdoors in variational quantum circuits that remain dormant under normal inputs and conditions but activate malicious behavior on specific triggers. Depending on the attack class, the adversary may control training data (data poisoning), quantum compilation toolchains (compiler-level), or hardware parameters and noise profiles (quantum-native). The adversary knows the trigger/activation conditions, while defenders do not. Attacker capabilities range from moderate (limited poisoning access) to strong (full control over pretrained model and compilation). The adversary cannot fully control the runtime environment beyond known noise characteristics or hardware configuration but can exploit these factors for stealthy activation.

Methodology — deep read

The authors conduct a comprehensive literature survey and taxonomy synthesis of backdoor threats in variational quantum circuits (VQCs). Their methodology covers the following aspects:

Threat Model and Assumptions: They consider adversaries with varying access—ranging from limited control over training data (data-poisoning) to full control over compilation toolchains (compiler-level attacks) and environment-aware parameter manipulation (quantum-native). The attacker aims to embed stealthy backdoors that activate only on specific triggers, while preserving benign performance otherwise.
Data and Evaluation: The surveyed attacks and defenses are evaluated primarily on benchmark quantum neural network (QNN) classification tasks and on optimization problems solved with the variational quantum eigensolver (VQE) and the quantum approximate optimization algorithm (QAOA). Datasets and sample sizes vary, but typical studies include multiple QNN architectures trained on classical and quantum-encoded data.
Attack Architectures and Algorithms: Backdoor attacks are implemented via multi-task learning formulations where the total loss combines benign task objectives with malicious backdoor objectives weighted by a hyperparameter lambda. Data-poisoning attacks inject trigger-labeled poisoned samples during training. Compiler-level attacks modify quantum circuits during transpilation, inserting malicious gates or leveraging synthesis approximations without altering training data. Quantum-native backdoors manipulate parameter initialization landscapes or exploit hardware noise patterns to activate only under certain NISQ device conditions. Some attacks exploit zero-noise extrapolation (ZNE) mechanisms for biased error mitigation.
Training Regimes: Training protocols generally follow standard VQC parameter optimization using classical optimizers over multiple epochs. Poisoning rates vary (e.g., 5-10%). Parameter-transfer attacks start from pretrained parameters and overlay malicious objectives.
Evaluation Protocols: Attack success rates (ASR) and benign task accuracy are used as primary metrics. Defenses are benchmarked using F1 scores on backdoor input classification or trojaned circuit detection. Robustness is assessed by testing transformation invariance (compilation, transpilation) and noise resilience. Some works isolate environment-triggered activation.
Defenses and Detection: Defenses analyzed include QSentry (output-statistics anomaly detection) and TrojanNet (supervised structural pattern classification). Their scope is investigated with ablations separating classical-style triggers from quantum-native ones.
Reproducibility: The paper is a survey and does not provide new experimental code or datasets. Many cited works have released code or results on public datasets, but some compilation-level details rely on proprietary hardware configurations or cloud platforms.
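
The multi-task loss and the two primary metrics described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the additive form `benign + lam * backdoor` is one common way to weight two objectives and is not confirmed as the exact formulation used in the surveyed attacks, and the function names are hypothetical. Some papers also exclude samples whose true label already equals the target when computing ASR; that refinement is omitted here.

```python
import numpy as np

def total_loss(benign_loss, backdoor_loss, lam):
    """Combine the benign and backdoor objectives.

    Assumption: a simple additive weighting by the hyperparameter `lam`.
    """
    return benign_loss + lam * backdoor_loss

def clean_accuracy(predictions, labels):
    """Fraction of clean (trigger-free) inputs classified correctly."""
    predictions, labels = np.asarray(predictions), np.asarray(labels)
    return float(np.mean(predictions == labels))

def attack_success_rate(triggered_predictions, target_label):
    """Fraction of triggered inputs classified as the attacker's target label."""
    return float(np.mean(np.asarray(triggered_predictions) == target_label))

# Toy values, purely illustrative.
loss = total_loss(benign_loss=0.2, backdoor_loss=0.5, lam=0.4)
acc = clean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
asr = attack_success_rate([1, 1, 1, 0], target_label=1)
print(loss, acc, asr)
```

A larger `lam` pushes the optimizer toward the backdoor objective at the expense of the benign task, so an attacker would tune it to keep clean accuracy high while raising ASR.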

Example analysis: In the quantum-native attack that exploits zero-noise extrapolation (ZNE), circuit parameters are perturbed so that noisy expectation values still look consistent with a benign circuit, while the extrapolation to zero noise is systematically biased. Optimization results are therefore distorted only when noise scaling and extrapolation are applied. The backdoor survives compilation and noise and evades detection because the output distribution appears normal under standard conditions and turns malicious only when ZNE is applied.
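
A toy numerical model shows why matching values at the native noise level do not guarantee an unbiased extrapolation. The sketch below is not the attack from [6]: it assumes a linear noise model, linear (first-order) extrapolation via `numpy.polyfit`, and a hypothetical perturbation that vanishes at scale factor 1 and grows with the noise scale. All constants are invented for illustration.

```python
import numpy as np

def zne_extrapolate(scale_factors, expectation_values, degree=1):
    """Fit a polynomial in the noise scale factor and evaluate it at zero noise."""
    coeffs = np.polyfit(scale_factors, expectation_values, degree)
    return float(np.polyval(coeffs, 0.0))

scales = np.array([1.0, 2.0, 3.0])  # noise scale factors used for ZNE
E0 = -1.0      # hypothetical noiseless objective value
slope = 0.10   # hypothetical benign drift per unit of noise scale
bias = 0.05    # hypothetical extra drift that is zero at scale 1

benign = E0 + slope * scales
backdoored = benign + bias * (scales - 1.0)  # identical to benign at scale 1

benign_est = zne_extrapolate(scales, benign)
backdoored_est = zne_extrapolate(scales, backdoored)

print(benign[0] == backdoored[0])        # the unscaled measurement is unchanged
print(benign_est, backdoored_est)        # approximately E0 and E0 - bias
```

In this toy model the unscaled measurement is the same for both circuits, yet the zero-noise estimate shifts by `-bias`, which mirrors the qualitative behavior described above: the discrepancy only appears once the scaled measurements are extrapolated.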

Technical innovations

Formal taxonomy and unified classification of backdoor attacks in VQCs into data-poisoning, compiler-level, and quantum-native categories.
Identification and demonstration of quantum-native backdoors that embed triggers in parameter landscapes and exploit hardware noise or error mitigation techniques (e.g., zero-noise extrapolation) for stealthy activation.
Exposition of compiler-level backdoor insertion exploiting quantum transpilation toolchains to stealthily embed trojans invisible at pre-compilation circuit descriptions.
Critical analysis of existing defense methods revealing their failure to detect quantum-native backdoors due to reliance on classical-style input/output anomalies or structural deviations.
Proposal of the need for quantum-aware, cross-layer defense frameworks combining parameter auditing, noise-aware testing, and runtime monitoring.

Datasets

Multiple quantum neural network classification benchmarks; size and provenance vary by referenced paper, and some use public classical datasets adapted for quantum encoding [8,18,19]
Optimization tasks including VQE and QAOA benchmark instances for molecular energies and combinatorial problems as per cited works [6,13]

Baselines vs proposed

Data-poisoning attacks on QNNs: ASR > 98% at a 5% poison rate, with clean accuracy preserved [18]
Hybrid data-poisoning and compiler-level attack (QDoor): 100% ASR with unchanged pre-compilation circuit behavior [3]
Quantum-native noise-triggered backdoor attacks induce 1.68 to 11.7× error inflation during ZNE-augmented optimization vs benign baseline error rates [6]
QSentry defense: F1 detection score 75.8% at 1% poison, 93.2% at 10%, limited to classical-style input triggers [16]
TrojanNet defense: 98.8% accuracy detecting compiler-level backdoors on QAOA circuits, but ineffective against parameter-transfer backdoors [7]
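
For readers comparing the detector numbers above, the F1 score is the harmonic mean of precision and recall. The sketch below is a minimal reference implementation from confusion counts; the counts in the usage lines are hypothetical and are not taken from [16] or [7].

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = harmonic mean of precision and recall, computed from raw counts."""
    if true_positives == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical detector outcome: 8 backdoored inputs flagged correctly,
# 2 clean inputs flagged by mistake, 2 backdoored inputs missed.
score = f1_score(true_positives=8, false_positives=2, false_negatives=2)
print(round(score, 3))  # 0.8
```

Because F1 ignores true negatives, it can look quite different from plain accuracy, so the F1 figures for QSentry and the accuracy figure for TrojanNet are not directly comparable.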

Limitations

Most empirical results focus on classification tasks using quantum neural networks; limited evaluation on optimization-centric VQAs where attacks may behave differently.
Data-poisoning methods assume moderate poisoning rates (5-10%), which may be unrealistic when quantum data collection is tightly controlled.
Quantum-native backdoors require knowledge or profiling of hardware noise patterns, which limits their applicability across diverse NISQ devices.
Compiler-level backdoors depend on specific compilation toolchains and optimization passes (e.g., Qiskit), reducing cross-platform generalizability.
Current defenses lack coverage against backdoors embedded in parameter landscapes or environment-aware triggers activated only under specific runtime conditions.
Survey nature: no novel experimental validation or standardized benchmarks integrating multiple attack types and defenses simultaneously.

Open questions / follow-ons

How can quantum-native defense mechanisms be designed to audit parameter landscapes and detect environment-dependent backdoors across heterogeneous NISQ devices?
What are the trade-offs and techniques for performing robust cross-device validation and noise-aware runtime monitoring to expose stealthy quantum backdoors?
Can formal verification or statistical certification methods be adapted to quantum circuits to provide provable guarantees against backdoors, especially those exploiting error mitigation techniques?
How can multi-stage, cross-layer attack detection be integrated across data preprocessing, parameter optimization, compilation, and execution in hybrid quantum-classical workflows?

Why it matters for bot defense

This survey signals a new frontier in bot defense, relevant to CAPTCHA and broader adversarial system security, because it shows that future adversaries might exploit quantum-enhanced computing elements embedded in cryptographic or anti-bot challenges. Bot-defense engineers should anticipate that hybrid quantum-classical algorithms may harbor novel stealth backdoors invisible to classical detection methods. Defensive design must therefore embrace multi-layer, system-level analyses that consider hardware noise profiles, compilation pipelines, and parameter-level triggers. The paper underscores the importance of quantum-aware hygiene in model provisioning, parameter sharing, and runtime integrity verification to ensure trustworthiness in quantum-enabled automated challenge-response systems. Although direct application to current CAPTCHA solutions is nascent, anticipating these threats early is crucial as quantum computing adoption grows in security domains.

Cite

@article{arxiv2605_13796,
  title={Backdoor Threats in Variational Quantum Circuits: Taxonomy, Attacks, and Defenses},
  author={Lei Jiang and Fan Chen},
  journal={arXiv preprint arXiv:2605.13796},
  year={2026},
  url={https://arxiv.org/abs/2605.13796}
}
