Quantum Gatekeeper: Multi-Factor Context-Bound Image Steganography with VQC Based Key Derivation on Quantum Hardware
Source: arXiv:2604.26413 · Published 2026-04-29 · By Sahil Tomar, Sandeep Kumar
TL;DR
Quantum Gatekeeper is a hybrid steganography system that tries to protect not just the payload contents, but the extraction path itself. The core idea is that recovery requires four matched factors at decode time: a password, a shared secret, a user context string, and a signature derived from the original cover image. Those inputs are hashed into a composite seed that drives both keyed pixel traversal and a deterministic variational quantum circuit (VQC) used to generate a gate key. The payload is then encrypted with PBKDF2-stretched AES-GCM and embedded with lossless LSB substitution in a dual-region layout so the header can be recovered before the payload. If any input is wrong, the traversal diverges or AES-GCM rejects, producing silent failure rather than partial disclosure.
What is new here is less about steganographic capacity and more about access control by reconstruction path, plus a quantum-hardware characterization layer. The authors intentionally use exact statevector simulation for encode/decode determinism, while also running the same circuit family on IBM hardware to measure how the output distribution shifts under NISQ noise. In the reported experiments, the method achieved near-perfect cover-image fidelity versus classical and deep-learning baselines, exact recovery for both text and image payloads under correct conditions, and modest but measurable simulator-hardware distribution drift. The paper’s technical novelty is in combining context binding, authenticated encryption, and a quantum-derived traversal control signal in a single all-or-nothing recovery pipeline.
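The classical embedding primitive underneath all of this is plain LSB substitution along a keyed traversal order. A minimal stand-in sketch (not the paper's code; the hard-coded index list stands in for the keyed permutation described later):

```python
def embed_bits(pixels, bits, order):
    """Write payload bits into pixel LSBs along a traversal order.
    pixels: list of 0-255 ints; order: permuted pixel-index list."""
    out = pixels[:]                      # leave the cover untouched
    for bit, idx in zip(bits, order):
        out[idx] = (out[idx] & ~1) | bit
    return out

def extract_bits(pixels, n_bits, order):
    """Read n_bits LSBs back in the same traversal order."""
    return [pixels[idx] & 1 for idx in order[:n_bits]]

cover = list(range(64))
bits = [1, 0, 1, 1, 0, 0, 1, 0]
order = [5, 17, 2, 40, 9, 33, 21, 60]    # stands in for the keyed permutation
stego = embed_bits(cover, bits, order)
assert extract_bits(stego, len(bits), order) == bits
```

Because each touched pixel changes by at most one intensity level, LSB substitution is what drives the near-perfect SSIM/PSNR numbers reported below; the security burden falls entirely on the traversal order and the encryption layer.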
Key findings
- On DIV2K, proposed cover/stego quality reached SSIM = 0.999872 and PSNR = 64.2452 dB, versus HiNet at SSIM = 0.993 and PSNR = 46.57 dB.
- On COCO, proposed cover/stego quality reached SSIM = 0.998708 and PSNR = 58.2635 dB, versus GAN-based baseline at SSIM = 0.968 and PSNR = 37.20 dB.
- On ImageNet, proposed cover/stego quality reached SSIM = 0.996569 and PSNR = 52.4013 dB, versus CAIS at SSIM = 0.943 and PSNR = 33.54 dB.
- Secret-image recovery was exact on DIV2K, COCO, and ImageNet: SSIM = 1.000, PSNR = ∞, RMSE = 0.000, MAE = 0.000 in all three cases.
- Simulator and IBM hardware runs produced the same modal bitstring in the reported examples: DIV2K 1101/1101, COCO 0011/0011, ImageNet 0111/0111.
- Hardware-vs-simulator TVD was low but nonzero: 0.0391 on DIV2K, 0.0371 on COCO, and 0.0825 on ImageNet.
- The paper reports hardware Shannon entropy slightly higher than simulator entropy in all shown cases, consistent with noise-induced spread (e.g., ImageNet 2.7934 simulator vs 2.9734 hardware).
- Reported IBM hardware runtime for 2048 shots was about 7.81 s for one circuit execution in the abstract and roughly 8.97–9.99 s in Table IV, indicating substantial physical-device overhead versus local simulation.
Threat model
The adversary can inspect the stego image, knows the general embedding method, and may try to recover the payload by guessing or partially reconstructing the decode path. They may also know the circuit family and the use of LSB embedding. What they cannot do, by assumption, is simultaneously supply the correct password, shared secret, context string, and cover-image signature. If any one of those factors is wrong, the extraction permutation diverges or AES-GCM rejects the ciphertext, and the design claims no partial plaintext disclosure.
Methodology — deep read
Threat model and assumptions: as summarized above, the adversary knows the stego image, the embedding algorithm, the use of LSB steganography, and even the quantum circuit family, and can attempt bit extraction, but does not know the correct password, shared secret, context string, or cover-image-derived reference signature. The authors explicitly target all-or-nothing recovery: wrong inputs should cause permutation mismatch or AES-GCM authentication failure, with no partial plaintext leakage.
Data and inputs: the paper does not describe a public dataset download or train/test split because this is not a learning-from-data method. Instead, it uses cover images from DIV2K, COCO, and ImageNet as evaluation sources for imperceptibility and recovery. For secret-image payload experiments, the secret image is resized to 512×512, PNG-compressed, and base64-encoded before encryption so that payload size is fixed in a canonical representation. The cover image signature RI is computed from the original unmodified cover buffer, and that signature is one of the four recovery factors. The paper does not specify exact counts of images per dataset, nor any held-out attacker split, because the experiment is essentially per-image functional evaluation rather than statistical learning.
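The payload canonicalization step can be sketched with stdlib primitives. The paper uses a 512×512 resize plus PNG compression; zlib stands in for PNG here so the sketch stays dependency-free, and the base64 step matches the paper's canonical text-safe encoding:

```python
import base64
import zlib

def canonicalize_payload(image_bytes: bytes) -> bytes:
    """Fix the payload in a canonical byte representation before encryption.
    zlib is an illustrative stand-in for the paper's PNG compression step."""
    compressed = zlib.compress(image_bytes, level=9)
    return base64.b64encode(compressed)

def decanonicalize_payload(payload: bytes) -> bytes:
    """Invert the canonicalization after successful decryption."""
    return zlib.decompress(base64.b64decode(payload))

raw = bytes(range(256)) * 8              # stand-in for resized image bytes
payload = canonicalize_payload(raw)
assert decanonicalize_payload(payload) == raw
```

Fixing the representation before encryption matters because AES-GCM authenticates exact bytes: any canonical form that varies between encode and decode would make the tag check fail even with correct keys.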
Architecture and algorithm: the pipeline has four interacting parts. First, the password P, shared secret S, context string C, and image signature RI are hashed into a composite seed σ = H(P∥S∥C∥RI). Second, a KDF splits σ into role-specific sub-seeds σh, σp, σq, σe for header traversal, payload traversal, quantum circuit parameters, and encryption keying. Third, PBKDF2 stretches the password, the derived encryption key KE = H(PBKDF2(P)∥σe) is used with AES-GCM, and the payload is encrypted to produce a ciphertext, authentication tag T, and nonce N. Fourth, a compact VQC is parameterized deterministically from σq; its exact statevector distribution is evaluated, and the modal output bitstring z⋆ is hashed into a gate key KQ that controls payload traversal order rather than directly encrypting the payload. The paper defines the VQC as n qubits and depth d with parameterized rotations and entangling layers, but does not fully enumerate gate names or depth values in the excerpt.
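The derivation chain can be sketched with stdlib primitives. SHA-256, the domain-separation labels, and the PBKDF2 salt/iteration choices below are assumptions, since the excerpt does not fix any of them:

```python
import hashlib

def composite_seed(password: bytes, secret: bytes, context: bytes, ri: bytes) -> bytes:
    # sigma = H(P || S || C || RI). Real implementations should
    # length-prefix or domain-separate the fields to avoid concatenation
    # ambiguity; plain concatenation mirrors the paper's notation.
    return hashlib.sha256(password + secret + context + ri).digest()

def role_subseed(sigma: bytes, role: bytes) -> bytes:
    # Hash-based expansion into sigma_h, sigma_p, sigma_q, sigma_e;
    # the "|" separator is an illustrative domain-separation choice.
    return hashlib.sha256(sigma + b"|" + role).digest()

def encryption_key(password: bytes, sigma_e: bytes) -> bytes:
    # K_E = H(PBKDF2(P) || sigma_e); salt and iteration count are
    # illustrative, not from the paper.
    stretched = hashlib.pbkdf2_hmac("sha256", password, b"demo-salt", 100_000)
    return hashlib.sha256(stretched + sigma_e).digest()

sigma = composite_seed(b"pw", b"shared", b"ctx", b"ri-signature")
subs = {r: role_subseed(sigma, r.encode()) for r in ("header", "payload", "quantum", "enc")}
k_e = encryption_key(b"pw", subs["enc"])
```

The key property is that a change to any one of the four factors flips σ entirely, so every downstream sub-seed, traversal order, and key diverges at once.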
A concrete end-to-end example helps: suppose a user embeds a secret image into a PNG cover. The secret image is resized to 512×512, compressed, base64-encoded, then authenticated-encrypted with AES-GCM. The header region stores nonce and length information using an independently keyed traversal πH, so the decoder can recover N and |M| before touching the payload region. The payload region uses πP = Permute(ΩP; σp∥KQ), where KQ comes from the VQC’s deterministic simulation. At decode time, if the user changes the context string or uses a different cover image, σ changes, πH/πP change, and the recovered bitstream becomes scrambled or authentication fails; if all factors match, the embedded bytes are recovered exactly.
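The keyed traversal πP = Permute(ΩP; σp∥KQ) can be illustrated with a deterministic shuffle. Seeding `random.Random` from a hash is a stand-in for the paper's (unspecified) permutation construction:

```python
import hashlib
import random

def keyed_permutation(region_indices, seed_material: bytes):
    """pi = Permute(Omega; seed): a deterministic, seed-keyed shuffle of
    embedding positions. Illustrative stand-in, not the paper's scheme."""
    rng = random.Random(int.from_bytes(hashlib.sha256(seed_material).digest(), "big"))
    order = list(region_indices)
    rng.shuffle(order)
    return order

omega_p = range(1000)                   # payload-region pixel indices
sigma_p = b"payload-subseed"
k_q = b"gate-key-from-vqc"
pi_correct = keyed_permutation(omega_p, sigma_p + k_q)
pi_wrong = keyed_permutation(omega_p, sigma_p + b"wrong-gate-key")
assert pi_correct != pi_wrong           # one wrong factor, different traversal
assert sorted(pi_correct) == list(omega_p)
```

This is exactly the divergence mechanism the example describes: with the wrong gate key the extractor still reads valid-looking bits, just in an order that yields a scrambled bitstream AES-GCM will reject.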
Training regime: there is no machine learning training loop in the usual sense for the production path. The VQC is not trained on data; it is a fixed, seed-conditioned circuit whose parameters are generated via hash expansion from σq. The paper distinguishes two execution modes: exact statevector simulation for deterministic encode/decode and shot-based hardware execution on IBM Quantum for characterization. The abstract and tables mention 2048 shots for hardware measurements and a local simulator runtime of 0.0127 s versus 7.81 s on ibm_pittsburgh for one comparison circuit. The source does not state optimizer choice, epochs, batch size, or seed averaging, because there is no supervised model being fit.
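Since the excerpt does not enumerate the VQC's gates, the following toy circuit (RY rotations plus a CNOT ring, exact statevector evaluation, all hypothetical choices) only illustrates the deterministic seed → parameters → modal bitstring → gate key flow:

```python
import hashlib
import math

def params_from_seed(sigma_q: bytes, n: int, layers: int):
    """Hash-expand sigma_q into rotation angles: deterministic, data-free."""
    angles, counter = [], 0
    while len(angles) < n * layers:
        h = hashlib.sha256(sigma_q + counter.to_bytes(4, "big")).digest()
        angles.extend(b / 255 * 2 * math.pi for b in h)
        counter += 1
    return [angles[i * n:(i + 1) * n] for i in range(layers)]

def apply_ry(state, q, theta, n):
    """RY rotation on qubit q (qubit 0 = most significant bit)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out, bit = state[:], 1 << (n - 1 - q)
    for i in range(len(state)):
        if not i & bit:
            a0, a1 = state[i], state[i | bit]
            out[i] = c * a0 - s * a1
            out[i | bit] = s * a0 + c * a1
    return out

def apply_cnot(state, ctrl, tgt, n):
    """CNOT: flip target amplitude pairs where the control bit is 1."""
    out = state[:]
    cb, tb = 1 << (n - 1 - ctrl), 1 << (n - 1 - tgt)
    for i in range(len(state)):
        if i & cb and not i & tb:
            out[i], out[i | tb] = state[i | tb], state[i]
    return out

def gate_key(sigma_q: bytes, n: int = 4, layers: int = 2):
    """Exact statevector run; returns modal bitstring z* and K_Q = H(z*)."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    for layer in params_from_seed(sigma_q, n, layers):
        for q, theta in enumerate(layer):
            state = apply_ry(state, q, theta, n)
        for q in range(n):
            state = apply_cnot(state, q, (q + 1) % n, n)
    probs = [a * a for a in state]
    z_star = format(probs.index(max(probs)), f"0{n}b")
    return z_star, hashlib.sha256(z_star.encode()).hexdigest()
```

Because the statevector is evaluated exactly rather than sampled, the same σq always yields the same z⋆ and hence the same KQ, which is what makes the quantum stage safe to put on the decode-critical path.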
Evaluation protocol and reproducibility: the authors evaluate three dimensions. For imperceptibility, they report SSIM, PSNR, RMSE, and MAE comparing original cover images to stego images across DIV2K, COCO, and ImageNet, and compare against 4bit-LSB, CAIS, HiNet, and a GAN-based baseline. For payload fidelity, they compare recovered secret images to the resized 512×512 originals and report exact recovery (SSIM = 1, PSNR = ∞, zero error). For quantum consistency, they compare simulator and IBM hardware output distributions using Shannon entropy, total variation distance, and linear XEB, plus dominant-bitstring agreement. The paper states the implementation uses PennyLane for simulation, qiskit.remote for hardware execution, and AES-GCM with PBKDF2 for encryption, but the excerpt mentions no public code release, frozen weights, or released dataset. The evaluation appears deterministic per input rather than averaged over random seeds, and no statistical significance testing is reported in the excerpt.
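The distribution-comparison metrics are standard and easy to sketch from shot counts (linear XEB is omitted here because it additionally needs the ideal per-bitstring probabilities). The counts below are made-up illustrations, not the paper's data:

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (bits) of an empirical shot-count distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

def total_variation_distance(p_counts, q_counts):
    """TVD = 0.5 * sum |p(k) - q(k)| over the union of outcomes."""
    keys = set(p_counts) | set(q_counts)
    pt, qt = sum(p_counts.values()), sum(q_counts.values())
    return 0.5 * sum(abs(p_counts.get(k, 0) / pt - q_counts.get(k, 0) / qt)
                     for k in keys)

sim = {"0111": 1200, "0011": 500, "1111": 348}                 # 2048 shots
hw = {"0111": 1100, "0011": 520, "1111": 330, "0101": 98}      # 2048 shots
tvd = total_variation_distance(sim, hw)
assert 0.0 <= tvd <= 1.0
assert shannon_entropy(hw) > shannon_entropy(sim)   # noise spreads mass
```

This matches the paper's qualitative finding: hardware noise leaks probability mass onto extra bitstrings, raising entropy and producing a small nonzero TVD while leaving the modal bitstring intact.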
Technical innovations
- Introduces a four-factor recovery condition that binds payload access to password, shared secret, context string, and a cover-image-derived reference signature.
- Uses a deterministic VQC only as a traversal-control primitive, deriving a gate key from exact statevector probabilities instead of using the quantum circuit as a probabilistic decoder.
- Resolves nonce bootstrapping with a dual-region image layout that separates header recovery from payload recovery using independently derived keys.
- Combines lossless LSB embedding with AES-GCM authenticated encryption so wrong reconstruction fails cleanly rather than leaking partial payload bytes.
- Runs the same circuit family on IBM superconducting hardware purely as a statistical validation layer, not as part of the decode-critical path.
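The fail-closed property comes from authenticated encryption: decryption either verifies the tag and returns the full plaintext, or returns nothing. Python's stdlib has no AES-GCM, so this sketch substitutes a SHA-256 counter-mode keystream with an HMAC tag purely to show the pattern; a real deployment should use an AEAD such as AES-GCM:

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 in counter mode as an illustrative keystream generator.
    return b"".join(hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
                    for i in range(length // 32 + 1))

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC stand-in for AES-GCM: nonce || ciphertext || tag."""
    nonce = os.urandom(12)
    ct = bytes(p ^ s for p, s in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_sealed(key: bytes, blob: bytes):
    """Verify first, decrypt second. Any tag mismatch fails closed:
    no partial plaintext is ever released."""
    nonce, ct, tag = blob[:12], blob[12:-32], blob[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    return bytes(c ^ s for c, s in zip(ct, _keystream(key, nonce, len(ct))))
```

The decode order is the point: the tag covers the whole ciphertext, so a traversal scrambled by one wrong factor yields bytes that fail verification before any decryption output exists, which is exactly the all-or-nothing behavior the paper claims.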
Datasets
- DIV2K — size not specified in excerpt — public benchmark
- COCO — size not specified in excerpt — public benchmark
- ImageNet — size not specified in excerpt — public benchmark
Baselines vs proposed
- 4bit-LSB [2]: DIV2K SSIM = 0.895 vs proposed: 0.999872; PSNR = 24.99 dB vs 64.2452 dB
- CAIS [14]: DIV2K SSIM = 0.965 vs proposed: 0.999872; PSNR = 36.10 dB vs 64.2452 dB
- HiNet [15]: DIV2K SSIM = 0.993 vs proposed: 0.999872; PSNR = 46.57 dB vs 64.2452 dB
- GAN-Based [41]: DIV2K SSIM = 0.995 vs proposed: 0.999872; PSNR = 47.12 dB vs 64.2452 dB
- 4bit-LSB [2]: COCO SSIM = 0.894 vs proposed: 0.998708; PSNR = 24.96 dB vs 58.2635 dB
- GAN-Based [41]: COCO SSIM = 0.968 vs proposed: 0.998708; PSNR = 37.20 dB vs 58.2635 dB
- 4bit-LSB [2]: ImageNet SSIM = 0.896 vs proposed: 0.996569; PSNR = 25.00 dB vs 52.4013 dB
- GAN-Based [41]: ImageNet SSIM = 0.965 vs proposed: 0.996569; PSNR = 37.10 dB vs 52.4013 dB
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2604.26413.

Fig 1: Overall architecture of the proposed Quantum Gatekeeper framework. User-defined inputs are transformed into context-bound seed …

Fig 2: Embedding and extraction workflow of the proposed Quantum Gatekeeper framework. During embedding, an image-bound signature …
Limitations
- The excerpt does not provide exact dataset sample counts, so it is hard to judge the statistical strength of the reported image-quality results.
- The hardware validation is descriptive rather than security-critical; it shows simulator-hardware drift, but not a cryptographic advantage or attack resistance proof.
- No rigorous steganalysis benchmark is reported in the excerpt, so detectability against modern neural steganalyzers remains unclear.
- The system’s security appears to rely on correct handling of the reference image signature; robustness to benign transforms like cropping, recompression, or resize is not established.
- The paper reports deterministic success on selected examples, but not a broad randomized test suite over many seeds, covers, and payload lengths.
- Runtime on actual quantum hardware is much slower than simulation, which limits practicality if hardware-derived validation is required frequently.
Open questions / follow-ons
- How robust is the image-signature binding if the cover image undergoes benign processing such as recompression, color-space conversion, or resizing?
- Does the dual-region layout create detectable structural artifacts that a modern steganalyzer could exploit, especially in the header region?
- What is the entropy and collision resistance of the context-bound seed derivation across many users and many covers, and can an attacker mount offline guessing efficiently?
- Can the hardware-validation idea be turned into a practical device fingerprint or anti-replay signal without sacrificing decode determinism?
Why it matters for bot defense
For bot defense, the most relevant idea is not the quantum circuit itself but the notion of reconstruction-path binding: access depends on multiple contextual factors, not just a single shared secret. That maps loosely to systems where challenge validity should depend on user state, session context, and device or page integrity rather than a reusable token alone. If adapted carefully, the dual-region concept also suggests separating bootstrap metadata from sensitive payload so that a verifier can confirm context before revealing anything useful.
The caution is that this is a steganography paper, not a CAPTCHA evaluation, so its security claims do not directly translate to human/bot discrimination. The quantum component is mainly a deterministic key-derivation and hardware-characterization layer, which is interesting academically but not obviously useful operationally. A bot-defense engineer should take the access-control pattern, not the quantum branding: bind recovery to multiple independently verified inputs, fail closed, and avoid architectures that expose partial state when one factor is wrong.
Cite
@article{arxiv2604_26413,
  title={Quantum Gatekeeper: Multi-Factor Context-Bound Image Steganography with VQC Based Key Derivation on Quantum Hardware},
  author={Tomar, Sahil and Kumar, Sandeep},
  journal={arXiv preprint arXiv:2604.26413},
  year={2026},
  url={https://arxiv.org/abs/2604.26413}
}