Verifiable and Confidential DNN Inference on Low-End Edge Devices

Source: arXiv:2606.07470 · Published 2026-06-05 · By Mohamed Khalil Kiri, Ivan De Oliveira Nunes, Aurélien Francillon, Norrathep Rattanavipanon

TL;DR

This paper addresses the dual challenge of performing deep neural network (DNN) inference on low-end edge devices while simultaneously protecting model confidentiality and enabling verifiable inference results, all within strict resource constraints. Existing approaches either execute inference fully inside costly Trusted Execution Environments (TEEs), inflating the trusted computing base (TCB) and overhead, or leave the model and execution unprotected, exposing them to extraction and forgery attacks. The authors present VECODI, a novel framework leveraging a new execution abstraction called SHANGRI-LA on ARM TrustZone-M platforms. SHANGRI-LA establishes a third runtime environment with intermediate privileges between the Secure and Normal Worlds, allowing untrusted inference code to run in normal memory but be protected by minimal Secure World support. VECODI uses this to maintain model confidentiality, enforce authorization policies limiting inference queries to mitigate model extraction attacks, and generate cryptographic proofs of correct execution verifiable remotely.

VECODI was implemented on a NUCLEO-L552ZE-Q microcontroller board and open-sourced. Evaluations show that VECODI reduces the Secure World TCB by over 95% compared to prior TEE-based solutions that place the full inference code inside the Secure World. The system adds very little runtime overhead during inference (about 0.83 ms, or 0.07%) and has a small memory footprint. A realistic case study on camera-based image classification demonstrates applicability. Overall, VECODI offers a practical, low-overhead platform for confidential and verifiable DNN inference on low-cost, constrained edge devices.

Key findings

VECODI reduces the Secure World Trusted Computing Base (TCB) by 95.64% compared to prior TEE-Shielded DNN Partitioning approaches.
The runtime overhead introduced by VECODI is only 0.83 ms, a 0.07% increase in inference latency on the NUCLEO-L552ZE-Q board.
SHANGRI-LA lifecycle states (Non-Exist, Inactive, Active) are enforced with TrustZone-M SAU and runtime marking, protecting SHANGRI-LA code/data from Normal World access while allowing execution in untrusted memory.
Inference authorization tokens limit the number and frequency of inference queries to mitigate indirect model extraction attacks, enforcing policies configurable by the model provider.
Proof-of-execution tokens generated by SHANGRI-LA are cryptographically bound to: executed code, inputs from peripherals, outputs, and authorization tokens, enabling external verification without revealing raw sensitive input data.
The framework supports runtime dynamic policy updates via Secure World Authorize calls without reloading code or restarting SHANGRI-LA.
Physical peripheral access is secured by SHANGRI-LA's intermediate privilege, preventing Normal World interference to guarantee trustworthy sensor inputs.
Memory usage is optimized by avoiding duplication of model data between Secure and Normal Worlds, unlike prior split-model approaches.

Threat model

The adversaries are (1) AM, who aims to extract the model by fully compromising the device software stack outside the Secure World, with physical access that allows reading unprotected memory regions, and (2) AI, who aims to forge inference results presented to verifiers by controlling untrusted device software. Neither adversary can compromise the Secure World code or the underlying hardware root of trust (secure boot, TrustZone-M isolation, and DMA security). The model provider and the Secure World TCB are trusted and assumed to be free of vulnerabilities. Side channels, fault injection, and physical tampering are excluded.

Methodology — deep read

The study designs and implements VECODI, a framework for secure and verifiable deep neural network inference on low-end ARM Cortex-M microcontrollers equipped with TrustZone-M.

Threat Model & Assumptions: The adversaries are AM, who aims to extract the model M by compromising the software or physical memory of the device DEV, and AI, who aims to forge inference results. The model provider PVD is trusted and securely provisions M and the cryptographic material. The trusted computing base comprises the Secure World code and hardware-enforced isolation features such as secure boot and DMA controls. Fault-injection and side-channel attacks are out of scope.
Data: The implementation targets the publicly available NUCLEO-L552ZE-Q embedded board. The DNN model and inference workload depend on the case study (e.g., camera-based image classification); no dataset or training details are given, because the focus is on runtime protection.
Architecture & Algorithm: VECODI introduces SHANGRI-LA, a novel third runtime on top of TrustZone-M's Secure and Normal Worlds. SHANGRI-LA resides in the Normal World memory but is granted elevated privileges by minimal Secure World code, strictly between Secure and Normal Worlds. This enables executing untrusted inference code 'F' while protecting its confidentiality and integrity. The Secure World manages the lifecycle via APIs Provision (to store code/data hashes and keys), Authorize (to set invocation limits), Create (to load code/data securely and decrypt model parameters), Execute (to run F with interrupts disabled, generating proof tokens if requested), and Destroy (to erase sensitive state). During Execute, SHANGRI-LA code and data are temporarily marked Non-Secure for execution, then restored to Secure.
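The lifecycle described above can be pictured as a small state machine. The following is a host-side Python sketch, not firmware and not the authors' code. The mapping of calls to states is an assumption inferred from this summary (Create leads to Inactive, Execute to Active), `Return` is a hypothetical name for the end of an Execute call, the attribution shown for Non-Exist is a guess, and the requirement that Provision precede Create is not modeled.

```python
from enum import Enum


class State(Enum):
    NON_EXIST = "Non-Exist"
    INACTIVE = "Inactive"
    ACTIVE = "Active"


# (current state, call) -> next state. Assumed mapping, see the lead-in.
TRANSITIONS = {
    (State.NON_EXIST, "Provision"): State.NON_EXIST,
    (State.NON_EXIST, "Authorize"): State.NON_EXIST,
    (State.NON_EXIST, "Create"): State.INACTIVE,
    (State.INACTIVE, "Authorize"): State.INACTIVE,  # policy update, no restart
    (State.INACTIVE, "Execute"): State.ACTIVE,
    (State.ACTIVE, "Return"): State.INACTIVE,       # hypothetical end-of-Execute event
    (State.INACTIVE, "Destroy"): State.NON_EXIST,
}

# Security attribution of the SHANGRI-LA memory region in each state.
# Inactive/Active follow the summary; the Non-Exist entry is an assumption.
REGION_ATTR = {
    State.NON_EXIST: "Non-Secure",
    State.INACTIVE: "Secure",
    State.ACTIVE: "Non-Secure",
}


def step(state, call):
    """Return the next state, or raise if the call is not allowed here."""
    try:
        return TRANSITIONS[(state, call)]
    except KeyError:
        raise ValueError(f"{call} is not allowed in state {state.value}") from None


def run(calls, state=State.NON_EXIST):
    """Apply a sequence of calls and return the final state."""
    for call in calls:
        state = step(state, call)
    return state
```

For example, `run(["Provision", "Authorize", "Create", "Execute", "Return"])` ends in `State.INACTIVE`, while `run(["Execute"])` raises because no instance exists yet.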

To constrain model extraction, an authorization token (Tu) tightly binds usage counters with user identity and input parameters. Proofs of execution (Tproof) are signed by the device and cover the function 'F', usage count, data IDs, input from peripherals, and inferred output. This proof enables external verifiers to confirm correct execution without accessing sensitive input.
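A minimal sketch of how the two tokens could bind their fields, using only the Python standard library. This is an illustration under stated assumptions, not the paper's construction: HMAC-SHA256 stands in for the device and provider signatures (so the verifier here shares the key, which a real signature scheme would avoid), the verifier receives a digest of the sensor input rather than the raw input, data IDs are omitted, and all function names, field names, and encodings are hypothetical.

```python
import hashlib
import hmac


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def mac(key: bytes, *fields: bytes) -> bytes:
    # Length-prefix each field so that field boundaries are unambiguous.
    msg = b"".join(len(f).to_bytes(4, "big") + f for f in fields)
    return hmac.new(key, msg, hashlib.sha256).digest()


def issue_auth_token(provider_key: bytes, user_id: bytes, limit: int) -> dict:
    """Provider side: bind a user identity to an invocation limit."""
    tag = mac(provider_key, user_id, limit.to_bytes(4, "big"))
    return {"user": user_id, "limit": limit, "tag": tag}


def check_auth_token(provider_key: bytes, token: dict) -> bool:
    expected = mac(provider_key, token["user"], token["limit"].to_bytes(4, "big"))
    return hmac.compare_digest(expected, token["tag"])


def make_proof(device_key, code, sensor_input, output, count, auth_token) -> dict:
    """Device side: bind code, input digest, output, count and authorization."""
    input_digest = h(sensor_input)
    tag = mac(device_key, h(code), input_digest, output,
              count.to_bytes(4, "big"), auth_token["tag"])
    # The raw sensor input is not part of the proof, only its digest.
    return {"input_digest": input_digest, "output": output,
            "count": count, "tag": tag}


def verify_proof(device_key, expected_code, proof, auth_token) -> bool:
    """Verifier side: recompute the tag from the expected code and the proof."""
    expected = mac(device_key, h(expected_code), proof["input_digest"],
                   proof["output"], proof["count"].to_bytes(4, "big"),
                   auth_token["tag"])
    return hmac.compare_digest(expected, proof["tag"])
```

In this sketch, changing the reported output, the expected code, or the authorization token after the fact makes `verify_proof` return `False`.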

Training Regime: Not applicable; the focus is on secure inference execution, not model training.
Evaluation Protocol: VECODI is implemented on a physical NUCLEO-L552ZE-Q board. Metrics include Secure World TCB size, runtime overhead (in ms and %), memory footprint, and the security guarantees for model confidentiality and verifiable execution. Compared to prior TEE-based split-inference techniques, VECODI shows a 95.64% TCB reduction and a 0.07% latency increase. A camera-based image classification case study evaluates end-to-end applicability.
Reproducibility: The authors open-sourced the VECODI prototype implementation, enabling independent verification and extension. The cryptographic keys, hash computations, and Secure World API design are specified in detail. The dataset and DNN model used in the case study are not explicitly named.

Concrete Example: Upon device provisioning, the Secure World loads SHANGRI-LA code/data hashes and encrypted model parameters via the Provision API. The model consumer requests authorization from the model provider, which issues a signed token limiting the number of inferences. On the device, Create configures SHANGRI-LA with the decrypted private data and marks its memory regions Secure. On an inference request, Execute validates the user token, disables interrupts, marks the code/data regions Non-Secure so that the function F can run, collects sensor inputs securely, performs inference, re-secures the data, updates usage counters, optionally generates a cryptographic proof binding inputs, output, and usage, and then re-enables interrupts. This process enforces confidentiality, usage restrictions, and verifiable execution with minimal runtime overhead.
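The ordering of steps inside Execute, as listed in the example above, can be sketched as follows. This is a host-side Python illustration under assumptions: the device is a plain dictionary, the trace labels are hypothetical, token validation is reduced to a boolean, and proof generation is only recorded as a step. It shows the sequencing, not the real TrustZone-M mechanics.

```python
def execute(dev, token_ok, read_sensor, infer, want_proof=False):
    """Run one inference and return (output, trace of steps taken)."""
    # Validate the token and the usage limit before touching any state.
    if not token_ok or dev["used"] >= dev["limit"]:
        return None, ["reject"]

    trace = []
    dev["irq"] = False                      # disable interrupts
    trace.append("irq_off")
    try:
        dev["region"] = "Non-Secure"        # let F run from Normal World memory
        trace.append("mark_non_secure")
        try:
            y = infer(read_sensor())        # acquire input and run F
            trace.append("run_F")
        finally:
            dev["region"] = "Secure"        # always re-secure code and data
            trace.append("mark_secure")
        dev["used"] += 1                    # update the usage counter
        trace.append("count")
        if want_proof:
            trace.append("proof")           # placeholder for proof generation
    finally:
        dev["irq"] = True                   # always re-enable interrupts
        trace.append("irq_on")
    return y, trace
```

The nested `try`/`finally` blocks are a design choice of this sketch: they make sure the region is re-secured and interrupts are re-enabled even if the inference function raises.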

Throughout, the Secure World code footprint remains minimal and application agnostic, preserving the overall platform security posture. The paper also discusses memory attributes configuration with the Security Attribution Unit (SAU) and runtime DMA protection to prevent unauthorized access.

Technical innovations

Introduction of SHANGRI-LA as a third runtime environment on TrustZone-M, with privilege strictly between Secure and Normal Worlds, enabling secure execution of untrusted code with minimal Secure World involvement.
A lifecycle management API set (Provision, Authorize, Create, Execute, Destroy) that transitions SHANGRI-LA instances through secure memory states enforcing confidentiality and integrity policies.
Authorization tokens limiting inference queries to mitigate indirect model extraction, where tokens cryptographically bind usage counts to authenticated users.
Construction of proof-of-execution tokens that allow external verifiers to validate inference results' authenticity and input integrity without accessing sensitive raw inputs or requiring full Secure World execution.
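To make the authorization policy concrete, the sketch below checks both the number and the frequency of queries, the two limits this summary attributes to the authorization tokens. The `Policy` and `Usage` types, their field names, and the minimum-interval formulation of "frequency" are assumptions for illustration; the paper's actual policy encoding is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    max_queries: int        # total inferences the token allows
    min_interval_s: float   # minimum time between two inferences


@dataclass
class Usage:
    count: int = 0
    last_time_s: Optional[float] = None


def allow(policy: Policy, usage: Usage, now_s: float) -> bool:
    """Return True and record the query if the policy permits it."""
    if usage.count >= policy.max_queries:
        return False
    if (usage.last_time_s is not None
            and now_s - usage.last_time_s < policy.min_interval_s):
        return False
    usage.count += 1
    usage.last_time_s = now_s
    return True
```

The current time is passed in explicitly so that the check stays deterministic; on a device it would come from a clock that the Normal World cannot tamper with.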

Baselines vs proposed

Prior TEE-Shielded DNN Partitioning approaches: Secure World TCB size = 100% baseline vs VECODI: 4.36% (95.64% reduction).
Runtime overhead on NUCLEO-L552ZE-Q: prior TEE-based solutions incur multiple milliseconds of overhead vs VECODI: +0.83 ms (0.07%) during inference.
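The two reported figures can be cross-checked with simple arithmetic. The snippet below is a back-of-the-envelope calculation, not a number taken from the paper: it assumes the 0.07% is measured relative to the unprotected inference latency, which this summary does not state explicitly.

```python
overhead_ms = 0.83
overhead_fraction = 0.07 / 100          # 0.07% expressed as a fraction

# If overhead = fraction * baseline, then baseline = overhead / fraction.
implied_baseline_ms = overhead_ms / overhead_fraction

tcb_reduction_pct = 95.64
remaining_tcb_pct = 100 - tcb_reduction_pct

print(f"implied baseline inference latency: {implied_baseline_ms:.0f} ms")
print(f"remaining Secure World TCB: {remaining_tcb_pct:.2f}%")
```

Under that assumption, a single inference takes on the order of 1.2 s. Because both inputs are rounded, the implied baseline is only approximate, roughly between 1.1 s and 1.3 s.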

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.07470.

Fig 3: Overview of SHANGRI-LA lifecycle (a) and its corresponding TrustZone-M memory state (b).

Limitations

The framework assumes a bug-free and vulnerability-free Secure World Trusted Computing Base, which may be challenging to guarantee in practice.
Physical attacks such as fault injection, side-channel analysis, or invasive hardware tampering are considered out-of-scope and not addressed by VECODI.
The effectiveness of indirect leakage mitigation relies on appropriate authorization limits set by model providers, which requires domain expertise and might not prevent all advanced model extraction techniques.
The approach depends on TrustZone-M hardware features including runtime SAU reconfiguration and DMA controllers with fine-grained access control, limiting applicability to compatible devices.
The prototype evaluation focuses on a single microcontroller platform and a camera-based image classification case study; generalization across other edge devices and model types remains to be demonstrated.
The trusted peripheral input assumption requires model providers to implement correct input acquisition code; if that code is flawed, inference correctness can be undermined despite the system's protections.

Open questions / follow-ons

How well does SHANGRI-LA generalize to more complex or larger DNN models with higher memory demands on low-end devices?
What concrete methodologies can model providers use to set optimal inference authorization limits that balance utility and model extraction risks?
Can VECODI and SHANGRI-LA concepts be extended or adapted to other TEE architectures beyond TrustZone-M (e.g., RISC-V or Cortex-A platforms)?
How resilient is the system to emerging advanced side-channel or microarchitectural attacks that target TrustZone-M isolation boundaries?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this work shows how on-device machine learning inference can be given confidentiality and verifiability guarantees in resource-constrained environments. Many CAPTCHA systems increasingly rely on client-side ML models for challenge generation or user behavior analysis, often executed on edge devices. VECODI's approach of enforcing usage policies and producing cryptographic proofs of correct model execution could be adapted to prevent automated bypass attempts that rely on tampering with or extracting ML models locally.

Furthermore, the SHANGRI-LA abstraction offers a way to reduce TCB size and runtime overhead in embedded trust anchors, enabling practical secure ML deployments without heavyweight TEEs. This can inform the design of CAPTCHA verification clients that need to prove genuine challenge solving without leaking model internals or allowing replay attacks. However, because the design relies on specific ARM TrustZone-M features and places physical attacks out of scope, CAPTCHA use cases that demand a broader threat model may require additional safeguards or trust assumptions.

Cite

@article{arxiv2606_07470,
  title={Verifiable and Confidential DNN Inference on Low-End Edge Devices},
  author={Mohamed Khalil Kiri and Ivan De Oliveira Nunes and Aurélien Francillon and Norrathep Rattanavipanon},
  journal={arXiv preprint arXiv:2606.07470},
  year={2026},
  url={https://arxiv.org/abs/2606.07470}
}
