An Evidence-driven Protocol for Trustworthy CI Pipelines

Source: arXiv:2605.21089 · Published 2026-05-20 · By Fernando Castillo, Eduardo Brito, Pille Pullonen-Raudvere, Sebastian Werner, Stefan Tai

TL;DR

This paper addresses the growing challenge of ensuring integrity, authenticity, and trust in software artifacts produced by Continuous Integration (CI) pipelines, especially in complex enterprise supply chains vulnerable to infrastructure attacks. Traditional approaches require every consumer of artifacts to redundantly rebuild and test software to verify provenance, which creates verification bottlenecks and reduces credibility. The paper proposes a novel evidence-driven protocol that integrates Deterministic Build Systems (DBS) with Trusted Execution Environments (TEEs) to provide cryptographically verifiable evidence of integrity and attestation throughout the CI process.

The key innovation is a protocol that formalizes an evidence life cycle, from raw data collection through authentication and attestation to actionable evidence, produced by tamper-resistant deterministic builds executed inside TEEs. This evidence is cryptographically bound and anchored to a blockchain ledger, enabling scalable, lightweight artifact verification by consumers without costly rebuilds. A proof-of-concept implementation using GitLab, NixOS, Argo Workflows, Intel TDX TEEs, and an Ethereum-based attestation service demonstrates the approach. Experiments show that the protocol roughly doubles runtime and resource use for the CI producer, but the verification cost for potentially hundreds of artifact consumers drops from redundant execution to fast cryptographic checks. The results point to a trade-off that favors a single trusted CI build producing a scalable, auditable chain of custody for artifacts.

Key findings

The evidence-driven CI pipeline with TEEs and DBS nearly doubles total workflow time (e.g., Oracle Core CI rises from 3.32 to 6.00 minutes, about 1.8x).
CPU consumption rises by a similar factor (e.g., Oracle Core from 21.6s/CPU to 39.2s/CPU).
Memory consumption doubles (e.g., Oracle Core from 4.8s/100MiB to 9.6s/100MiB).
Runtime for individual CI tasks slightly more than doubles (Oracle Core from 52.64s to 113.2s with the evidence protocol, about 2.15x).
Every attested evidence signature adds ~294 bytes of overhead to artifacts.
Cryptographic operations during attestation add ~80ms of latency each.
Blockchain commit cost is around 393k gas per attestation entry; revocation cost varies from 6.4k to 88k gas.
A simulated scenario shows that verification cost grows linearly with the number of consumers in the untrusted model but remains nearly constant with the evidence-driven approach (Fig 4).
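To put the gas figures in the findings above into currency terms, the short sketch below converts gas to ETH. The gas price of 20 gwei is a purely hypothetical value chosen for illustration; it is not reported in the summary, and real prices fluctuate. The function name `gas_to_eth` is likewise illustrative.

```python
GWEI_PER_ETH = 1_000_000_000  # 1 ETH = 10^9 gwei


def gas_to_eth(gas_used: int, gas_price_gwei: float) -> float:
    """Convert a gas amount to ETH at a given gas price (in gwei)."""
    return gas_used * gas_price_gwei / GWEI_PER_ETH


ASSUMED_GAS_PRICE_GWEI = 20.0  # assumption for illustration only

commit_eth = gas_to_eth(393_000, ASSUMED_GAS_PRICE_GWEI)      # one attestation entry
revoke_low_eth = gas_to_eth(6_400, ASSUMED_GAS_PRICE_GWEI)    # cheapest revocation
revoke_high_eth = gas_to_eth(88_000, ASSUMED_GAS_PRICE_GWEI)  # costliest revocation

print(f"commit: {commit_eth:.5f} ETH")
print(f"revoke: {revoke_low_eth:.6f} to {revoke_high_eth:.5f} ETH")
```

Because the cost scales linearly with the gas price, the same function can be reused with whatever price applies at anchoring time.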

Threat model

Adversaries may be insiders such as malicious CI administrators or developers, external attackers who control third-party dependencies, or anyone attempting to compromise the integrity of the build and test environment, inject malicious code, or produce fraudulent artifacts. Adversaries are assumed unable to break the hardware-enforced isolation of TEEs, forge attestation proofs, or compromise cryptographic keys held in secure key management systems. The model places its trust anchors at hardware manufacturing and secure key provisioning.

Methodology — deep read

The authors define a threat model around adversaries capable of compromising CI processes, injecting malicious modifications or backdoors (S1), exploiting third-party dependencies (S2), and exploiting insufficient continuous evidence and audit trails (S3). The trust model includes repository owners, developers, CI admins, version control systems, artifact registries, pipeline infrastructure, and deployment environments.

To mitigate threats, the protocol combines three trust mechanisms: (M1) Secure workflow orchestration that executes CI tasks within hardware-backed Trusted Execution Environments (TEEs) supporting remote attestation and key management; (M2) Deterministic Build Systems (using NixOS/Nix) that produce reproducible builds isolating dependency variations; and (M3) Artifact registries that store digitally signed artifacts along with attested evidence, with provenance hashes anchored to a blockchain ledger (Ethereum Attestation Service).

The evidence life cycle formalizes stages: raw evidence (build logs, commits, test outputs), authenticated evidence (cryptographically signed commits and artifacts), attested evidence (verified by TEE attestations ensuring builds used trusted source and processes), and actioned evidence (influencing CI decisions like deployments).
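The four stages can be pictured as an ordered state machine. The sketch below is a minimal illustration, not the paper's implementation: the names `Stage`, `Evidence`, and `promote` are hypothetical, and the rule that evidence advances exactly one stage at a time is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    RAW = 1            # build logs, commits, test outputs
    AUTHENTICATED = 2  # cryptographically signed commits and artifacts
    ATTESTED = 3       # backed by a TEE attestation
    ACTIONED = 4       # used in a CI decision such as a deployment


@dataclass(frozen=True)
class Evidence:
    payload: str
    stage: Stage = Stage.RAW


def promote(evidence: Evidence) -> Evidence:
    """Advance evidence by exactly one stage; stages cannot be skipped (assumption)."""
    if evidence.stage is Stage.ACTIONED:
        raise ValueError("actioned evidence is already at the final stage")
    return Evidence(evidence.payload, Stage(evidence.stage + 1))


item = Evidence("build.log")
item = promote(item)  # RAW -> AUTHENTICATED
item = promote(item)  # AUTHENTICATED -> ATTESTED
print(item.stage.name)
```

Making `Evidence` immutable means each promotion yields a new record, which mirrors the idea that earlier evidence is never rewritten.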

The protocol executes CI tasks in sequence inside TEEs using deterministic builds. Each step produces evidence authenticated by a cryptographic signature that is bound to a hash of the prior evidence, forming an immutable directed acyclic graph (DAG) of artifact provenance. Attestations and commitments are anchored to a blockchain ledger to enable independent auditing. Policy evaluation at each step can reject or abort a build if trust properties fail.
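The binding of each step's evidence to the hash of prior evidence can be sketched with a simple hash chain. This is a simplified illustration under stated assumptions: it models a linear chain rather than a general DAG, uses an HMAC as a stand-in for the TEE-held signing key and real signatures, and all names (`SIGNING_KEY`, `append_step`, `verify_chain`) are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; in the protocol, keys are managed inside the TEE.
SIGNING_KEY = b"demo-key-not-for-production"


def digest(record: dict) -> str:
    """Hash a record canonically so the same content always yields the same digest."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def sign(body: dict) -> str:
    """HMAC stand-in for a signature over the record body."""
    return hmac.new(SIGNING_KEY, digest(body).encode(), hashlib.sha256).hexdigest()


def append_step(chain: list, task: str, output_hash: str) -> None:
    """Add an evidence record bound to the digest of the previous record."""
    prev = digest(chain[-1]) if chain else ""
    body = {"task": task, "output": output_hash, "prev": prev}
    chain.append({**body, "sig": sign(body)})


def verify_chain(chain: list) -> bool:
    """Check every signature and every link to the preceding record."""
    prev = ""
    for record in chain:
        body = {key: record[key] for key in ("task", "output", "prev")}
        if record["prev"] != prev or not hmac.compare_digest(record["sig"], sign(body)):
            return False
        prev = digest(record)
    return True


chain: list = []
append_step(chain, "build", hashlib.sha256(b"binary").hexdigest())
append_step(chain, "test", hashlib.sha256(b"report").hexdigest())
print(verify_chain(chain))
```

Changing any field of an earlier record invalidates its signature and the `prev` link of every later record, which is the property that lets a verifier check the chain without re-running the tasks.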

The proof-of-concept implementation uses GitLab for version control, Argo Workflows on Kubernetes orchestrating CI tasks labeled to run inside Intel TDX VMs (TEEs), with Nix/NixOS for reproducible declarative builds and tests executed inside TEEs. Signed build artifacts and audit logs are stored in an artifact registry, with provenance hashes submitted to Ethereum smart contracts.

The evaluation uses three real-world open-source projects selected to represent different CI pipelines. The metrics collected are total workflow time, CPU and memory consumption, runtime per task, evidence signature size, cryptographic operation latency, and blockchain commit and revocation costs. Results compare baseline pipelines without trust mechanisms against the same pipelines running the proposed protocol. Scaling simulations model the cost savings for consumers who verify trusted artifacts instead of rebuilding independently.

While the protocol roughly doubles pipeline runtime and resource use at the producer, it enables consumers to verify artifacts via fast constant-time cryptographic checks, avoiding costly redundant rebuilds, thereby addressing credibility deficits in supply chains without sacrificing trust.
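The trade-off can be made concrete with a toy cost model. The sketch below is not the paper's simulation: it assumes that in the untrusted model the producer builds once and every consumer repeats the full baseline build, and that in the evidence-driven model each consumer performs a single check costing about 80 ms. The workflow times are the Oracle Core figures reported in this summary; the function names are hypothetical.

```python
def rebuild_cost(consumers: int, build_min: float) -> float:
    """Total minutes when the producer builds once and every consumer rebuilds."""
    return build_min * (1 + consumers)


def evidence_cost(consumers: int, attested_build_min: float, verify_min: float) -> float:
    """Total minutes for one attested build plus a cheap check per consumer."""
    return attested_build_min + consumers * verify_min


BASELINE_MIN = 3.32      # Oracle Core workflow time without the protocol
ATTESTED_MIN = 6.00      # Oracle Core workflow time with the protocol
VERIFY_MIN = 0.080 / 60  # assumption: one ~80 ms cryptographic check per consumer

for n in (1, 10, 100):
    untrusted = rebuild_cost(n, BASELINE_MIN)
    attested = evidence_cost(n, ATTESTED_MIN, VERIFY_MIN)
    print(f"{n:>3} consumers: rebuild {untrusted:.2f} min, evidence {attested:.2f} min")
```

Under these assumptions the aggregate cost of the attested build is already lower with a single consumer, and the gap widens linearly as consumers are added.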

Technical innovations

A formal evidence-driven protocol integrating deterministic builds with TEE-based attestations producing chained verifiable evidence of software integrity.
Implementation of the evidence life cycle (raw, authenticated, attested, actioned) mapped to CI pipeline tasks executed within TEEs.
Use of blockchain anchoring (Ethereum Attestation Service) for immutable, publicly auditable proof of CI artifact provenance and policy compliance.
Demonstration that combining DBS with TEE attestation enables scalable, lightweight artifact verification without redundant rebuilds by multiple consumers.

Datasets

Oracle Core (open-source project) — size and source not explicitly specified.
Frontend Interface (https://github.com/aave/interface) — size unspecified.
Backend Service (https://github.com/ChainSafe/ChainBridge/) — size unspecified.

Baselines vs proposed

Oracle Core workflow time without trust mechanisms: 3.32 ± 0.13 min vs with protocol: 6.00 ± 0.24 min
Frontend Interface workflow time without: 4.98 ± 0.20 min vs with: 9.00 ± 0.36 min
Backend Service workflow time without: 9.13 ± 0.37 min vs with: 16.50 ± 0.66 min
CPU consumption (s/CPU) Oracle Core without: 21.60 ± 0.86 vs with: 39.20 ± 1.57
Runtime of tasks Oracle Core without: 52.64 ± 2.11 s vs with: 113.20 ± 4.53 s
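The overhead factors implied by the figures above can be computed directly. The snippet below only restates the listed measurements and divides them; the dictionary keys and the function name `overhead_ratio` are illustrative.

```python
# (baseline, with protocol) pairs taken from the figures listed above.
measurements = {
    "Oracle Core workflow (min)": (3.32, 6.00),
    "Frontend Interface workflow (min)": (4.98, 9.00),
    "Backend Service workflow (min)": (9.13, 16.50),
    "Oracle Core CPU (s/CPU)": (21.60, 39.20),
    "Oracle Core task runtime (s)": (52.64, 113.20),
}


def overhead_ratio(baseline: float, with_protocol: float) -> float:
    """Return how many times larger the protocol figure is than the baseline."""
    return with_protocol / baseline


ratios = {name: overhead_ratio(*pair) for name, pair in measurements.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The workflow-time and CPU ratios come out near 1.8x, while the per-task runtime ratio is about 2.15x, so "roughly doubles" is a fair summary across these metrics.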

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.21089.

Fig 1: Diagram of Trust Mechanisms in a CI Pipeline

Fig 2 (page 6).

Fig 3 (page 6).

Fig 4 (page 6).

Fig 5 (page 6).

Fig 6 (page 6).

Fig 7 (page 6).

Fig 8 (page 6).

Limitations

The doubling of runtime and CPU/memory usage may be prohibitive for some CI environments, especially with large or complex pipelines.
The evaluation datasets are limited in scale and diversity; broader testing on industrial-scale projects is needed to generalize results.
The protocol relies on availability and security of TEE hardware (Intel TDX), which may not be widespread or may introduce its own vulnerabilities.
Blockchain anchoring adds cost and latency; Ethereum gas fees and network congestion could impact practicality.
No explicit adversarial testing or red team evaluation is reported — resilience against sophisticated persistent attackers remains to be demonstrated.
Policy enforcement is assumed to be sound, but the paper gives few details on policy specification languages or on how complex policies are handled.

Open questions / follow-ons

What is the impact of scaling the protocol to extremely large projects with multi-stage pipelines and massive dependency graphs?
How resilient is the protocol against side-channel or physical attacks on TEEs in real deployments?
Can the protocol integrate with emerging functional or policy-rich verification systems beyond deterministic builds and attestations?
What is the overhead and usability impact when integrated with hybrid cloud or multi-tenant heterogeneous CI/CD infrastructures?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, the paper shows how trusted execution environments combined with deterministic build processes can produce end-to-end verifiable evidence of software integrity, reducing the repeated computation needed to verify artifacts. By analogy, layered cryptographic attestations can strengthen trust in distributed systems that serve bot challenges by certifying code provenance and runtime integrity.

The formal evidence-driven protocol and blockchain anchoring techniques highlighted here could inspire more robust approaches to securing CI pipelines that underlie bot-defense systems and CAPTCHAs, where supply chain attacks or unauthorized code modifications may lead to vulnerabilities or circumvention. Reducing the reliance on implicit trust and minimizing redundant verification costs are practical benefits that apply to scalable deployment and rapid verification of bot-defense software updates or challenge-response logic.

Cite

@article{arxiv2605_21089,
  title={An Evidence-driven Protocol for Trustworthy CI Pipelines},
  author={Fernando Castillo and Eduardo Brito and Pille Pullonen-Raudvere and Sebastian Werner and Stefan Tai},
  journal={arXiv preprint arXiv:2605.21089},
  year={2026},
  url={https://arxiv.org/abs/2605.21089}
}
