Catching the Fly: Practical Challenges in Making Blockchain FlyClient Real
Source: arXiv:2604.26736 · Published 2026-04-29 · By Pericle Perazzo, Dario Capecchi
TL;DR
This paper is about turning FlyClient from a theoretically appealing lightweight verification protocol into something a real blockchain node can actually serve. The main friction is not the core cryptography — that part already exists — but the engineering and parameterization details needed to deploy it on a live, variable-difficulty chain like Zcash. The authors focus on three practical gaps: how to translate FlyClient security into an economically meaningful attacker model, how to build a prover for a production client implementation, and how to shrink proofs without breaking compatibility.
The result is a fairly pragmatic bridge between theory and deployment. They define a new wa-adversary model tied to an attacker’s work budget, show how to map that to FlyClient parameters, extend Zcash’s Rust full node Zebrad with FlyClient prover RPCs and MMR-node persistence, and evaluate two proof-size optimizations called cumulative proofs and distilled proofs. The headline outcome is that the new adversary parameterization can cut sampled headers substantially in their example, the prover integration adds little overhead to sync, cumulative proofs give a modest proof-size reduction without consensus changes, and distilled proofs give a much larger reduction but require consensus modification.
Key findings
- The wa-adversary parameterization changes the example Zcash verifier from 598 sampled headers to 423 in the interactive case and from 802 to 504 in the non-interactive case, i.e. about -29% and -37% header-download savings, respectively.
- For the same example budget, the inferred optimal parameters shift from (c=0.5, L=100) to (c*=0.25, L*=200) for interactive FlyClient and to (c*=0.2161, L*=231) for non-interactive FlyClient.
- With Zcash headers counted as 1,487 bytes each, the header-only data downloaded drops from 0.848 MiB to 0.600 MiB in the interactive example and from 1.137 MiB to 0.715 MiB in the non-interactive example.
- The prover-side extension to Zebrad required new RPCs for GETHISTORYNODE, GETAUTHDATAROOT, GETTOTALWORK, and GETHEIGHTWITHTOTALWORK, because stock Zebrad only kept MMR peaks in memory.
- The paper reports that adding FlyClient capability does not materially slow a full sync: the time for a FlyClient-capable Zebrad instance is described as comparable to legacy Zebrad, because MMR nodes are already computed during sync and the extra cost is mainly database writes.
- The cumulative-proof optimization reduces non-interactive proof size by 9.16% and requires zero consensus changes.
- The distilled-proof optimization reduces non-interactive proof size by 71.33% but requires a consensus-layer change.
- The authors state that the prover implementation was made available to the Zcash community and that some extensions had already been accepted into the official Zebrad repository at the time of writing.
Threat model
The adversary is a malicious prover or chain forger attempting to make a FlyClient verifier accept an invalid PoW blockchain history. They are assumed unable to generate more than wa cumulative valid work without negligible probability, while still respecting consensus difficulty-transition rules. In the original FlyClient framing, this corresponds to not being able to produce sufficiently long invalid forks with enough valid-work ratio. The verifier is assumed honest, follows the prescribed sampling algorithm, and can fetch headers and MMR proofs from the prover; the attacker cannot break the hash function or consensus rules outright.
Methodology — deep read
The threat model is framed around a malicious FlyClient prover trying to make a verifier accept an invalid PoW chain. The original FlyClient paper uses a (c, L)-adversary, meaning an attacker cannot create an L-long fork with validity ratio c. The authors argue that this is mathematically convenient but economically awkward for deployment, because operators must translate abstract c and L into a real attacker’s budget. Their replacement is the wa-adversary: the attacker cannot generate malicious blocks with more than wa cumulative work, except with negligible probability. That makes the model budget-based rather than ratio-based, which the authors claim is easier to interpret in practice and better suited to choosing parameters for a real verifier.
On the data side, the paper is entirely anchored in Zcash’s production chain and protocol details. They describe Zcash’s FlyClient integration via ZIP 221 and Heartwood, and note that Zcash’s MMR is reset at each network upgrade, creating separate consensus branches. They enumerate the concrete on-chain data structures used by the prover: MMR nodes carry hashSubtreeCommitment, earliest/latest timestamps, earliest/latest target bits, earliest/latest heights, and cumulative work; headers carry hashBlockCommitments, nTime, nBits, nNonce, solutionSize, and solution. The implementation and evaluation appear to be performed on a live or fully synced Zcash chain state, but the excerpt does not give a formal dataset table, train/test split, or labeled corpus because this is not a learning paper. The main “data” is blockchain history plus node database state. In the examples, they use a chain length of 3 million blocks, average block production of 48 blocks/hour, an average block difficulty of 900 GH, and a Zcash 51%-attack cost estimate of about 20,000 USD/hour from crypto51.app.
Algorithmically, the core change is a reparameterization of FlyClient. Under wa, they first show that a wa-adversary automatically implies a family of (c, L)-adversary assumptions satisfying Eq. 5, because any valid fork with cA validity ratio can only be long enough if its cumulative work exceeds wa/c. This lets them invoke the original FlyClient security theorem once the verifier is configured with any (c, L) pair in that family. They then solve for the verifier parameters that minimize the number of sampled headers, rather than directly optimizing byte size. For the interactive protocol they use ndet = L and nprob from the original FlyClient formula; for the non-interactive protocol they use the Fiat-Shamir-adjusted variant. Under the simplifying assumption that recent Zcash difficulty is roughly stable, they define an adversarial block budget na = wa / d̃, so that c*·L* = na. They then derive first-order conditions for the optimal c* numerically (Eq. 6 for interactive, Eq. 7 for non-interactive), because closed forms are not available. The interesting engineering detail here is that they are not trying to optimize an abstract cost function in the vacuum; they are explicitly targeting the practical number of samples, because in Zcash header bytes dominate the proof size enough that fewer samples usually means smaller proofs.
The prover implementation is done by extending Zebrad, the Rust-based official Zcash full node, with FlyClient-serving RPCs. They keep the protocol intelligence on the verifier side: the prover is essentially a dumb data server that answers requests via JSON-RPC. Existing RPCs are reused where possible: GETBLOCKCHAININFO to obtain tip height, total work, and chain length, and GETBLOCKHEADER to fetch a header by height. New RPCs had to be added for GETHISTORYNODE, GETAUTHDATAROOT, GETTOTALWORK, and GETHEIGHTWITHTOTALWORK. The most expensive one is GETHISTORYNODE because it requires maintaining a persistent database of individual MMR nodes rather than only the MMR peaks used internally for maintaining the chain history root. They implement database population both during sync and as a post-sync upgrade path: if a node is installed from scratch, it stores the MMR nodes as it syncs; if a node already exists, it can upgrade the database after the fact. They evaluate the overhead by comparing full sync time for a legacy Zebrad instance, a FlyClient-capable Zebrad instance, and a database-format upgrade path, all on the same virtual machine with high-bandwidth connectivity. In the excerpt, the only explicit conclusion is that the FlyClient-capable sync time is comparable to legacy sync time and that the extra overhead is mainly database writes, not recomputation. The paper also notes a roughly 10-hour database upgrade in the legacy-to-new-format migration path, but the excerpt is truncated before the full quantitative result is shown.
The proof-size optimization section introduces two mechanisms. Cumulative proofs are a zero-consensus-change optimization that reduce non-interactive proof size by 9.16%, apparently by reorganizing the proof material to avoid some redundancy while staying compatible with the current chain rules. Distilled proofs are more aggressive: they shrink non-interactive proof size by 71.33%, but require a consensus change. The excerpt does not include the exact encoding steps for either optimization, so the source text available here is not sufficient to reconstruct their byte-level formats or a full example end-to-end. One concrete end-to-end example that is clear from the text is the parameterization exercise: given a 3-million-block chain, a security target λ = 50, and an adversarial budget equivalent to c·L = 50 blocks, the authors compute the old configuration (c=0.5, L=100) versus the new optimal wa-based configuration, then translate the reduced sample count into a lower header byte count. That example is the main worked instance in the excerpt and shows how the model turns into actual verifier download savings.
Technical innovations
- The wa-adversary replaces FlyClient’s abstract (c, L)-adversary with a single work-budget parameter that maps directly to attacker cost and can be used to choose verifier parameters.
- The paper derives a numerical optimization method for selecting FlyClient verifier settings under wa, targeting the minimum total number of sampled headers while preserving FlyClient security.
- The authors implement the first practical FlyClient prover for Zcash by extending Zebrad with new RPCs and persistent MMR-node storage.
- They propose cumulative proofs, a zero-consensus-change proof-size optimization for FlyClient.
- They propose distilled proofs, a more aggressive proof-size reduction that requires consensus changes but substantially shrinks non-interactive proofs.
Datasets
- Zcash blockchain / consensus branches — full chain history up to the tip (example calculations use a 3-million-block chain) — public blockchain data
- Zebrad node state / MMR-node database — not quantified in the excerpt — derived from Zcash sync and post-sync database upgrade
Baselines vs proposed
- Traditional (c=0.5, L=100) interactive FlyClient: total samplings = 598 vs proposed wa-based optimal (c*=0.25, L*=200): total samplings = 423
- Traditional (c=0.5, L=100) non-interactive FlyClient: total samplings = 802 vs proposed wa-based optimal (c*=0.2161, L*=231): total samplings = 504
- Traditional header download, interactive: 0.848 MiB vs proposed wa-based: 0.600 MiB
- Traditional header download, non-interactive: 1.137 MiB vs proposed wa-based: 0.715 MiB
- Cumulative proofs: non-interactive proof size = baseline vs proposed = -9.16%
- Distilled proofs: non-interactive proof size = baseline vs proposed = -71.33%
Limitations
- The excerpt does not provide full experimental details for the prover evaluation, including exact hardware specs, dataset snapshot, seed strategy, or confidence intervals.
- The optimization for verifier parameters relies on a simplifying assumption of low difficulty variability and roughly constant average recent difficulty, which may not hold on all PoW chains.
- The quantitative proof-size and sync-overhead results are partially described in prose and figures, but the excerpt is truncated before all numeric details are visible.
- The wa-to-(c, L) reduction is only as good as the economic estimate of attacker budget; the paper itself notes the budget estimate can be optimistic because luck and historical difficulty differences matter.
- The cumulative-proof optimization gives only modest savings (-9.16%) relative to its complexity, so its practical benefit may depend on whether zero-consensus-change deployment is the priority.
- The prover implementation is production-oriented but still depends on maintaining extra MMR-node state, which adds operational burden and database growth.
Open questions / follow-ons
- How should wa be estimated for chains with strongly time-varying difficulty or highly volatile hash rate, where a single average work budget may be misleading?
- Can the distilled-proof ideas be adapted to avoid consensus changes, perhaps by using a different commitment or proof aggregation layer?
- What is the true end-to-end verifier cost on constrained clients once MMR-node bytes, network latency, and cryptographic verification are included, not just sampled-header counts?
- How do these prover-side requirements scale under chain reorgs, long-range archival access, or multiple consensus-branch histories beyond Zcash’s current upgrade pattern?
Why it matters for bot defense
For bot-defense practitioners, this paper is relevant less as a CAPTCHA technique and more as a case study in making a theoretically sound verification protocol operational under real deployment constraints. The useful lesson is the parameterization move: instead of choosing security knobs in an abstract adversarial model, the authors reframe them in terms of attacker budget and then optimize the verifier against that budget. That is the same kind of translation you often need in bot mitigation, where an engineer has to convert risk assumptions into measurable operational thresholds.
If you were applying the mindset here to CAPTCHA or bot defense, the main takeaway would be to think carefully about the gap between formal threat models and deployable system parameters. The paper also shows the cost of making a protocol “production ready”: you often need extra state, RPC surface area, storage, and migration paths even when the core algorithm is already known. For CAPTCHA-like systems, that suggests evaluating not only solver resistance but also serving overhead, upgrade friction, and whether a proposed security tweak requires consensus or client changes before it is realistically usable.
Cite
@article{arxiv2604_26736,
title={ Catching the Fly: Practical Challenges in Making Blockchain FlyClient Real },
author={ Pericle Perazzo and Dario Capecchi },
journal={arXiv preprint arXiv:2604.26736},
year={ 2026 },
url={https://arxiv.org/abs/2604.26736}
}