MHOT: Height-Optimized Authenticated Data Structure for Blockchain State Commitment

Source: arXiv:2606.11736 · Published 2026-06-10 · By Sipeng Xie, Qianhong Wu, Minghang Li, Qiyuan Gao, Bo Qin, Qin Wang

TL;DR

The paper addresses a critical bottleneck in blockchain state commitment efficiency. Ethereum’s canonical Merkle Patricia Trie (MPT) suffers from increased tree height due to fixed-prefix indexing, which leads to long update times and susceptibility to "Nurgle attacks" that deliberately inflate path length to degrade performance. Existing mitigations either increase node fanout, inflating proof sizes exponentially, or rely on vector commitments with trusted setup and costly verification.

The authors propose MHOT, a Height-Optimized Trie data structure that indexes by discriminative bits to achieve minimal tree height with a linear relationship between node fanout and span, breaking the exponential coupling of MPT. To counter the high fanout proof size inflation, they introduce hierarchical proofs—a two-layer Merkle hashing within nodes that reduces proof overhead from O(k) to O(log k). MHOT is content-addressable and persistent, supporting efficient updates, batched commits, and practical disk storage layouts.

On Ethereum mainnet workloads, MHOT achieves up to 9× higher write throughput, 4× lower write amplification, and 2× smaller proofs compared to MPT. Under adversarial Nurgle attacks that exploit prefix collisions, MHOT maintains a 0% attack success rate, versus nearly 100% for MPT. The work establishes height optimality without exotic cryptography as the key abstraction to scalable and attack-resilient blockchain authenticated data structures.

Key findings

MHOT reduces write amplification by a factor of 4× compared to MPT on Ethereum mainnet workloads.
MHOT increases write throughput by up to 9× over MPT under realistic workloads.
Proof sizes for membership queries with MHOT are roughly 1.1–1.4 KB, about half the 2.3–2.9 KB size for MPT.
Under Nurgle attack conditions consuming an entire block's gas budget, MHOT achieves 0% attack success rate, vs 99.97% for MPT.
MHOT's proof overhead per node reduces from O(k) sibling hashes to O(log k) via hierarchical proofs, yielding roughly 6× proof size reduction in compound nodes with typical fanout k=32.
MHOT achieves a theoretical minimal height of ⌈256/(k-1)⌉ levels, e.g., 9 levels for k=32, versus about 52 levels for prefix MPT with equivalent fanout.
MHOT's batch commit pipeline leverages height-stratified parallelism to efficiently finalize nodes, reducing redundant hash recomputations.
Content-addressable node IDs based on node content hashes guarantee structural determinism and support copy-on-write persistence without trusted setup.

Threat model

The adversary is a denial-of-service attacker who generates synthetic keys designed to maximize common prefix lengths in the trie, forcing deeper paths and increased recomputation, thereby degrading blockchain block processing throughput. The attacker cannot break cryptographic hash functions or produce hash collisions beyond standard assumptions. The system assumes honest validators running the committed code and no trusted setup assumptions beyond the underlying hash function properties.

Methodology — deep read

Threat Model and Assumptions: The adversary aims to degrade blockchain state commitment performance by creating pathological key distributions that lengthen authenticated trie paths ("Nurgle attacks"). The attacker can generate keys to maximize common prefixes and induce node splits but cannot break cryptographic hashes or alter data nodes themselves. The system assumes standard collision resistance of hash functions and no trusted setup to minimize trust.
Data: The evaluation uses Ethereum mainnet key spaces and state update workloads. The key size is fixed at 256 bits, matching Keccak-256 output. Though no explicit dataset split is described, experiments simulate real workload and adversarial conditions. Proof size, write amplification, throughput, and attack success rates are measured.
Architecture / Algorithm: MHOT builds on Height-Optimized Tries (HOT) that branch on discriminative bits rather than fixed prefix bits, achieving minimal tree height for a given key set and fanout parameter k (typically k=32). Each compound node stores a mask of discriminative bit positions, sparse partial keys, and child references.

MHOT makes HOT persistent and authenticated by:

Assigning content-addressable node IDs equal to the hash of the node's serialized content, ensuring determinism and allowing copy-on-write updates.
Introducing hierarchical proofs: each compound node maintains an internal binary Merkle tree over its k children, allowing membership proofs requiring O(log k) sibling hashes instead of O(k).
Designing a batch commit pipeline that defers hashing via temporary IDs and processes nodes in height order, enabling parallel hash computation and avoiding redundant recomputation of shared ancestors across multiple insertions.
Applying storage optimizations such as metadata compression, height-based ordering of writes, and asynchronous persistence aligned with LSM-tree designs for efficient disk I/O.

Insertion utilizes four HOT-maintained structural operations that adapt the trie while preventing growth in global height except during a "parent pull-up". The discriminative-bit indexing couples span and fanout linearly, allowing a node with fanout k to consume up to k−1 discriminative bits, absorbing adversarial insertions without significant height increase.

Training / Construction: Node insertions follow the original HOT algorithms extended with deferred hashing and content-addressable persistence. Concurrent batched commit traverses nodes by ascending height to finalize hashes.
Evaluation Protocol: Metrics include write throughput (batch inserts per second), write amplification (extra writes generated), proof sizes, and attack success rate under Nurgle attack strategies. Baselines include MPT and variants such as RainBlock and LVMT under equivalent key workloads. Multiple runs and realistic workload traces from Ethereum mainnet provide data. Proof sizes are measured in kilobytes. Attack success is quantified by the percentage of adversarial keys that increase trie depth.
Reproducibility: The paper does not explicitly mention code release or frozen weights. Some details on algorithms and node layouts are provided, but certain implementation details (e.g., disk layout configurations) are not fully specified. The Ethereum mainnet workloads used for benchmarking are publicly known but no explicit dataset snapshot release is noted.

Example End-to-End: Inserting a new key involves traversing MHOT from root by matching discriminative bits stored in each compound node. If a node has capacity, the key is added (Normal Insert). Otherwise, structural adaptations such as Leaf Pushdown or Intermediate Node Creation occur to maintain height optimality. Modified nodes receive temporary IDs and enqueue in the pending map. After processing all insertions in a block, nodes are finalized in height order, replacing temporary IDs with content hashes. Membership proofs subsequently use the two-layer Merkle proofs reducing sibling hash overhead from O(k) to O(log k) per node.

The resulting authenticated trie structure supports compact proofs, fast updates, and resistance to adversarial depth inflation.

Technical innovations

Indexing by discriminative bits rather than fixed prefixes enables MHOT to achieve provably minimal tree height with linear span-fanout coupling, breaking MPT's exponential coupling.
Hierarchical proofs introduce a two-layer Merkle construction within compound nodes, reducing per-node proof overhead from O(k) to O(log k) without trusted setup.
Content-addressable persistence assigns node IDs as hashes of node content, ensuring deterministic structure and supporting copy-on-write semantics for authenticated updates.
A batched commit pipeline defers hashing and finalizes nodes in ascending height order with parallelism, avoiding redundant recomputations during multiple insertions.

Datasets

Ethereum mainnet state keys and updates — size implicit in billions of accounts and storage keys — public blockchain data

Baselines vs proposed

MPT: write throughput = baseline vs MHOT: up to 9× higher
MPT: write amplification = baseline vs MHOT: 4× lower
MPT: single-point proof size = 2.3–2.9 KB vs MHOT: 1.1–1.4 KB
MPT under Nurgle attack: 99.97% success vs MHOT: 0% success

Limitations

No public code release or detailed instructions for reproducing results reported.
Evaluation focuses on Ethereum mainnet workloads; generalization to other blockchains or key distributions not empirically validated.
No formal security proofs of resistance against adaptive adversaries beyond empirical Nurgle attacks.
MHOT requires larger node sizes due to storing 40-byte content hashes as pointers, increasing node footprint and memory usage.
Hierarchical proofs incur higher commit-time hashing overhead, although mitigated via batching.
Limited discussion on impact of highly skewed or non-uniform real-world update patterns on performance and height optimality.

Open questions / follow-ons

How does MHOT perform under dynamic workloads with skewed key distributions and non-uniform update rates over time?
Can MHOT's hierarchical proofs be extended or combined with vector commitments for further proof size reductions without trusted setup?
What are the trade-offs in memory and network overhead when scaling MHOT to extremely large-scale blockchains with billions of state entries?
How well does MHOT integrate with concurrent execution engines and sharded architectures in terms of parallelism and composability?

Why it matters for bot defense

Bot-defense and CAPTCHA systems that depend on blockchain-based authentication or decentralized state verification could benefit from MHOT's scalable and attack-resilient authenticated data structure. The ability to efficiently commit state while resisting adversarial key distributions translates to more robust, low-latency verification for clients interacting with decentralized networks. Avoiding trusted setup and limiting proof sizes also reduces trust and computational burden on light clients, which is important for user-facing applications such as CAPTCHA validation relying on blockchain state proofs.

Additionally, understanding the structural causes of attack vectors like the Nurgle attack and how MHOT mitigates them through height-optimal, discriminative-bit indexing provides insights into designing robust protocols resistant to adversarial workload manipulations. Lessons from MHOT's hierarchical proofs and batch commit pipeline may inspire analogous optimizations in distributed authentication systems beyond blockchains, including those underlying CAPTCHA challenge generation or bot mitigation data stores.

Cite

bibtex

@article{arxiv2606_11736,
  title={ MHOT: Height-Optimized Authenticated Data Structure for Blockchain State Commitment },
  author={ Sipeng Xie and Qianhong Wu and Minghang Li and Qiyuan Gao and Bo Qin and Qin Wang },
  journal={arXiv preprint arXiv:2606.11736},
  year={ 2026 },
  url={https://arxiv.org/abs/2606.11736}
}

MHOT: Height-Optimized Authenticated Data Structure for Blockchain State Commitment ​

TL;DR ​

Key findings ​

Threat model ​

Methodology — deep read ​

Technical innovations ​

Datasets ​

Baselines vs proposed ​

Limitations ​

Open questions / follow-ons ​

Why it matters for bot defense ​

Cite ​

Read the full paper ​