Merkle tree
A blockchain filing cabinet, organizing and verifying data efficiently with cryptographic roots and branches.
What is a Merkle tree?
A Merkle tree, also called a hash tree, is a data structure that organizes and verifies large datasets using cryptographic hashes. Named after Ralph Merkle, it works like a tree where each leaf node contains a hash of actual data (such as transactions), and each parent node contains a hash of its children. This hierarchy continues up to a single root hash, called the Merkle root, which serves as a compact fingerprint representing all the data below it.
The key innovation is that you can verify any piece of data belongs to the tree by providing only the Merkle root and a small path of sibling hashes, rather than needing the entire dataset. This makes Merkle trees extremely efficient for verifying data integrity in distributed systems like blockchains.
Why Merkle trees matter
Merkle trees solve a critical problem for blockchains: how to verify specific transactions without downloading and storing every transaction in a block. Without Merkle trees, verifying a single transaction would require processing the entire block. With Merkle trees, verification requires only the Merkle root (stored in the block header) and a logarithmic path of hashes.
Key benefits:
- Efficient verification: Verify any transaction with minimal data, using log₂(n) hashes instead of n transactions
- Reduced bandwidth: Less data transmitted across peer-to-peer networks
- Tamper detection: Any change to data cascades up the tree, immediately revealing modifications
- Lightweight clients: Enables simplified payment verification (SPV) without downloading full blocks
- Scalability: Verification time grows logarithmically, not linearly, with dataset size
For a block with 1,000 transactions, Merkle trees reduce verification requirements by over 99%, needing only about 10 hashes instead of all 1,000 transaction IDs.
Merkle trees in Polkadot
Polkadot uses Merkle trees extensively throughout its architecture, particularly for state storage and rollup verification:
State trie
Polkadot stores all state data in a radix-16 Merkle tree called the state trie. This allows the system to efficiently compute a Merkle root (state root) that cryptographically represents the entire state at any given time. The state root is included in block headers and authenticates the validity of the entire state database.
Proof of validity (PoV) verification
Rollups (i.e., parachains) on Polkadot submit blocks with proof of validity to validators. Because states are stored in Merkle trees, validators can verify state transitions without accessing the entire rollup state. They only need:
- The modified state values
- The hashes of unaffected points in the Merkle tree
- The previous and new state roots
This enables Polkadot to guarantee valid state transitions for rollups without requiring validators to store complete rollup states, providing a major scalability advantage.
Erasure coding
When validators process rollup blocks, they create erasure coding chunks and organize them in a Merkle tree. Each validator receives a chunk along with a Merkle proof, enabling efficient verification that chunks are part of the authentic block data.
Merkleized metadata
Polkadot uses Merkle trees for metadata verification in offline wallets. Metadata is chunked and organized into a Merkle tree, with the root hash included in signed transactions. This allows hardware wallets to verify they're decoding transactions correctly without trusting online parties to provide accurate metadata.
How Merkle trees work
Building a Merkle tree follows a simple process:
- Hash the data: Each transaction or data block is hashed individually, creating leaf nodes
- Pair and hash upward: Adjacent hashes are combined and hashed together to create parent nodes
- Continue to root: The process repeats until reaching a single Merkle root
For verification, you only need the item being verified, the Merkle root from a trusted source, and the sibling hashes along the path from leaf to root. Hashing these together recreates the path up the tree. If the final result matches the known Merkle root, the data is verified as authentic and unmodified.