Hashing and signatures

Let's say you have two text files that are 50 pages long. You want to know whether they are the same or different. One way you could do this would be to hash them. Hashing (or a hashing function) is a mathematical procedure by which any input is turned into a fixed-length output. There are many of these functions, the most common being SHA-1, SHA-2, and MD5. For instance, here is the output of a hashing function called MD5 with an input of two pages of text:

9a137a78cf0c364e4d94078af1e221be

What's powerful about hashing functions is what happens when I add a single character to the end and run the same function:

8469c950d50b3394a30df3e0d2d14d74

As you can see, the output is completely different. If you want to quickly prove that some data has not been changed in any way, a hash function will do it. For our discussion, here are the important parts of hashing functions:

  • They are very fast for computers to run.
  • The function is one way. You can get the hash easily, but you cannot realistically use the hash to restore the original.
  • They can be used recursively. For instance, I can take the hash of the hash; for example, MD5(8469c950d50b3394a30df3e0d2d14d74) becomes 705b003fc9b09ecbeac0b852dfc65377.

This recursive property to hashing is what brings us to the concept of a Merkle tree, named after the man who patented it. A Merkle tree is a data structure that, if your were to draw it on a whiteboard, tends to resemble a tree. At each step of the tree, the root node contains a hash of the data of its children. The following is a diagram of a Merkle tree:

Original illustration by David G?thberg, Sweden, released to the public domain

In a blockchain, this means that there is a recursive hashing process happening. A recursive hash is when we take a hash of hashes. For instance, imagine we have the following words and their hashes. Here, we will use the MD5 algorithm, because it is easy to find MD5 hashing code on the web, so you can try for yourself:

Salad: c2e055acd7ea39b9762acfa672a74136
Fork: b2fcb4ba898f479790076dbd5daa133f
Spoon: 4b8e23084e0f4d55a47102da363ef90c

To take the recursive or the root hash, we would add these hashes together, as follows:

c2e055acd7ea39b9762acfa672a74136b2fcb4ba898f479790076dbd5daa133f4b8e23084e0f4d55a47102da363ef90c

Then we would take the hash of that value, which would result in the following:

189d3992be73a5eceb9c6f7cc1ec66e1

This process can happen over and over again. The final hash can then be used to check whether any of the values in the tree have been changed. This root hash is a data efficient and a powerful way to ensure that data is consistent.

Each block contains the root hash of all the transactions. Because of the one-way nature of hashing, anyone can look at this root hash and compare it to the data in the block and know whether all the data is valid and unchanged. This allows anyone to quickly verify that every transaction is correct. Each blockchain has small variations on this pattern (using different functions or storing the data slightly differently), but the basic concept is the same.