Cryptography

We introduced cryptography and hash functions in the first chapter, where we defined a hash function as follows:

It is a mathematical function for which, knowing the output, it is almost impossible to find the input that produced it, whereas knowing the input, it is very easy to compute the output. In addition, a hash function always returns the same output for the same input.

A hash is the result of applying a mathematical function, that is, a transformation that takes an input and generates an output. In Y = f(x), x is the input, f is the function, and Y is the output.

In mathematical terms, we have the following:

"Knowing Y, it is almost impossible to find x. But knowing x, it is very easy to find Y."

The hash function is essential to securing information and transactions on the blockchain.

We saw two basic properties of a hash function in Chapter 1, Basics of Blockchains and the Illustration of Village Beta:

  • It is deterministic, meaning that the same input always produces the same output.
  • It has a defined range, meaning that an input of any size can be fed into the function, but the output always has the same length. In other words, no matter the length of the input, the function always returns a fixed number of characters (in Chapter 1, Basics of Blockchains and the Illustration of Village Beta, the output was always 64 hexadecimal characters because we were using the SHA-256 hash function, which produces a 256-bit output).
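These two properties can be checked with a short sketch. It assumes Python and its standard hashlib module; the helper name sha256_hex and the sample inputs are ours, chosen only for illustration.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the SHA-256 hash of a text string as hexadecimal characters."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Hypothetical sample inputs: one character and ten thousand characters.
short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)

# Defined range: both outputs have the same length, whatever the input size.
print(len(short_digest), len(long_digest))  # 64 64

# Deterministic: hashing the same input again gives the same output.
repeat_digest = sha256_hex("a")
print(short_digest == repeat_digest)  # True
```

Each hexadecimal character encodes 4 bits, which is why a 256-bit output is displayed as 64 characters.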

But these properties are not enough to secure a transaction or a piece of information. There are other cryptographic properties surrounding hash functions used in a blockchain:

  • It is change-sensitive, meaning that if a single character of the input is modified, the output will be completely different.
  • It is non-invertible, meaning that it should not be possible to efficiently determine the input from a given output, just as a padlock is not supposed to be opened without its key.
  • It is collision-resistant. A collision happens in a hash function when two different inputs generate the same output. Since inputs can have any length but outputs have a fixed length, collisions necessarily exist. In other words, there is a finite number of possible outputs for an infinite number of inputs. Collision resistance means that finding two different inputs that give the same output should not be feasible with any smart algorithm or strategy, but only by trying possibilities one after another. This is what we call brute force.
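The sketch below illustrates change sensitivity and brute-force collision search, again assuming Python's hashlib. The helper names and sample strings are hypothetical. Note that the collision search runs against a deliberately weakened hash (only the first 2 bytes of SHA-256), because brute-forcing a collision on the full 256-bit output is not practical; this is an assumption made for the demonstration, not a weakness of SHA-256 itself.

```python
import hashlib
from itertools import count


def sha256_bytes(text: str) -> bytes:
    """Return the raw 32-byte SHA-256 hash of a text string."""
    return hashlib.sha256(text.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length hashes differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def find_truncated_collision(prefix_bytes: int = 2) -> tuple[str, str]:
    """Brute-force two different inputs whose hashes share the same first bytes.

    Truncating the hash is an illustrative weakening: with 2 bytes there are
    only 65,536 possible prefixes, so a repeat must appear quickly.
    """
    seen: dict[bytes, str] = {}
    for i in count():
        candidate = f"input-{i}"
        prefix = sha256_bytes(candidate)[:prefix_bytes]
        if prefix in seen:
            return seen[prefix], candidate
        seen[prefix] = candidate


# Change sensitivity: the two inputs differ by a single character.
changed = differing_bits(sha256_bytes("Village Beta"), sha256_bytes("Village Betb"))
print(f"{changed} of 256 output bits differ")

# Brute force: try inputs one by one until two truncated hashes match.
first, second = find_truncated_collision()
print(first, second)
```

For a well-designed hash function, we would expect roughly half of the 256 output bits to change when one character changes. Each extra byte kept in the prefix multiplies the number of possible outputs by 256, so the same search becomes far more expensive as the output gets longer.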

What you have to remember from this section is that the hash value of an input can be used as a reference to that input. The hash value is the digital fingerprint of the input data, which can be a document, a transaction, or any kind of information. We can feed any file into the hash function and use the hash value to refer to it, hence identifying the document, the transaction, or the information uniquely for all practical purposes.
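As a minimal sketch of this fingerprint idea, the following code hashes a file with SHA-256 using Python's hashlib. The function name file_fingerprint, the file name contract.txt, and its contents are hypothetical, chosen only for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hash of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: write a small hypothetical document to a temporary folder and fingerprint it.
with tempfile.TemporaryDirectory() as folder:
    document = Path(folder) / "contract.txt"
    document.write_bytes(b"Alice pays Bob 5 coins")
    fingerprint = file_fingerprint(document)

    # Change the document slightly and fingerprint it again.
    document.write_bytes(b"Alice pays Bob 50 coins")
    tampered_fingerprint = file_fingerprint(document)

print(fingerprint)
print(fingerprint == tampered_fingerprint)  # False: the fingerprint no longer matches
```

Anyone holding the original fingerprint can recompute the hash of a file they receive and compare the two values to check that the file is the one being referred to.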