Elasticsearch versus other distributed document stores

For simple document storage, you'd typically pick a general-purpose database like MongoDB, and store normalized data.

Normalization is the process of structuring data so that each fact is stored only once, in atomic fields. This reduces data redundancy and improves data integrity.

Denormalization is the process of introducing data redundancy for other benefits, such as performance.
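
To make the distinction concrete, here is a minimal sketch in plain Python. The `authors` and `posts` records, their field names, and the `denormalize` helper are all hypothetical and exist only for illustration; they are not tied to any particular database.

```python
# Normalized: each fact is stored once. Posts refer to their author by ID,
# so renaming an author means changing exactly one record.
authors = {1: {"name": "Ada Lovelace"}}
posts = [
    {"id": 10, "author_id": 1, "title": "Notes on the Analytical Engine"},
    {"id": 11, "author_id": 1, "title": "On Bernoulli Numbers"},
]


def denormalize(posts, authors):
    """Return new post records with the author's name copied into each one."""
    return [
        {
            "id": post["id"],
            "title": post["title"],
            # Redundant copy of the name: faster to read, harder to keep in sync.
            "author_name": authors[post["author_id"]]["name"],
        }
        for post in posts
    ]


denormalized_posts = denormalize(posts, authors)
```

Each denormalized record can now be read without a second lookup, at the cost of storing the author's name once per post.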

However, searching normalized data is inefficient, because the fields you want to match are spread across several records that have to be combined at query time. Therefore, to support full-text search, you would usually denormalize the data and replicate it to a more specialized database, such as Elasticsearch.
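
As a rough sketch of what the replication step might involve, the following Python builds a request body in the newline-delimited JSON format accepted by Elasticsearch's bulk endpoint. The `to_bulk_body` helper, the `posts` index name, and the sample document are assumptions made for illustration. The code only builds a string; it does not contact a server.

```python
import json


def to_bulk_body(docs, index_name):
    """Build a newline-delimited JSON body for a bulk indexing request.

    Each document contributes two lines: an action line naming the target
    index and document ID, followed by the document itself.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": str(doc["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical denormalized record: the author's name is embedded directly.
docs = [{"id": 10, "title": "On Bernoulli Numbers", "author_name": "Ada Lovelace"}]
body = to_bulk_body(docs, "posts")
```

In a two-database setup, something like this would have to run whenever the primary store changes, which is the synchronization burden that using a single database avoids.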

As a result, most setups end up running two different databases. In this book, however, we will use Elasticsearch for both data storage and search, for the following reasons:

  • The page count of the book is limited
  • Tooling around syncing MongoDB with Elasticsearch is not mature
  • Our data requirements are very basic, so it won't make much difference