Pre-processing data using tokenization

Pre-processing data involves converting raw text into a form that the learning algorithm can consume.

Tokenization is the process of dividing text into meaningful pieces, such as sentences or words. These pieces are called tokens.
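
To make this concrete, here is a minimal sketch of sentence-level and word-level tokenization. The section itself does not name a toolkit, so the use of the NLTK library here is an assumption; any tokenizer that produces similar splits would serve the same purpose.

```python
# A minimal tokenization sketch using NLTK (library choice is an
# assumption; the text does not prescribe a specific toolkit).
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

# Download the sentence-boundary model on first use (an NLTK requirement).
nltk.download('punkt', quiet=True)

text = ("Tokenization divides text into meaningful pieces. "
        "Sentences and words are common choices of token.")

# Sentence-level tokenization: one token per sentence.
print(sent_tokenize(text))
# ['Tokenization divides text into meaningful pieces.',
#  'Sentences and words are common choices of token.']

# Word-level tokenization: words and punctuation become separate tokens.
print(word_tokenize(text))
# ['Tokenization', 'divides', 'text', 'into', 'meaningful', 'pieces', '.', ...]
```

The resulting tokens, rather than the raw string, are what downstream steps of the pipeline operate on.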