How it works...

In this recipe, we built a classifier to classify the membership of a topic into a particular discussion group. To extract features from the text, a tokenization procedure was needed. In the tokenization phase, within each single sentence, atomic elements called tokens are identified; based on the token identified, it's possible to carry out an analysis and evaluation of the sentence itself. Once the characteristics of the text had been extracted, a classifier based on the multinomial Naive Bayes algorithm was constructed.