Introduction

In machine learning, classification is the process of using the characteristics of data to assign it to one of a fixed number of classes. This differs from regression, which we discussed in Chapter 1, The Realm of Supervised Learning, where the output is a real number. A supervised learning classifier builds a model from labeled training data and then uses this model to classify unknown data.
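To make the idea of building a model from labeled data concrete, here is a minimal sketch of a nearest-centroid classifier written with only the Python standard library. The function names (`fit_centroids`, `predict`) and the toy data are illustrative assumptions, not part of any library or of the recipes in this chapter.

```python
import math
from collections import defaultdict

def fit_centroids(points, labels):
    """Build the 'model': one mean point (centroid) per class label."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    # Average each coordinate across all points that share a label.
    return {
        label: tuple(sum(coord) / len(group) for coord in zip(*group))
        for label, group in grouped.items()
    }

def predict(centroids, point):
    """Classify an unseen point by choosing the closest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

# Hypothetical labeled training data: two features per sample.
train_points = [(1.0, 1.2), (0.8, 0.9), (4.0, 4.1), (4.2, 3.8)]
train_labels = ["small", "small", "large", "large"]

model = fit_centroids(train_points, train_labels)
print(predict(model, (0.9, 1.0)))  # closest to the "small" centroid
print(predict(model, (3.9, 4.0)))  # closest to the "large" centroid
```

The training step only summarizes the labeled data, and the prediction step only consults that summary. Most classifiers follow this same two-phase pattern, however complex the model itself is.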

A classifier can be any algorithm that implements classification. In simple cases, it can be a straightforward mathematical function; in real-world cases, it can take far more complex forms. As we study classification, we will see that it can be binary, where we separate data into two classes, or multi-class, where we separate data into more than two classes. Many of the mathematical techniques devised for classification are formulated for two classes, so we extend them in different ways to handle multi-class problems as well.
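One common way to extend a two-class technique to several classes is the one-vs-rest scheme: train one binary model per class, treating that class as positive and all others as negative, then pick the class whose model gives the highest score. The sketch below illustrates this with a deliberately simple binary scorer. All function names and data are hypothetical, and the scorer stands in for whatever binary method you would actually use.

```python
import math

def mean(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def train_binary(points, is_positive):
    """Toy binary model: the mean of the positive and the negative group."""
    pos = mean([p for p, flag in zip(points, is_positive) if flag])
    neg = mean([p for p, flag in zip(points, is_positive) if not flag])
    return pos, neg

def binary_score(model, point):
    """Greater than zero when the point is closer to the positive group."""
    pos, neg = model
    return math.dist(neg, point) - math.dist(pos, point)

def train_one_vs_rest(points, labels):
    """Train one binary model per class: that class versus everything else."""
    return {
        cls: train_binary(points, [label == cls for label in labels])
        for cls in sorted(set(labels))
    }

def predict_one_vs_rest(models, point):
    """Return the class whose binary model scores the point highest."""
    return max(models, key=lambda cls: binary_score(models[cls], point))

# Hypothetical three-class data set.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (0, 5), (1, 5)]
labels = ["a", "a", "b", "b", "c", "c"]

models = train_one_vs_rest(points, labels)
print(predict_one_vs_rest(models, (0.2, 0.3)))
print(predict_one_vs_rest(models, (4.8, 0.6)))
```

One-vs-one, which trains a binary model for every pair of classes and takes a vote, is another widely used extension. Which scheme works better depends on the data and on the underlying binary method.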

Evaluating the accuracy of a classifier is vital in machine learning. We need to know how to use the available data to get a glimpse of how the model will perform in the real world. In this chapter, we will look at recipes that cover all of these topics.
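A simple way to estimate real-world performance from the available data is to hold out part of it: train on one portion and measure accuracy on the portion the model never saw. The following sketch assumes a 75/25 split and a one-nearest-neighbor classifier. The helper names, the split ratio, and the synthetic data are all illustrative choices.

```python
import math
import random

def train_test_split(points, labels, test_fraction=0.25, seed=0):
    """Shuffle the labeled data and hold out a fraction for testing."""
    indices = list(range(len(points)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [points[i] for i in train_idx], [points[i] for i in test_idx],
        [labels[i] for i in train_idx], [labels[i] for i in test_idx],
    )

def predict_1nn(train_points, train_labels, point):
    """Return the label of the single closest training point."""
    nearest = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], point),
    )
    return train_labels[nearest]

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Synthetic, cleanly separated data: ten points around (0, 0), ten around (5, 5).
points = [(i * 0.1, i * 0.1) for i in range(10)]
points += [(5 + i * 0.1, 5 + i * 0.1) for i in range(10)]
labels = [0] * 10 + [1] * 10

X_train, X_test, y_train, y_test = train_test_split(points, labels)
y_pred = [predict_1nn(X_train, y_train, p) for p in X_test]
print("Held-out accuracy:", accuracy(y_test, y_pred))
```

Because this toy data is so well separated, the held-out score will be perfect. Real data is rarely that clean, and a single split can be lucky or unlucky, which is why averaging over several splits (cross-validation) usually gives a more reliable estimate.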