Getting ready

A good way to evaluate a clustering algorithm is to measure how well separated its clusters are. Are the clusters far enough apart from each other? Are the datapoints within each cluster packed tightly enough? We need a metric that can quantify this behavior. We will use a metric called the silhouette coefficient. It is defined for each datapoint as follows:

    score = (y - x) / max(x, y)

Here, x is the average distance between the current datapoint and all the other datapoints in the same cluster, and y is the average distance between the current datapoint and all the datapoints in the next nearest cluster. The score lies between -1 and 1: values close to 1 mean the datapoint sits much closer to its own cluster than to the neighboring one, and negative values suggest it may have been assigned to the wrong cluster.
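To make the definition concrete, here is a minimal sketch that computes the coefficient for a single datapoint in pure Python, using the standard form (y - x) / max(x, y) with x and y as described above. The function name silhouette_coefficient, the toy points, and the choice of Euclidean distance are assumptions made for illustration; returning 0 for a datapoint that is alone in its cluster is a common convention, not something this text specifies.

```python
import math

def silhouette_coefficient(index, points, labels):
    """Silhouette coefficient of points[index] (hypothetical helper).

    points: list of coordinate tuples; labels: one cluster label per point.
    Assumes Euclidean distance and at least two clusters.
    """
    own = labels[index]
    # Group the distances from this datapoint to every other datapoint by cluster.
    by_cluster = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != index:
            by_cluster.setdefault(lab, []).append(math.dist(points[index], p))
    if own not in by_cluster:
        return 0.0  # assumed convention: a lone datapoint in its cluster scores 0
    # x: average distance to the other datapoints in the same cluster.
    x = sum(by_cluster[own]) / len(by_cluster[own])
    # y: smallest average distance to the datapoints of any other cluster.
    other_means = [sum(d) / len(d) for lab, d in by_cluster.items() if lab != own]
    if not other_means:
        raise ValueError("at least two clusters are required")
    y = min(other_means)
    largest = max(x, y)
    return (y - x) / largest if largest > 0 else 0.0

# Two tight, well-separated clusters (toy data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(0, points, [0, 0, 1, 1])  # close to 1
bad = silhouette_coefficient(0, points, [0, 1, 0, 1])   # negative
print(good, bad)
```

With the sensible labeling the first datapoint scores close to 1, while the deliberately mixed-up labeling drives its score below 0. In practice, scikit-learn provides silhouette_score and silhouette_samples in sklearn.metrics, which compute the same quantity over a whole dataset.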