Book title: Python Machine Learning Cookbook (Second Edition)
Authors: Giuseppe Ciaburro, Prateek Joshi
Last updated: 2025-04-04 14:38:11

Introduction

Unsupervised learning is a paradigm in machine learning where we build models without relying on labeled training data. Up to this point, we have dealt with data that was labeled in some way. This means that learning algorithms can look at this data and learn to categorize it based on the labels. In the world of unsupervised learning, we don't have this opportunity! These algorithms are used when we want to find subgroups within datasets using a similarity metric.

In unsupervised learning, information is extracted from the dataset automatically, without prior knowledge of the content to be analyzed. There is no information on the classes that the examples belong to, or on the output corresponding to a given input. We want a model that can discover interesting properties, such as groups with similar characteristics, which is what happens in clustering. An example of the application of these algorithms is a search engine, which is able to create a list of links related to our search, starting from one or more keywords.

These algorithms work by comparing data and looking for similarities or differences. Their validity depends on the usefulness of the information they can extract from the dataset. The only data available is the set of features that describe each example.

One of the most common methods is clustering. You will have heard this term being used quite frequently; we mainly use it for data analysis when we want to find clusters in our data. These clusters are usually found by using a certain kind of similarity measure, such as the Euclidean distance. Unsupervised learning is used extensively in many fields, such as data mining, medical imaging, stock market analysis, computer vision, and market segmentation.
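
To make the idea of finding clusters with a similarity measure concrete, here is a minimal sketch of a k-means-style loop that uses only the Python standard library and the Euclidean distance. The function names (assign_clusters, update_centroids, kmeans), the sample points, and the choice of starting centroids are all illustrative assumptions, not code from this book; in practice, a library implementation would normally be used instead.

```python
import math


def assign_clusters(points, centroids):
    """For each point, return the index of the nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster simply keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical unlabeled data: two visually separate groups of 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: start from one point in each group; real code would pick these randomly.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Note that no labels are supplied anywhere: the grouping emerges only from the distances between feature vectors, which is the defining trait of the unsupervised setting described above.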