Book title: Python Machine Learning Cookbook (Second Edition)
Authors: Giuseppe Ciaburro, Prateek Joshi
Last updated: 2021-06-24 15:40:56

Getting ready

Before we build the SVM, let's visualize our data to understand the problem at hand. We will use the svm.py file for this, along with the data_multivar.txt file that has already been provided to you. Let's see how to visualize the data:

Create a new Python file and add the following lines to it (the full code is in the svm.py file, which has already been provided to you):

import numpy as np 
import matplotlib.pyplot as plt 
 
import utilities  
 
# Load input data 
input_file = 'data_multivar.txt' 
X, y = utilities.load_data(input_file)

We just imported the packages we need and set the name of the input file. The load_data() function comes from the utilities module. Let's look at it:

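import numpy as np

# Load multivariate data from the input file
def load_data(input_file):
    X = []
    y = []
    with open(input_file, 'r') as f:
        for line in f:
            # Skip blank lines, which float() cannot parse
            if not line.strip():
                continue
            # All values but the last are features; the last is the label
            data = [float(x) for x in line.split(',')]
            X.append(data[:-1])
            y.append(data[-1])

    # Convert the lists to NumPy arrays
    X = np.array(X)
    y = np.array(y)

    return X, y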
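
To make the expected input format concrete, here is a minimal sketch that writes a tiny comma-separated file and reads it back with a copy of the loader. The file name sample_multivar.txt and its three rows are made up for illustration and are not taken from data_multivar.txt. The sketch assumes two feature columns followed by a class label, which is consistent with the plotting code indexing columns 0 and 1.

```python
import os
import tempfile

import numpy as np


def load_data(input_file):
    """Copy of the loader so that this sketch runs on its own."""
    X = []
    y = []
    with open(input_file, 'r') as f:
        for line in f:
            data = [float(x) for x in line.split(',')]
            X.append(data[:-1])   # every value except the last is a feature
            y.append(data[-1])    # the last value is the class label
    return np.array(X), np.array(y)


# Hypothetical sample: two features per row, then a 0/1 label
sample_rows = "1.5,2.0,0\n3.1,4.2,1\n0.7,1.9,0\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample_multivar.txt')
    with open(sample_path, 'w') as f:
        f.write(sample_rows)
    X_sample, y_sample = load_data(sample_path)

print(X_sample.shape)  # one row per line, one column per feature
print(y_sample)        # labels are parsed as floats
```

Note that the labels come back as floating-point values, so a comparison such as y[i]==0 matches a label stored as 0.0.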

We need to separate the data into classes, as follows:

class_0 = np.array([X[i] for i in range(len(X)) if y[i]==0]) 
class_1 = np.array([X[i] for i in range(len(X)) if y[i]==1])
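
The same split can also be written with NumPy boolean masks instead of list comprehensions. The following is a sketch of that alternative, not the book's code; the small X and y arrays are hypothetical stand-ins for the data returned by load_data().

```python
import numpy as np

# Hypothetical stand-ins for the arrays returned by load_data()
X = np.array([[1.5, 2.0], [3.1, 4.2], [0.7, 1.9], [2.8, 3.3]])
y = np.array([0.0, 1.0, 0.0, 1.0])

# List-comprehension version, as in the text
class_0_loop = np.array([X[i] for i in range(len(X)) if y[i] == 0])
class_1_loop = np.array([X[i] for i in range(len(X)) if y[i] == 1])

# Boolean-mask version: (y == 0) holds one True/False value per row of X
class_0 = X[y == 0]
class_1 = X[y == 1]

# Both approaches select the same rows in the same order
print(np.array_equal(class_0, class_0_loop))
print(np.array_equal(class_1, class_1_loop))
```

One practical difference is that the mask version keeps a two-dimensional shape even when a class has no points, so column indexing such as class_0[:,0] still works in that case.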

Now that we have separated the data, let's plot it:

plt.figure() 
# Class 0 is drawn as solid squares, class 1 as empty squares
plt.scatter(class_0[:,0], class_0[:,1], facecolors='black', edgecolors='black', marker='s')
plt.scatter(class_1[:,0], class_1[:,1], facecolors='none', edgecolors='black', marker='s')
plt.title('Input data') 
plt.show()

If you run this code, you will see the following:

The preceding plot consists of two types of points: solid squares (class 0) and empty squares (class 1). In machine learning lingo, we say that our data consists of two classes. Our goal is to build a model that can separate the solid squares from the empty squares.
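
Besides the plot, a quick numeric check can confirm how many classes the labels contain and how many points fall into each. This is a minimal sketch using np.unique; the y array below is a hypothetical stand-in for the labels loaded from data_multivar.txt.

```python
import numpy as np

# Hypothetical labels standing in for the y array loaded earlier
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])

# Distinct label values and the number of points carrying each one
labels, counts = np.unique(y, return_counts=True)

for label, count in zip(labels, counts):
    print(f"class {int(label)}: {count} points")
```

If the counts turn out to be very uneven, that is worth knowing before training, because a classifier can then favor the larger class.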