SciPy

SciPy (pronounced sigh pie) adds a layer to NumPy that wraps common scientific and statistical applications on top of NumPy's more purely mathematical constructs. SciPy provides higher-level functions for manipulating and visualizing data, and it is especially useful when using Python interactively. SciPy is organized into sub-packages covering different scientific computing domains; those most relevant to ML include scipy.cluster, scipy.optimize, scipy.linalg, and scipy.stats.

Many SciPy sub-packages share a name and similar functionality with their NumPy counterparts. For the most part, SciPy imports its NumPy equivalent and extends its functionality. However, be aware that some identically named functions in SciPy modules may behave slightly differently from those in NumPy. It should also be mentioned that many of the SciPy classes have convenience wrappers in the scikit-learn package, and it is sometimes easier to use those instead.

Each of these packages requires an explicit import; here is an example:

import scipy.cluster

You can get documentation from the SciPy website (scipy.org) or from the console, for example, help(scipy.cluster).

As we have seen, a common task in many different ML settings is optimization. We looked at the mathematics of the simplex algorithm in the last chapter; here is an implementation using SciPy. Recall that simplex optimizes a linear objective subject to a set of linear constraints. The problem we looked at was as follows:

Maximize x1 + x2 subject to the constraints 2x1 + x2 ≤ 4 and x1 + 2x2 ≤ 3

The linprog function is probably the simplest way to solve this problem. It is a minimization routine, so we reverse the sign of our objective coefficients.

from scipy.optimize import linprog

objective = [-1, -1]       # maximize x1 + x2 by minimizing -(x1 + x2)
con1 = [[2, 1], [1, 2]]    # left-hand side coefficients of the inequality constraints
con2 = [4, 3]              # right-hand sides of the inequality constraints
res = linprog(objective, A_ub=con1, b_ub=con2)
print(res)

When you run this code, the result of the optimization is printed to the console.
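The exact layout of the printed result varies between SciPy versions and solver methods, but the reported optimum should be about -2.33 (that is, x1 + x2 ≈ 2.33), attained at approximately x1 ≈ 1.67 and x2 ≈ 0.67.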

There is also a scipy.optimize.minimize function that is suitable for slightly more complicated problems. This function takes a solver method as a parameter. There are currently about a dozen solvers available, and if you need something more specific, you can write your own. The most commonly used, and suitable for most problems, is the nelder-mead solver. This particular solver uses a downhill simplex algorithm: essentially a heuristic search that, at each iteration, replaces the test point with the highest error by a new point reflected through the centroid of the remaining points. It repeats this process until it converges on a minimum.

In this example, we use the Rosenbrock function as our test problem. This is a non-convex function that is often used to benchmark optimization algorithms. Its global minimum, at the point where every coordinate equals 1, lies at the bottom of a long, narrow, parabolic valley; finding the valley is easy, but converging to the minimum inside it is hard. The rosen function defined below computes its N-dimensional generalization. We will see more of this function later:

import numpy as np
from scipy.optimize import minimize

def rosen(x):
    # N-dimensional Rosenbrock function
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

def nMin(funct, x0):
    # Minimize funct from the starting point x0 with the Nelder-Mead solver.
    # xatol is the absolute error in x acceptable for convergence;
    # disp=True prints a convergence message.
    return minimize(funct, x0, method='nelder-mead',
                    options={'xatol': 1e-8, 'disp': True})

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
nMin(rosen, x0)

When you run the preceding code, the solver prints its progress and result to the console.
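The exact trace depends on your SciPy version, but with disp set to True the solver prints a short convergence summary (including the number of iterations and function evaluations), and the returned solution should be very close to a vector of ones, since the Rosenbrock function attains its minimum of 0 at the point (1, 1, ..., 1).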

The minimize function takes two mandatory parameters: the objective function and an initial guess, x0. It also takes an optional parameter for the solver method; in this example we use the nelder-mead method. The options are a solver-specific set of key-value pairs, represented as a dictionary. Here, xatol is the absolute error in x that is acceptable for convergence, and disp is set to print a convergence message.

Another package that is extremely useful for machine learning applications is scipy.linalg. This package adds the ability to perform tasks such as inverting matrices, calculating eigenvalues, and computing matrix decompositions.
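As a quick, minimal sketch of what scipy.linalg offers (the small matrix here is just an arbitrary example), the following inverts a matrix, computes its eigenvalues, and performs a singular value decomposition:

import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])

A_inv = linalg.inv(A)            # matrix inverse
eigenvalues = linalg.eigvals(A)  # eigenvalues of A
U, s, Vt = linalg.svd(A)         # singular value decomposition

print(A_inv)
print(eigenvalues)
print(s)

These routines work on arrays of any compatible size, and scipy.linalg is generally preferred over numpy.linalg because it is always compiled with BLAS/LAPACK support and provides some additional functions.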