Principal Component Analysis

Principal Component Analysis (PCA) projects the given dataset onto a lower dimensional linear space so that the variance of the projected data is maximized. PCA requires the eigenvalues and eigenvectors of the covariance matrix, which is the product where X is the data matrix.

SVD on the data matrix X is given as follows:

The following example shows PCA using SVD:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import plotly.plotly as py
import plotly.graph_objs as go
import plotly.figure_factory as FF
import pandas as pd

path = "/neuralnetwork-programming/ch01/plots"
logs = "/neuralnetwork-programming/ch01/logs"

xMatrix = np.array([[0,2,1,0,0,0,0,0],
              [2,0,0,1,0,1,0,0],
              [1,0,0,0,0,0,1,0],
              [0,1,0,0,1,0,0,0],
              [0,0,0,1,0,0,0,1],
              [0,1,0,0,0,0,0,1],
              [0,0,1,0,0,0,0,1],
              [0,0,0,0,1,1,1,0]], dtype=np.float32)

def pca(mat):
    mat = tf.constant(mat, dtype=tf.float32)
    mean = tf.reduce_mean(mat, 0)
    less = mat - mean
    s, u, v = tf.svd(less, full_matrices=True, compute_uv=True)

    s2 = s ** 2
    variance_ratio = s2 / tf.reduce_sum(s2)

    with tf.Session() as session:
        run = session.run([variance_ratio])
    return run


if __name__ == '__main__':
    print(pca(xMatrix))

The output of the listing is shown as follows:

[array([  4.15949494e-01,   2.08390564e-01,   1.90929279e-01,
         8.36438537e-02,   5.55494241e-02,   2.46047471e-02,
         2.09326427e-02,   3.57540098e-16], dtype=float32)]