Analyzing the Iris dataset

Let's look at a feedforward example using the Iris dataset.

The Iris dataset contains 150 rows of data, made up of 50 samples from each of three Iris species: Iris setosa, Iris virginica, and Iris versicolor.

(Figure: petal geometry compared across the three Iris species: Iris setosa, Iris virginica, and Iris versicolor.)

In the dataset, each row contains the data for one flower sample: sepal length, sepal width, petal length, petal width, and flower species. Flower species are stored as integers, with 0 denoting Iris setosa, 1 denoting Iris versicolor, and 2 denoting Iris virginica.
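These integer labels will later be one-hot encoded. As a quick illustration of that mapping (a small numpy sketch, not part of the original listing):

import numpy as np

# Integer species labels -> one-hot rows of the 3x3 identity matrix
target = np.array([0, 1, 2])
print(np.eye(3)[target])
# [[1. 0. 0.]   Iris setosa
#  [0. 1. 0.]   Iris versicolor
#  [0. 0. 1.]]  Iris virginica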

First, we will create a run() function that takes three parameters: the hidden layer size h_size, the standard deviation for weight initialization stddev, and the step size for stochastic gradient descent sgd_step:

def run(h_size, stddev, sgd_step):
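For instance, the function might eventually be invoked as follows (the stddev and sgd_step values here are illustrative, not taken from the original text):

run(h_size=128, stddev=0.1, sgd_step=0.01)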

Input data is loaded using numpy's genfromtxt() function. The Iris data has a shape of (150, 4), that is, L: 150 rows and W: 4 columns, and is loaded into the all_X variable. Target labels are loaded from target.csv into all_y, which after one-hot encoding has a shape of (150, 3):

import numpy as np
from numpy import genfromtxt
from sklearn.model_selection import train_test_split

RANDOMSEED = 42  # any fixed seed makes the split reproducible

def load_iris_data():
    data = genfromtxt('iris.csv', delimiter=',')
    target = genfromtxt('target.csv', delimiter=',').astype(int)
    # Prepend the column of 1s for bias
    L, W = data.shape
    all_X = np.ones((L, W + 1))
    all_X[:, 1:] = data
    # One-hot encode the integer labels
    num_labels = len(np.unique(target))
    all_y = np.eye(num_labels)[target]
    return train_test_split(all_X, all_y, test_size=0.33, random_state=RANDOMSEED)
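Assuming iris.csv and target.csv are on disk, a call like the following (a hypothetical usage, not shown in the original) yields the train/test split described above:

train_x, test_x, train_y, test_y = load_iris_data()
print(train_x.shape, train_y.shape)  # (100, 5) (100, 3) with a 67/33 split
print(test_x.shape, test_y.shape)    # (50, 5) (50, 3)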

Once the data is loaded, we initialize the weight matrices based on x_size, y_size, and h_size, with the standard deviation passed to the run() method:

  • x_size = 5
  • y_size = 3
  • h_size = 128 (or any other number chosen for neurons in the hidden layer)
import tensorflow as tf

# Size of layers
x_size = train_x.shape[1]  # Input nodes: 4 features and 1 bias
y_size = train_y.shape[1]  # Outcomes (3 iris flowers)

# Placeholders for inputs and targets, and the two weight matrices
X = tf.placeholder("float", shape=[None, x_size])
y = tf.placeholder("float", shape=[None, y_size])
weights_1 = initialize_weights((x_size, h_size), stddev)
weights_2 = initialize_weights((h_size, y_size), stddev)
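The initialize_weights() helper is not shown in the listing above; a minimal sketch consistent with how it is called (the normal-distribution choice is an assumption) might be:

def initialize_weights(shape, stddev):
    # Draw initial weights from a normal distribution with the given
    # standard deviation and wrap them in a trainable variable
    weights = tf.random_normal(shape, stddev=stddev)
    return tf.Variable(weights)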

Next, we make the prediction using sigmoid as the activation function, defined in the forward_propagation() function:

def forward_propagation(X, weights_1, weights_2):
    # Hidden layer: sigmoid activation
    sigmoid = tf.nn.sigmoid(tf.matmul(X, weights_1))
    # Output layer: raw logits (softmax is applied later, inside the loss)
    y = tf.matmul(sigmoid, weights_2)
    return y

First, the sigmoid output is calculated from the input X and weights_1. This is then used to calculate y as a matrix multiplication of the sigmoid output and weights_2:

y_pred = forward_propagation(X, weights_1, weights_2)
predict = tf.argmax(y_pred, axis=1)
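tf.argmax reduces each row of logits to the index of its largest entry, which is the predicted class. A small numpy illustration (the logit values here are made up):

import numpy as np

logits = np.array([[0.1, 2.0, 0.3],   # largest entry at index 1 -> versicolor
                   [1.5, 0.2, 0.1]])  # largest entry at index 0 -> setosa
print(np.argmax(logits, axis=1))      # [1 0]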

Next, we define the cost function and the optimization using gradient descent. Let's look at the GradientDescentOptimizer being used: the tf.train.GradientDescentOptimizer class implements the gradient descent algorithm.

To construct an instance, we use the following constructor and pass sgd_step as a parameter:

# Constructor for GradientDescentOptimizer
__init__(
    learning_rate,
    use_locking=False,
    name='GradientDescent'
)

The constructor arguments are explained here:

  • learning_rate: A tensor or a floating point value. The learning rate to use.
  • use_locking: If True, use locks for update operations.
  • name: Optional name prefix for the operations created when applying gradients. The default name is "GradientDescent".

The following code implements the cost function and the optimizer update:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_pred))
updates_sgd = tf.train.GradientDescentOptimizer(sgd_step).minimize(cost)
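For intuition, the fused softmax_cross_entropy_with_logits op computes the same quantity as applying softmax and then cross-entropy separately, but in a numerically stable way. A naive unfused sketch of the same cost (for illustration only; prefer the fused op in practice):

# Naive equivalent of the fused op above (illustrative; less stable)
softmax = tf.nn.softmax(y_pred)
manual_cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(softmax), axis=1))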

Next, we will implement the following steps:

  1. Initialize the TensorFlow session:
sess = tf.Session()
  2. Initialize all the variables using tf.initialize_all_variables(); the returned op is run inside the session to initialize the weights.
  3. Iterate over steps (1 to 50).
  4. For each example in train_x and train_y, execute updates_sgd.
  5. Calculate train_accuracy and test_accuracy.

We store the accuracy for each step in a list so that we can plot a graph (a plotting sketch follows the listing):

init = tf.initialize_all_variables()
steps = 50
sess.run(init)
x = np.arange(steps)
test_acc = []
train_acc = []
print("Step, train accuracy, test accuracy")
for step in range(steps):
    # Train with each example
    for i in range(len(train_x)):
        sess.run(updates_sgd, feed_dict={X: train_x[i: i + 1],
                                         y: train_y[i: i + 1]})

    # Accuracy over the full train and test sets
    train_accuracy = np.mean(np.argmax(train_y, axis=1) ==
                             sess.run(predict, feed_dict={X: train_x, y: train_y}))
    test_accuracy = np.mean(np.argmax(test_y, axis=1) ==
                            sess.run(predict, feed_dict={X: test_x, y: test_y}))

    print("%d, %.2f%%, %.2f%%"
          % (step + 1, 100. * train_accuracy, 100. * test_accuracy))

    test_acc.append(100. * test_accuracy)
    train_acc.append(100. * train_accuracy)
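Since x = np.arange(steps) is already available, a minimal plotting sketch (matplotlib is an assumption here; the original text does not show the plotting code) could be:

import matplotlib.pyplot as plt

# Plot train and test accuracy against the training step
plt.plot(x, train_acc, label='Train accuracy')
plt.plot(x, test_acc, label='Test accuracy')
plt.xlabel('Step')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.show()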