Multi-Layer Perceptron for CIFAR-10

This is an example of a two-hidden-layer perceptron for classifying the CIFAR-10 dataset.

Network structure:

  • Hidden layer 1: 256 nodes
  • Hidden layer 2: 128 nodes
  • Output layer: 10 classes

The network uses a simple sigmoid activation function in the hidden layers and the Adam optimizer to reduce the cost. The cost is computed as cross entropy with logits (a softmax is applied to the raw outputs, then cross entropy is taken against the one-hot labels).

Questions to ask

1) Why do we use the sigmoid activation function? What are its advantages?

2) What is cross entropy with logits? (A small numeric sketch follows below.)
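To make question 2 concrete, here is a minimal numeric sketch in plain numpy (the values are made up for illustration): the raw network outputs (logits) are turned into a probability distribution with a softmax, and the cost is the negative log-probability of the true class. The sigmoid used later in the hidden layers is also shown.

In [ ]:
import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1); used as the hidden-layer activation later in this notebook.
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_with_logits(logits, one_hot_label):
    # Softmax turns raw logits into probabilities (shifting by the max is for numerical stability).
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    # Cross entropy is the negative log-probability assigned to the true class.
    return -np.sum(one_hot_label * np.log(probs))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))                    # roughly [0.12, 0.5, 0.88]
print(cross_entropy_with_logits(np.array([2.0, 0.5, 0.1]),    # correct class has the largest logit,
                                np.array([1.0, 0.0, 0.0])))   # so the cost is small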

The data is assumed to be present in the CIFAR10 folder.

In [26]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import os
from six.moves import cPickle 
%matplotlib inline  

The first step is some data exploration. The data is stored in a pickled format, described at http://www.cs.utoronto.ca/~kriz/cifar.html

In [96]:
# Forward slashes work on both Windows and Linux and avoid backslash-escape issues.
Filenames = {'batch1': 'CIFAR10/cifar-10-batches-py/data_batch_1',
             'batch2': 'CIFAR10/cifar-10-batches-py/data_batch_2',
             'batch3': 'CIFAR10/cifar-10-batches-py/data_batch_3',
             'batch4': 'CIFAR10/cifar-10-batches-py/data_batch_4',
             'batch5': 'CIFAR10/cifar-10-batches-py/data_batch_5'
             }

def getImageData(filename):
    with open(filename, 'rb') as f:
        # latin1 encoding because the CIFAR-10 pickles were written with Python 2
        datadict = cPickle.load(f, encoding='latin1')
    # reshape to (N, 3, 32, 32), then move channels last -> (N, 32, 32, 3) for plotting
    X = datadict['data'].reshape((len(datadict['data']), 3, 32, 32)).transpose(0, 2, 3, 1)
    return X
    
# A function to display basic statistics (sample count and per-class label counts) for a batch file.
# sample_id is currently unused; it is kept for a future per-sample display.
def display_stats(filename, sample_id):
    with open(filename, 'rb') as f:
        datadict = cPickle.load(f, encoding='latin1')  # latin1: the pickles were written with Python 2
    X = datadict['data']
    Y = datadict['labels']
    print(len(Y))  # Note: Y is a plain Python list
    for _y in set(Y):
        print(_y, Y.count(_y), end='  ')
    
display_stats(Filenames["batch2"],4)
10000
0 984  1 1007  2 1010  3 995  4 1010  5 988  6 1008  7 1026  8 987  9 985  

Display 25 Random Images in a grid

In [101]:
X_image = getImageData(Filenames["batch2"])
fig, axes1 = plt.subplots(5, 5, figsize=(3, 3))
for j in range(5):
    for k in range(5):
        i = np.random.choice(range(len(X_image)))  # pick a random image index
        axes1[j][k].set_axis_off()
        axes1[j][k].imshow(X_image[i, :])

Preprocessing

Before we create a network and train the model, we preprocess the data:

  • Normalize the inputs (i.e. bring all values into the range 0-1).
  • One-hot encode the labels (classification task).

We preprocess the data and save it to disk, which is useful for later runs as well. We also write a helper function that serves the data in batches for the optimization loop used while training the neural network.

During preprocessing we also create a validation set: from the training data we keep aside 10% for validation.

In [100]:
# Scale values into [0, 1] using the global min and max of the array passed in.
def normalize(image):
    maximum = np.max(image)
    minimum = np.min(image)
    return (image - minimum) / (maximum - minimum)
# A list of integer labels (0-9) representing the 10 image classes needs to be one-hot encoded.
# Each encoded label is simply a row of the 10 x 10 identity matrix.
def oneHotEncoding(labels):
    maxval = np.max(labels)  # assumes the highest class label (9) is present in `labels`
    return np.eye(maxval + 1)[labels]
# Uncomment to see how one hot encoding on an example set looks like
#print(oneHotEncoding([1,2,3,4,5,6,7,8,9,1,2,3,4]))

def PreProcessAndSaveCIFAR10():
    validFeatures = []
    validLabels = []
    for (filename, path) in Filenames.items():
        with open(path, 'rb') as f:
            # latin1 encoding because the pickles were created with Python 2
            datadict = cPickle.load(f, encoding='latin1')
        features = datadict['data'].reshape((len(datadict['data']), 3, 32, 32)).transpose(0, 2, 3, 1)
        labels = datadict['labels']
        validationCount = int(len(features) * 0.1)  # len(features) gives the size of dim 0 of the array

        featureSubset = normalize(features[:-validationCount])  # take the first 90% and normalize it
        labelSubset = oneHotEncoding(labels[:-validationCount])

        validFeatures.extend(features[-validationCount:])  # add the remaining 10% to the validation set
        validLabels.extend(labels[-validationCount:])

        with open("preprocess_" + filename, 'wb') as out:
            cPickle.dump((featureSubset, labelSubset), out)

    # normalize works element-wise on the stacked array, using its global min and max
    validFeatures = normalize(np.array(validFeatures))
    validLabels = oneHotEncoding(np.array(validLabels))

    with open("preprocess_valid", 'wb') as out:
        cPickle.dump((validFeatures, validLabels), out)

PreProcessAndSaveCIFAR10()
    
        
In [114]:
def loadPreProcessingData(filename, batchSize):
    filename = "preprocess_" + filename
    with open(filename, 'rb') as f:
        features, labels = cPickle.load(f)
    # Yield the data in mini-batches; the last batch may be smaller than batchSize.
    for start in range(0, len(features), batchSize):
        end = min(start + batchSize, len(features))
        yield features[start:end], labels[start:end]
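A quick sanity check of the batch generator (a minimal sketch; it assumes the PreProcessAndSaveCIFAR10() cell above has already been run so that preprocess_batch1 exists on disk):

In [ ]:
# Iterate once to see the shape of a mini-batch; the last batch of a file may be smaller than batchSize.
for f_batch, l_batch in loadPreProcessingData('batch1', 1024):
    print(f_batch.shape, l_batch.shape)  # expected: (1024, 32, 32, 3) (1024, 10)
    break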

Create the Two Layer Network

In [102]:
numBatches = 5
numSamplesPerBatch = 10000
numSamples = numBatches*numSamplesPerBatch
numClasses = 10
numHidden1 = 256
numHidden2 = 128
numInput = 32*32*3
In [103]:
X = tf.placeholder("float",[None,numInput])
Y = tf.placeholder("float",[None,numClasses])

stddev = 0.1

weights = {
    'h1': tf.Variable(tf.random_normal([numInput, numHidden1], stddev=stddev)),
    'h2': tf.Variable(tf.random_normal([numHidden1, numHidden2], stddev=stddev)),
    'out': tf.Variable(tf.random_normal([numHidden2, numClasses], stddev=stddev))
}

biases = {
    'b1': tf.Variable(tf.random_normal([numHidden1], stddev=stddev)),
    'b2': tf.Variable(tf.random_normal([numHidden2], stddev=stddev)),
    'out': tf.Variable(tf.random_normal([numClasses], stddev=stddev))
}

print ("NETWORK READY")
NETWORK READY
In [104]:
def multiLayerPerceptron(_X, _weights, _biases):
    layer1 = tf.nn.sigmoid(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']))
    layer2 = tf.nn.sigmoid(tf.add(tf.matmul(layer1, _weights['h2']), _biases['b2']))
    # No activation on the output layer: softmax_cross_entropy_with_logits expects raw logits.
    out = tf.add(tf.matmul(layer2, _weights['out']), _biases['out'])
    return out
    
In [105]:
pred = multiLayerPerceptron(X,weights,biases)

Some useful functions provided by TensorFlow:

  • tf.argmax - takes a tensor and returns the index of the largest element along the given axis.
  • tf.equal - compares two tensors element-wise and returns a boolean tensor. (See the small example below.)
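A tiny illustration of these two ops (a sketch with made-up values; not part of the training graph):

In [ ]:
logits_example = tf.constant([[0.1, 0.7, 0.2],
                              [0.8, 0.1, 0.1]])
labels_example = tf.constant([[0.0, 1.0, 0.0],
                              [0.0, 0.0, 1.0]])
pred_class = tf.argmax(logits_example, 1)  # index of the largest value per row -> [1, 0]
true_class = tf.argmax(labels_example, 1)  # -> [1, 2]
match = tf.equal(pred_class, true_class)   # element-wise comparison -> [True, False]
with tf.Session() as s:
    print(s.run([pred_class, true_class, match]))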
In [125]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred,labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)
corr = tf.equal(tf.argmax(pred,1),tf.argmax(Y,1))
accuracy = tf.reduce_mean(tf.cast(corr,'float'))

init = tf.global_variables_initializer()
with open('preprocess_valid', mode='rb') as f:
    valid_features, valid_labels = cPickle.load(f)
In [131]:
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(50):
        for (filename, path) in Filenames.items():
            for f, l in loadPreProcessingData(filename, 1024):
                # Flatten each 32x32x3 image into a 3072-dimensional vector before feeding it in
                sess.run(optimizer, feed_dict={X: f.reshape(len(f), 3072), Y: l})
            # f, l still hold the last mini-batch of this file, so this cost is for that batch only
            c = sess.run(cost, feed_dict={X: f.reshape(len(f), 3072), Y: l})
            acc = sess.run(accuracy, feed_dict={X: valid_features.reshape(len(valid_features), 3072), Y: valid_labels})
            if epoch % 10 == 0:
                print("Epoch =", epoch, " Cost = ", c, " Accuracy = ", acc)
        
        
        
Epoch = 0  Cost =  2.31427  Accuracy =  0.0998
Epoch = 0  Cost =  2.29476  Accuracy =  0.0946
Epoch = 0  Cost =  2.22907  Accuracy =  0.1582
Epoch = 0  Cost =  2.14231  Accuracy =  0.165
Epoch = 0  Cost =  2.0753  Accuracy =  0.1922
Epoch = 10  Cost =  1.83249  Accuracy =  0.354
Epoch = 10  Cost =  1.7421  Accuracy =  0.3332
Epoch = 10  Cost =  1.79696  Accuracy =  0.3438
Epoch = 10  Cost =  1.70232  Accuracy =  0.361
Epoch = 10  Cost =  1.75288  Accuracy =  0.3516
Epoch = 20  Cost =  1.73321  Accuracy =  0.3622
Epoch = 20  Cost =  1.66606  Accuracy =  0.3542
Epoch = 20  Cost =  1.75357  Accuracy =  0.3558
Epoch = 20  Cost =  1.63176  Accuracy =  0.3588
Epoch = 20  Cost =  1.73412  Accuracy =  0.352
Epoch = 30  Cost =  1.68468  Accuracy =  0.3796
Epoch = 30  Cost =  1.65177  Accuracy =  0.366
Epoch = 30  Cost =  1.72321  Accuracy =  0.3714
Epoch = 30  Cost =  1.62381  Accuracy =  0.3756
Epoch = 30  Cost =  1.69591  Accuracy =  0.3776
Epoch = 40  Cost =  1.63406  Accuracy =  0.3812
Epoch = 40  Cost =  1.60276  Accuracy =  0.3776
Epoch = 40  Cost =  1.65077  Accuracy =  0.3778
Epoch = 40  Cost =  1.61521  Accuracy =  0.377
Epoch = 40  Cost =  1.63342  Accuracy =  0.3714

What next? This notebook is aimed only at showing how to write a multi-layer perceptron and train it on CIFAR-10. Validation accuracy is around 37%, which is quite low, so we do not even measure test accuracy here. Once we add convolutional layers to the network we will measure test accuracy as well. Let's hope that the test accuracy on CIFAR-10 with a basic CNN is above 50%.

In [ ]: