The networks used previously were trained to classify images. Semantic segmentation, which assigns a label to every pixel, is another important problem that deep learning can solve, and Fully Convolutional Networks (FCNs) can be employed for it.
This notebook attempts to use an FCN to solve the Kaggle Ultrasound Nerve Segmentation challenge.
The code in this notebook is inspired by https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py
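Some important points to remember:
The upsampling operation often called "deconvolution" is more appropriately named transposed convolution.
Convolution can be viewed as a matrix operation, and transposed convolution as multiplication by the transpose of that matrix. A very good visual tutorial is at http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html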
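As a rough illustration of the matrix view, here is a minimal 1-D sketch. The helper `conv_matrix` and the toy input and kernel are made up for this example and are not part of the notebook's pipeline. It builds the matrix of a stride-1, unpadded convolution and then multiplies by its transpose, which maps the shorter output back to the input length.

```python
import numpy as np

def conv_matrix(kernel, input_len):
    """Matrix C such that C @ x is the stride-1, unpadded cross-correlation
    of x with kernel (the operation CNN layers call convolution)."""
    k = len(kernel)
    output_len = input_len - k + 1
    C = np.zeros((output_len, input_len))
    for row in range(output_len):
        # Each row holds the kernel shifted one step to the right.
        C[row, row:row + k] = kernel
    return C

x = np.array([1.0, 2.0, 3.0, 4.0])       # toy input of length 4
kernel = np.array([1.0, 0.0, -1.0])      # toy kernel of length 3
C = conv_matrix(kernel, len(x))

y = C @ x        # convolution: length 4 -> length 2
x_up = C.T @ y   # transposed convolution: length 2 -> length 4

print(y)     # [-2. -2.]
print(x_up)  # [-2. -2.  2.  2.]
```

Note that the transposed operation restores the shape of the input, not its values, which is why "transposed convolution" is a better name than "deconvolution".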
One important thing to remember about CNNs is: 'Every filter is small spatially (along width and height), but extends through the full depth of the input volume.'
'Example 1. For example, suppose that the input volume has size [32x32x3], (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights (and +1 bias parameter).'
'Example 2. Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3*3*20 = 180 connections to the input volume. Notice that, again, the connectivity is local in space (e.g. 3x3), but full along the input depth (20).'
More details at http://cs231n.github.io/convolutional-networks/
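The arithmetic in the two examples can be checked with a few lines of Python. The function names below are made up for this sketch. The last call applies the same counting to a layer with 32 filters of size 3x3 on a single-channel input, which is the shape of the first convolution used later in this notebook.

```python
def weights_per_neuron(filter_height, filter_width, input_depth):
    """Weights of one filter: small in space, full along the input depth."""
    return filter_height * filter_width * input_depth

def conv_layer_params(filter_height, filter_width, input_depth, num_filters):
    """Trainable parameters of a conv layer: weights plus one bias per filter."""
    return num_filters * (weights_per_neuron(filter_height, filter_width, input_depth) + 1)

print(weights_per_neuron(5, 5, 3))      # 75  (Example 1)
print(weights_per_neuron(3, 3, 20))     # 180 (Example 2)
print(conv_layer_params(3, 3, 1, 32))   # 320
```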
import numpy as np
import os
from skimage.io import imsave,imread
import matplotlib.pyplot as plt
%matplotlib inline
Explore the Dataset: Display Images and Corresponding Masks
data_path='C:\\NanoDegree\\Kaggle\\NerveSegmentation\\'
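def displayImgAndMask(idx):
    train_data_path = os.path.join(data_path, 'train')
    images = os.listdir(train_data_path)
    img_name = images[idx]
    # If the index lands on a mask file, switch to the image it belongs to
    if 'mask' in img_name:
        img_name = img_name.split('_mask')[0] + '.tif'
    print(img_name)
    # as_gray is the spelling used by recent scikit-image releases (older ones used as_grey)
    img = imread(os.path.join(train_data_path, img_name), as_gray=True)
    plt.imshow(img, cmap='gray')
    plt.show()
    # The mask shares the image name, with '_mask' added before the extension
    img_mask_name = img_name.split('.')[0] + '_mask.tif'
    img_mask = imread(os.path.join(train_data_path, img_mask_name))
    plt.imshow(img_mask, cmap='gray')
    plt.show()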
displayImgAndMask(5900)
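The function above relies on a naming convention: each mask file carries the image's name with `_mask` added before the extension. A small sketch of that mapping in both directions follows. The helper names and the example file names are hypothetical, chosen only to illustrate the convention.

```python
import os

MASK_SUFFIX = '_mask'

def mask_name_for(image_name):
    """'<name>.tif' -> '<name>_mask.tif'"""
    stem, ext = os.path.splitext(image_name)
    return stem + MASK_SUFFIX + ext

def image_name_for(name):
    """'<name>_mask.tif' -> '<name>.tif'; image names pass through unchanged."""
    stem, ext = os.path.splitext(name)
    if stem.endswith(MASK_SUFFIX):
        stem = stem[:-len(MASK_SUFFIX)]
    return stem + ext

print(mask_name_for('1_1.tif'))        # 1_1_mask.tif
print(image_name_for('1_1_mask.tif'))  # 1_1.tif
```

Using `os.path.splitext` rather than `split('.')` keeps the helpers correct even if a file name contains more than one dot.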
Read All the Images and Masks from Training Data set into a numpy Array
image_rows = 420
image_cols = 580
def create_train_data():
    train_data_path = os.path.join(data_path, 'train')
    images = os.listdir(train_data_path)
    print(len(images))
    # Assuming there is one mask for every image
    numImages = len(images) // 2
    print(numImages)
    imageData = np.ndarray((numImages, image_rows, image_cols), dtype=np.uint8)
    imageMaskData = np.ndarray((numImages, image_rows, image_cols), dtype=np.uint8)
    i = 0  # Index into the image data
    for image in images:
        if 'mask' in image:
            continue  # masks are read together with their image below
        imageMask = image.split('.')[0] + '_mask.tif'
        # as_gray is the spelling used by recent scikit-image releases (older ones used as_grey)
        img = imread(os.path.join(train_data_path, image), as_gray=True)
        imgMask = imread(os.path.join(train_data_path, imageMask), as_gray=True)
        imageData[i] = img
        imageMaskData[i] = imgMask
        i = i + 1
        if i % 500 == 0:
            print("----------Completed reading next 500------------------------")
    # Cache both arrays on disk so the images need to be read only once
    np.save('imgs_train.npy', imageData)
    np.save('imgs_train_mask.npy', imageMaskData)
create_train_data()
Load Training data from saved npy files
def load_train_data():
    imgs_train = np.load('imgs_train.npy')
    imgs_train_mask = np.load('imgs_train_mask.npy')
    return imgs_train, imgs_train_mask
Pre Processing
from skimage.transform import resize
img_newRows = 96
img_newCols = 96
def preprocess(imgs):
    imgs_p = np.ndarray((imgs.shape[0], img_newRows, img_newCols), dtype=np.uint8)
    for i in range(imgs.shape[0]):
        # preserve_range keeps the 0-255 value range instead of rescaling to [0, 1]
        imgs_p[i] = resize(imgs[i], (img_newRows, img_newCols), preserve_range=True)
    print(imgs_p.shape)
    # Add a new dimension at the end, i.e. (5635, 96, 96) becomes (5635, 96, 96, 1)
    imgs_p = imgs_p[..., np.newaxis]
    print(imgs_p.shape)
    return imgs_p
Why an additional dimension of 1 at the end?
A new dimension of size 1 is added at the end because the images are grayscale and TensorFlow expects a volume (height x width x channels) as input. For example, in the CIFAR-10 dataset each input is 32 x 32 x 3.
In CIFAR-10 the color channels form the third dimension. The images here have only a single channel, so a 1 is added at the end.
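A quick way to see the effect of the extra axis is to try it on a small dummy array. The batch of four blank 96 x 96 images below is invented for illustration. `np.expand_dims` is shown as an equivalent spelling of the `np.newaxis` indexing used in `preprocess`.

```python
import numpy as np

# Hypothetical batch of 4 grayscale images, 96 x 96 pixels each
batch = np.zeros((4, 96, 96), dtype=np.uint8)

# Append a channel axis of size 1: (4, 96, 96) -> (4, 96, 96, 1)
batch_with_channel = batch[..., np.newaxis]
print(batch_with_channel.shape)  # (4, 96, 96, 1)

# Equivalent, more explicit spelling
same_thing = np.expand_dims(batch, axis=-1)
print(same_thing.shape)          # (4, 96, 96, 1)

# Indexing channel 0 recovers a 2-D image that plt.imshow can display
single_image = batch_with_channel[0, ..., 0]
print(single_image.shape)        # (96, 96)
```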
It is also important to tell TensorFlow (the Keras backend) to treat the last dimension as the channel.
The call K.set_image_data_format('channels_last'), made further down just before the model is built, does this.
imgs_train, imgs_train_mask = load_train_data()
imgs_train = preprocess(imgs_train)
imgs_train_mask = preprocess(imgs_train_mask)
plt.imshow(imgs_train[100,...,0],cmap='gray')
plt.show()
plt.imshow(imgs_train_mask[100,...,0],cmap='gray')
plt.show()
Define the loss
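The loss is defined as the negative of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), which measures the overlap between the predicted mask A and the true mask B; minimizing the loss therefore maximizes the overlap. The code adds 1 to both the numerator and the denominator so that the ratio stays defined when both masks are empty.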
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + 1) / (K.sum(y_true_f) + K.sum(y_pred_f) + 1)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)
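To make the Dice coefficient more concrete, the sketch below mirrors the Keras function above in plain NumPy and evaluates it on a tiny hand-made mask. The function name `dice_coef_np`, the `smooth` parameter name and the toy mask are assumptions made for this example only.

```python
import numpy as np

def dice_coef_np(y_true, y_pred, smooth=1.0):
    """NumPy version of dice_coef: (2 * overlap + smooth) / (sum_true + sum_pred + smooth)."""
    y_true_f = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred_f = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

mask = np.array([[0, 1, 1],
                 [0, 1, 0]])      # toy mask with 3 foreground pixels

perfect = dice_coef_np(mask, mask)        # (2*3 + 1) / (3 + 3 + 1) = 1
disjoint = dice_coef_np(mask, 1 - mask)   # (0 + 1) / (3 + 3 + 1) = 1/7
empty = dice_coef_np(np.zeros((2, 3)), np.zeros((2, 3)))  # (0 + 1) / (0 + 0 + 1) = 1

print(perfect, disjoint, empty)
```

A perfect prediction scores 1, a prediction with no overlap scores close to 0, and two empty masks also score 1 thanks to the added constant, so images without a nerve are not penalized when the network correctly predicts nothing.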
Define Network
A smaller version of the U-Net is used here: two downsampling stages, a bottleneck, and two upsampling stages with skip connections.
from keras.models import Model
from keras.layers import Input, concatenate, Conv2D, MaxPooling2D, Conv2DTranspose
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint
from keras import backend as K
def get_unet():
    inputs = Input((img_newRows, img_newCols, 1))

    # Contracting path: two conv layers per stage, then 2x2 max pooling
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    # Bottleneck
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv3)

    # Expanding path: transposed convolution upsamples, then the matching
    # encoder feature map is concatenated along the channel axis
    up4 = concatenate([Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(conv3), conv2], axis=3)
    conv4 = Conv2D(64, (3, 3), activation='relu', padding='same')(up4)
    conv4 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv4)

    up5 = concatenate([Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(conv4), conv1], axis=3)
    conv5 = Conv2D(32, (3, 3), activation='relu', padding='same')(up5)
    conv5 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv5)

    # 1x1 convolution with sigmoid gives a per-pixel mask probability
    conv6 = Conv2D(1, (1, 1), activation='sigmoid')(conv5)

    model = Model(inputs=[inputs], outputs=[conv6])
    model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])
    return model
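It can help to trace how the spatial size and channel count change through the network defined above. The following pure-Python sketch does the bookkeeping by hand and does not call Keras. The function `trace_unet_shapes` is invented for this illustration and assumes 'same' padding, 2x2 pooling and stride-2 transposed convolutions, as in `get_unet`.

```python
def trace_unet_shapes(size=96, base_filters=32):
    """Return (layer name, spatial size, channels) for each stage of the small U-Net."""
    f = base_filters
    shapes = [
        ('conv1', size, f),            # 'same' padding keeps the spatial size
        ('pool1', size // 2, f),       # 2x2 max pooling halves it
        ('conv2', size // 2, 2 * f),
        ('pool2', size // 4, 2 * f),
        ('conv3', size // 4, 4 * f),   # bottleneck
        # Transposed conv doubles the size; concatenation adds the skip channels
        ('up4', size // 2, 2 * f + 2 * f),
        ('conv4', size // 2, 2 * f),
        ('up5', size, f + f),
        ('conv5', size, f),
        ('conv6', size, 1),            # one sigmoid output per pixel
    ]
    return shapes

for name, spatial, channels in trace_unet_shapes():
    print(f'{name}: {spatial} x {spatial} x {channels}')
```

The output mask has the same 96 x 96 size as the input, which is what allows it to be compared pixel by pixel with the resized training mask.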
Normalize the Data
imgs_train = imgs_train.astype('float32')
mean = np.mean(imgs_train) # mean for data centering
std = np.std(imgs_train) # std for data normalization
imgs_train -= mean
imgs_train /= std
imgs_train_mask = imgs_train_mask.astype('float32')
imgs_train_mask /= 255. # scale masks to [0, 1]
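The normalization above can be tried out on random stand-in data. In this sketch the arrays, their sizes and the variable names are all made up, and reusing the training mean and standard deviation for new images is an assumption about how the model would be applied later, not something this notebook does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real data: 8 training images and 2 unseen images
train = rng.integers(0, 256, size=(8, 96, 96, 1)).astype('float32')
new_images = rng.integers(0, 256, size=(2, 96, 96, 1)).astype('float32')

# Statistics are computed once, over the whole training set
train_mean = train.mean()
train_std = train.std()

train_norm = (train - train_mean) / train_std
# Assumption: unseen images are shifted and scaled with the training statistics
new_norm = (new_images - train_mean) / train_std

# Masks hold only 0 and 255, so dividing by 255 maps them to {0, 1}
masks = np.array([0, 255], dtype=np.uint8).astype('float32') / 255.0

print(train_norm.mean(), train_norm.std())
print(masks)
```

After this step the training images have roughly zero mean and unit standard deviation, while the masks match the [0, 1] range of the sigmoid output.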
K.set_image_data_format('channels_last') # TF dimension ordering in this code
model = get_unet()
model_checkpoint = ModelCheckpoint('weights.h5', monitor='val_loss', save_best_only=True)
model.fit(imgs_train, imgs_train_mask, batch_size=128, epochs=10, verbose=1, shuffle=True,
validation_split=0.2,
callbacks=[model_checkpoint])
Test with the Training Data
Because we have not trained the complete deep network and used only a small number of epochs, we know the results will not be the best possible.
The dice_coef metric was observed to increase during training, i.e. the predictions move closer and closer to the training labels.
Plotting the predictions and comparing them with the labels gives a picture of how well the model has been trained.
imgs_mask_temp = model.predict(imgs_train[0:10,...],verbose=1)
plt.imshow(imgs_mask_temp[1,...,0],cmap='gray')
plt.show()
plt.imshow(imgs_train_mask[1,...,0],cmap='gray')
plt.show()