Keras gets stuck during optimization

Date: 2016-12-11 00:55:57

Tags: machine-learning deep-learning keras conv-neural-network

After trying the Keras example on CIFAR10, I decided to go for something bigger: a VGG-like net on the Tiny Imagenet dataset. This is a subset of the ImageNet dataset with 200 classes (instead of 1000) and 100K images downscaled to 64x64.

I got the VGG-like model from the file vgg_like_convnet.py here. Unfortunately, things are going very much like here, except that this time changing the learning rate or swapping TH for TF does not help. Neither does changing the optimizer (see the code below).

Accuracy basically stays stuck at 0.005 which, as was pointed out, is what you would expect from completely random answers over 200 classes (1/200 = 0.005). Worse, if, by a fluke of weight initialization, it starts at 0.007, it quickly converges to 0.005 and firmly stays there for every subsequent epoch.

The Keras code (TH version) follows:

from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.regularizers import l2, activity_l2, l1, activity_l1
from keras.optimizers import SGD, Adam, Adagrad, Adadelta
from keras.utils import np_utils
import numpy as np
import cPickle as pickle

# seed = 7
# np.random.seed(seed)

batch_size = 64
nb_classes = 200
nb_epoch = 30

# input image dimensions
img_rows, img_cols = 64, 64
# the tiny image net images are RGB
img_channels = 3

# Load the train dataset for TH
print('Load training data')
X_train=pickle.load(open('xtrain_shu_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_shu_th.p','rb')) # np.zeros((100000,1)).astype('uint8')

# Load the test dataset for TH
print('Load validation data')
X_test=pickle.load(open('xtest_th.p','rb')) # np.zeros((10000,3,64,64)).astype('uint8')
y_test=pickle.load(open('ytest_th.p','rb')) # np.zeros((10000,1)).astype('uint8')

# the data, shuffled and split between train and test sets
# (X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

# VGG-like feature extractor: five blocks of zero-padded 3x3 convolutions
# (64, 128, 256, 512, 512 filters), each ending in 2x2 max-pooling
model.add(ZeroPadding2D((1,1),input_shape=(3,64,64)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu',))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_6'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_8'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_11'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_13'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_15'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_18'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_20'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_22'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

# Classifier: two 4096-unit fully-connected layers with dropout, then a 200-way softmax
model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(200, activation='softmax'))

# let's train the model using SGD + momentum (how original).

opt = SGD(lr=0.0001, decay=1e-6, momentum=0.7, nesterov=True)
# opt= Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
# opt = Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
# opt = Adagrad(lr=0.01, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

# scale pixel values from [0,255] to [0,1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

print('Optimization....')
model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True)

# Save the resulting model
model.save('model.h5')

The Tiny Imagenet dataset consists of JPEG images that I converted to PPM with djpeg. I then created a single large binary file containing, for each image, the class label (1 byte) followed by the pixels (64x64x3 bytes).

Reading this file from Keras was terribly slow, so (I'm quite new to Python, so this may sound dumb) I decided to initialize a 4D NumPy array of shape (100000,3,64,64) (for TH; (100000,64,64,3) for TF) with the dataset and pickle it. It now takes about 40 seconds to load the dataset into the array when I run the code above.
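
For reference, a conversion along these lines produces the pickled arrays (a minimal sketch: the input file name train.bin and the planar colour layout are assumptions, while the record layout of 1 label byte followed by 64x64x3 pixel bytes is as described above):

import numpy as np
import cPickle as pickle

n, c, h, w = 100000, 3, 64, 64
record = 1 + c*h*w # 1 label byte + 64x64x3 pixel bytes per image

X = np.zeros((n, c, h, w), dtype='uint8') # TH ordering; use (n, h, w, c) for TF
y = np.zeros((n, 1), dtype='uint8')

with open('train.bin', 'rb') as f: # file name assumed
    for i in range(n):
        buf = np.frombuffer(f.read(record), dtype='uint8')
        y[i, 0] = buf[0]
        # assumes planar R,G,B storage; for interleaved RGB (as in a raw
        # PPM raster) use buf[1:].reshape(h, w, c).transpose(2, 0, 1)
        X[i] = buf[1:].reshape(c, h, w)

pickle.dump(X, open('xtrain_th.p', 'wb'))
pickle.dump(y, open('ytrain_th.p', 'wb'))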

I even checked that the pickled array contained the data in the proper order, with code such as the following:

import numpy as np
import cPickle as pickle

print("Reading data")
pix=pickle.load(open('xtrain_th.p','rb'))
print("Done")

img=67857

f=open('img'+str(img)+'.ppm','wb')
f.write('P6\n64 64\n255\n') # binary PPM header: magic number, dimensions, max pixel value

# write the three (64,64) colour planes back out as interleaved RGB
for y in range(0,64):
    for x in range(0,64):
        f.write(chr(pix[img][0][y][x]))
        f.write(chr(pix[img][1][y][x]))
        f.write(chr(pix[img][2][y][x]))
f.close()

which extracts PPM images from the dataset.

Finally, I noticed that the training set was way too ordered (i.e. the first 500 images all belonged to class 0, the second 500 to class 1, etc.), which means that, without shuffling, almost every 64-image batch would contain a single class.

So I shuffled it with the following code:

# Dataset preparation for Theano backend
import cPickle as pickle
import numpy as np
import random as rnd

n=100000

print('Load training data')
X_train=pickle.load(open('xtrain_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_th.p','rb')) # np.zeros((100000,1)).astype('uint8')

# Shuffle the data with random pairwise swaps
print('Shuffling training data')
for _ in range(0,n):
    i=rnd.randrange(n)
    j=rnd.randrange(n)
    tmpa=X_train[i].copy() # .copy() matters: plain indexing returns a view, and
                           # without it both rows end up holding X_train[j]
    X_train[i]=X_train[j]
    X_train[j]=tmpa
    tmp=y_train[i][0]
    y_train[i][0]=y_train[j][0]
    y_train[j][0]=tmp

print('Pickle dump')
pickle.dump(X_train,open('xtrain_shu_th.p','wb'))
pickle.dump(y_train,open('ytrain_shu_th.p','wb'))
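
Incidentally, a single NumPy permutation applied to both arrays does the same job in a vectorized way and avoids the view-versus-copy pitfall entirely (a minimal sketch, same file names as above):

import numpy as np
import cPickle as pickle

X_train = pickle.load(open('xtrain_th.p','rb'))
y_train = pickle.load(open('ytrain_th.p','rb'))

# one permutation of the row indices, applied to both arrays
# so that images and labels stay paired
perm = np.random.permutation(X_train.shape[0])
X_train = X_train[perm]
y_train = y_train[perm]

pickle.dump(X_train,open('xtrain_shu_th.p','wb'))
pickle.dump(y_train,open('ytrain_shu_th.p','wb'))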

None of it helped. I wasn't expecting 99% accuracy on the first attempt, but at least some movement, and then a plateau.

I wanted to try TFLearn, but I saw a few days ago that it had a pending bug.

Any ideas? Thanks in advance.

1 Answer:

Answer 0 (score: 2)

You can use the built-in shuffle of the Keras model API (https://keras.io/models/model/#fit). Just set the shuffle parameter to true. You can do both batch shuffle and global shuffle; the default is a global shuffle.
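
For example, with the Keras 1.x API used in the question (a sketch; the arrays and hyper-parameters are the ones defined there):

# global shuffle (the default): the whole training set is
# re-shuffled at the start of every epoch
model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True)

# alternatively, shuffle='batch' shuffles in batch-sized chunks,
# intended for data that cannot be indexed arbitrarily (e.g. HDF5 arrays)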

One thing to note is that the validation split in fit is done before the shuffling happens, so if you want to shuffle your validation data too, I would advise using sklearn.utils.shuffle (http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html).
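
Something along these lines (a sketch; validation_split=0.1 is an arbitrary choice):

from sklearn.utils import shuffle

# shuffle images and labels in unison *before* fit() carves the
# validation split off the end of the arrays
X_train, Y_train = shuffle(X_train, Y_train)

model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_split=0.1)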

From the Keras source on GitHub:

if shuffle == 'batch':
    index_array = batch_shuffle(index_array, batch_size)
elif shuffle:
    random.shuffle(index_array)
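
In other words, shuffle=True permutes every individual index, while shuffle='batch' reorders batch-sized chunks of indices and keeps the samples within each chunk together.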