I'm trying to develop a convolutional neural network for image classification. At the moment I'm classifying a set of about 1,000 images of cats and dogs, but I'm stuck in the training process.

First I tried to build my own network, preprocessing and labelling the images myself and testing different architectures and hyperparameters with TensorFlow. Since I wasn't getting good results, I built a similar network with Keras to see if I could do better.

In the following code I create the training and validation sets for the TensorFlow network:
import os
import sys
import cv2
import numpy as np
import tensorflow as tf
from random import shuffle
from tqdm import tqdm
from tensorflow.keras import losses  # assumption: `losses` used further down comes from Keras

def oneHot(img):
    # File names follow the Kaggle pattern "cat.123.jpg" / "dog.123.jpg",
    # so the class name is the third token from the end.
    label = img.split('.')[-3]
    if label == 'cat':
        return [1, 0]
    elif label == 'dog':
        return [0, 1]

def loadData(img_dir):
    global img_h
    global img_w
    data_set = []
    for img in tqdm(os.listdir(img_dir)):
        label = oneHot(img)
        path = os.path.join(img_dir, img)
        img = cv2.imread(path)
        img = cv2.resize(img, (img_h, img_w))
        # Normalize pixels to [0, 1] and pair each image with its one-hot label
        data_set.append([np.array(img/255, dtype='float32'), np.array(label)])
    shuffle(data_set)
    return data_set

def divideSet(data_set, train_size):
    len_train = int(len(data_set)*train_size)
    train_set = data_set[:len_train]
    valid_set = data_set[len_train:]
    return train_set, valid_set

def separateArgLabel(data_set):
    arg = np.array([i[0] for i in data_set])
    label = np.array([i[1] for i in data_set])
    return arg, label

train_set = loadData(train_dir)
train_data, valid_data = divideSet(train_set, 0.8)
x_train, y_train = separateArgLabel(train_data)
x_valid, y_valid = separateArgLabel(valid_data)
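For reference, `img_h`, `img_w`, `img_c` and `train_dir` are set before this runs; a minimal sketch of that setup, assuming square 64×64 RGB images (the actual sizes and path in my script may differ):

# Hypothetical setup for the snippet above; the concrete values are placeholders.
img_h, img_w, img_c = 64, 64, 3   # resize target and channel count
train_dir = 'data/train'          # Kaggle-style files: cat.N.jpg / dog.N.jpg

# Quick sanity check of the resulting arrays
print(x_train.shape, y_train.shape)   # e.g. (800, 64, 64, 3) (800, 2)
print(x_valid.shape, y_valid.shape)   # e.g. (200, 64, 64, 3) (200, 2)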
And this is the code I use to build and train the model in TensorFlow:
def flattenLayer(x):
    # Flatten everything except the batch dimension
    layer_shape = x.get_shape()
    n_input = layer_shape[1:4].num_elements()
    flat_layer = tf.reshape(x, [-1, n_input])
    return flat_layer

def getRandomBatch(x, y, size):
    # Sample a random batch of `size` image/label pairs
    rnd_idx = np.random.choice(len(x), size)
    x_batch = x[rnd_idx]
    y_batch = y[rnd_idx]
    return x_batch, y_batch

with tf.Session() as sess:
    # Placeholders for the images and the one-hot labels
    x = tf.placeholder(tf.float32, shape=[None, img_w, img_h, img_c])
    y = tf.placeholder(tf.float32, shape=[None, 2])

    # Five conv + max-pool blocks
    conv1 = tf.layers.conv2d(x, 32, [5,5], strides=1, padding='same',
                             activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(conv1, pool_size=[2,2], strides=2)
    conv2 = tf.layers.conv2d(pool1, 64, [5,5], strides=1, padding='same',
                             activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(conv2, pool_size=[2,2], strides=2)
    conv3 = tf.layers.conv2d(pool2, 128, [5,5], strides=1, padding='same',
                             activation=tf.nn.relu)
    pool3 = tf.layers.max_pooling2d(conv3, pool_size=[2,2], strides=2)
    conv4 = tf.layers.conv2d(pool3, 64, [5,5], strides=1, padding='same',
                             activation=tf.nn.relu)
    pool4 = tf.layers.max_pooling2d(conv4, pool_size=[2,2], strides=2)
    conv5 = tf.layers.conv2d(pool4, 32, [5,5], strides=1, padding='same',
                             activation=tf.nn.relu)
    pool5 = tf.layers.max_pooling2d(conv5, pool_size=[2,2], strides=2)

    # Dense classification head
    flatten = flattenLayer(pool5)
    fc1 = tf.layers.dense(flatten, 1024, activation=tf.nn.relu)
    logits = tf.layers.dense(fc1, 2, activation=tf.nn.relu)
    y_pred = tf.nn.softmax(logits)

    # Loss (Keras categorical cross-entropy on the softmax output), optimizer and accuracy
    cross_entropy = losses.categorical_crossentropy(y, y_pred)
    loss = tf.reduce_mean(cross_entropy)
    optimizer = tf.train.AdamOptimizer(0.0005)
    grads = optimizer.compute_gradients(loss)
    train = optimizer.apply_gradients(grads)

    y_cls = tf.arg_max(y, 1)
    y_pred_cls = tf.arg_max(y_pred, 1)
    correct = tf.equal(y_pred_cls, y_cls)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    init = tf.global_variables_initializer()
    sess.run(init)

    # 10 epochs, 100 random batches of 32 per epoch
    for epoch in range(10):
        sum_loss_train = 0
        sum_acc_train = 0
        for i in range(100):
            batch_x, batch_y = getRandomBatch(x_train, y_train, 32)
            feed_dict_train = {x: batch_x, y: batch_y}
            _, loss_train, acc_train = sess.run([train, loss, accuracy],
                                                feed_dict=feed_dict_train)
            sum_loss_train += loss_train
            sum_acc_train += acc_train
            sys.stdout.write('\r' + str(i+1) + '/' + str(100) + '\t' + 'loss: ' +
                             str(sum_loss_train/(i+1)) + ' accuracy: ' + str(acc_train))
            sys.stdout.flush()
        mean_loss_train = sum_loss_train/(i+1)
        mean_acc_train = sum_acc_train/(i+1)
        print("\nEpoch: " + str(epoch+1) + " ===========> Epoch loss: " +
              "{:.4f}".format(mean_loss_train))
        print("\tEpoch accuracy: " + "{:.2f} %".format(mean_acc_train*100))

        # Validation on 50 random batches of 32 per epoch
        sum_loss_val = 0
        sum_acc_val = 0
        for j in range(50):
            batch_x_val, batch_y_val = getRandomBatch(x_valid, y_valid, 32)
            feed_dict_valid = {x: batch_x_val, y: batch_y_val}
            loss_val, acc_val = sess.run([loss, accuracy],
                                         feed_dict=feed_dict_valid)
            sum_acc_val += acc_val
            sum_loss_val += loss_val
        mean_acc_val = sum_acc_val/(j+1)
        mean_loss_val = sum_loss_val/(j+1)
        print("\nValidation loss: " + "{:.4f}".format(mean_loss_val))
        print("\tValidation accuracy: " + "{:.2f} %".format(mean_acc_val*100))
When I run the model, after a few iterations the gradients all become zero and the loss gets stuck at a constant value. At first I thought the network had stopped learning because there weren't enough images, but when I trained the same dataset with the network built in Keras, the results were quite good. In both cases I use the same number of layers, the same hyperparameters, and I process the images in the same way. Although the weight initialization may differ, the results make me think there is some mistake in the code I've posted; a rough sketch of the Keras model I compared against is included below. Can anyone help me find the problem?
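For comparison, the Keras network I refer to looks roughly like this (rebuilt from memory, so the exact arguments may not match my real script; it mirrors the same five conv/pool blocks and dense head as above):

from tensorflow.keras import layers, models, optimizers

# Approximate reconstruction of the Keras model that trained well.
model = models.Sequential([
    layers.Conv2D(32, (5, 5), padding='same', activation='relu',
                  input_shape=(img_h, img_w, img_c)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),
    layers.Dense(2, activation='softmax'),
])

# Same optimizer, learning rate and batch size as in the TensorFlow version.
model.compile(optimizer=optimizers.Adam(0.0005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10,
          validation_data=(x_valid, y_valid))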