Loss becomes constant after some training steps when training a CNN in TensorFlow

Date: 2019-01-19 10:33:47

Tags: python tensorflow keras

I am trying to develop a convolutional neural network for image classification. At the moment I am classifying a set of about 1000 images of cats and dogs, but I am stuck at the training stage.

First I tried to build my own network, preprocessing and labelling the images myself and experimenting with different architectures and hyperparameters in TensorFlow. Since I was not getting good results, I built a similar network with Keras to see whether it would do better, and it did.
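
Roughly, the Keras network I compare against looks like the following sketch (not my exact code, just the same layer sizes and optimizer as the TensorFlow model below):

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative Keras counterpart of the TensorFlow model shown further down
model = keras.Sequential([
    layers.Conv2D(32, (5, 5), padding='same', activation='relu',
                  input_shape=(img_h, img_w, img_c)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),
    layers.Dense(2, activation='softmax'),   # linear logits followed by softmax
])
model.compile(optimizer=keras.optimizers.Adam(0.0005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# training would then be e.g.:
# model.fit(x_train, y_train, batch_size=32, epochs=10,
#           validation_data=(x_valid, y_valid))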

The following code is what I use to create the training and validation sets for the TensorFlow network:

import os
import cv2
import numpy as np
from random import shuffle
from tqdm import tqdm

def oneHot(img):
  # File names are expected to look like 'cat.0.jpg' / 'dog.0.jpg'
  label = img.split('.')[-3]
  if label == 'cat': return [1, 0]
  elif label == 'dog': return [0, 1]

def loadData(img_dir):
  global img_h
  global img_w
  data_set = []
  for img in tqdm(os.listdir(img_dir)):
    label = oneHot(img)
    path = os.path.join(img_dir, img)
    img = cv2.imread(path)
    img = cv2.resize(img, (img_h, img_w))
    # Scale pixel values to [0, 1]
    data_set.append([np.array(img/255, dtype='float32'), np.array(label)])
  # Shuffle once, after all images have been loaded
  shuffle(data_set)
  return data_set

def divideSet(data_set, train_size):
  len_train = int(len(data_set)*train_size)
  train_set = data_set[:len_train]
  valid_set = data_set[len_train:]
  return train_set, valid_set

def separateArgLabel(data_set):
  arg = np.array([i[0] for i in data_set])
  label = np.array([i[1] for i in data_set])
  return arg, label

train_set = loadData(train_dir)
train_data, valid_data = divideSet(train_set, 0.8)
x_train, y_train = separateArgLabel(train_data)
x_valid, y_valid = separateArgLabel(valid_data)
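
For reference, a quick sanity check of the resulting arrays can look like this (a minimal sketch; the example shapes assume img_h = img_w = 64 and roughly 1000 source images):

# Hypothetical sanity check, not part of the original code
print(x_train.shape)                 # e.g. (800, 64, 64, 3) -> (num_images, img_h, img_w, img_c)
print(y_train.shape)                 # e.g. (800, 2)         -> one-hot labels [cat, dog]
print(x_train.min(), x_train.max())  # should stay within [0.0, 1.0] after the /255 scaling
print(y_train.sum(axis=0))           # rough class balance between cats and dogs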

And this is the code I use to build and train the model in TensorFlow:

import sys
import tensorflow as tf
from tensorflow.keras import losses  # assuming the Keras losses module is imported like this

def flattenLayer(x):
  # Flatten the (batch, h, w, c) feature map to (batch, h*w*c)
  layer_shape = x.get_shape()
  n_input = layer_shape[1:4].num_elements()
  flat_layer = tf.reshape(x, [-1, n_input])
  return flat_layer

def getRandomBatch(x, y, size):
  # Sample a random mini-batch (with replacement) from x and y
  rnd_idx = np.random.choice(len(x), size)
  x_batch = x[rnd_idx]
  y_batch = y[rnd_idx]
  return x_batch, y_batch

with tf.Session() as sess:
  x = tf.placeholder(tf.float32, shape=[None, img_w, img_h, img_c])
  y = tf.placeholder(tf.float32, shape=[None, 2])

  # Five conv + max-pooling blocks
  conv1 = tf.layers.conv2d(x, 32, [5, 5], strides=1, padding='same',
                           activation=tf.nn.relu)
  pool1 = tf.layers.max_pooling2d(conv1, pool_size=[2, 2], strides=2)
  conv2 = tf.layers.conv2d(pool1, 64, [5, 5], strides=1, padding='same',
                           activation=tf.nn.relu)
  pool2 = tf.layers.max_pooling2d(conv2, pool_size=[2, 2], strides=2)
  conv3 = tf.layers.conv2d(pool2, 128, [5, 5], strides=1, padding='same',
                           activation=tf.nn.relu)
  pool3 = tf.layers.max_pooling2d(conv3, pool_size=[2, 2], strides=2)
  conv4 = tf.layers.conv2d(pool3, 64, [5, 5], strides=1, padding='same',
                           activation=tf.nn.relu)
  pool4 = tf.layers.max_pooling2d(conv4, pool_size=[2, 2], strides=2)
  conv5 = tf.layers.conv2d(pool4, 32, [5, 5], strides=1, padding='same',
                           activation=tf.nn.relu)
  pool5 = tf.layers.max_pooling2d(conv5, pool_size=[2, 2], strides=2)

  # Fully connected head
  flatten = flattenLayer(pool5)
  fc1 = tf.layers.dense(flatten, 1024, activation=tf.nn.relu)
  logits = tf.layers.dense(fc1, 2, activation=tf.nn.relu)
  y_pred = tf.nn.softmax(logits)

  # Categorical cross-entropy on the softmax output
  cross_entropy = losses.categorical_crossentropy(y, y_pred)
  loss = tf.reduce_mean(cross_entropy)
  optimizer = tf.train.AdamOptimizer(0.0005)
  grads = optimizer.compute_gradients(loss)
  train = optimizer.apply_gradients(grads)

  # Accuracy
  y_cls = tf.argmax(y, 1)
  y_pred_cls = tf.argmax(y_pred, 1)
  correct = tf.equal(y_pred_cls, y_cls)
  accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

  init = tf.global_variables_initializer()
  sess.run(init)
  for epoch in range(10):
    sum_loss_train = 0
    sum_acc_train = 0
    for i in range(100):
      batch_x, batch_y = getRandomBatch(x_train, y_train, 32)
      feed_dict_train = {x: batch_x, y: batch_y}
      _, loss_train, acc_train = sess.run([train, loss, accuracy],
                                          feed_dict=feed_dict_train)
      sum_loss_train += loss_train
      sum_acc_train += acc_train

      sys.stdout.write('\r' + str(i+1) + '/' + str(100) + '\t' + 'loss: ' +
                       str(sum_loss_train/(i+1)) + '  accuracy: ' + str(acc_train))
      sys.stdout.flush()

    mean_loss_train = sum_loss_train/(i+1)
    mean_acc_train = sum_acc_train/(i+1)

    print("\nÉpoca: " + str(epoch+1) + " ===========> Epoch loss: " +
          "{:.4f}".format(mean_loss_train))
    print("\tEpoch accuracy: " + "{:.2f} %".format(mean_acc_train*100))

    sum_loss_val = 0
    sum_acc_val = 0
    for j in range(50):
      batch_x_val, batch_y_val = getRandomBatch(x_valid, y_valid, 32)
      feed_dict_valid = {x: batch_x_val, y: batch_y_val}
      loss_val, acc_val = sess.run([loss, accuracy],
                                   feed_dict=feed_dict_valid)
      sum_acc_val += acc_val
      sum_loss_val += loss_val
    mean_acc_val = sum_acc_val/(j+1)
    mean_loss_val = sum_loss_val/(j+1)
    print("\nValidation loss: " + "{:.4f}".format(mean_loss_val))
    print("\tValidation accuracy: " + "{:.2f} %".format(mean_acc_val*100))

When I run this model, after a few iterations the gradients all become zero and the loss gets stuck at a constant value. At first I thought the network had stopped learning because the dataset was too small, but when I train on the same dataset with the network built in Keras the results are quite good. In both cases I use the same number of layers, the same hyperparameters and preprocess the images the same way. The weight initialisation may differ, but the results make me think there is some mistake in the code above. Could someone help me find the problem?
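
For comparison, a commonly used pattern in TF 1.x keeps the last dense layer linear and lets the loss op apply the softmax internally; a minimal sketch of that wiring (illustrative only, not necessarily the fix for the code above):

# Sketch of the usual logits/loss wiring in TF 1.x (illustrative only):
# the last dense layer has no activation, and softmax is folded into the loss op.
logits = tf.layers.dense(fc1, 2, activation=None)  # raw class scores
y_pred = tf.nn.softmax(logits)                     # probabilities, used only for accuracy
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits)
loss = tf.reduce_mean(cross_entropy)
train = tf.train.AdamOptimizer(0.0005).minimize(loss)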

0 Answers:

There are no answers yet.