How to handle overfitting in TensorFlow?

Asked: 2016-08-10 14:49:10

Tags: python-2.7 tensorflow

I am currently trying to train a convolutional neural network for image classification, using an architecture similar to the one in the TensorFlow tutorial. After training I get fairly high training accuracy and very low cross-entropy, but the test accuracy is always only slightly better than random guessing. The network seems to be overfitting. During training I have already applied stochastic gradient descent and dropout to try to avoid overfitting, but it does not seem to help.

Here is part of my code.

batch_image = np.ndarray(shape=(100,9216), dtype='float')
batch_class = np.ndarray(shape=(100,10), dtype='float')
# first convolutional layer
w_conv1 = weight_variable([5, 5, 3, 64])
b_conv1 = bias_variable([64])

x_image = tf.reshape(x, [-1, 48, 64, 3])

h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
norm1 = tf.nn.lrn(h_pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)

# second convolutional layer
w_conv2 = weight_variable([5, 5, 64, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(norm1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
norm2 = tf.nn.lrn(h_pool2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)

# first densely connected layer
w_fc1 = weight_variable([12*16*64, 512])
b_fc1 = bias_variable([512])

h_pool2_flat = tf.reshape(norm2, [-1, 12*16*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

# second densely connected layer
w_fc2 = weight_variable([512, 256])
b_fc2 = bias_variable([256])
h_fc2 = tf.nn.relu(tf.matmul(h_fc1, w_fc2) + b_fc2)

# dropout (applied to the second fully connected layer)
keep_prob = tf.placeholder("float")
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)

# readout layer
w_fc3 = weight_variable([256, 10])
b_fc3 = bias_variable([10])

y_prob = tf.nn.softmax(tf.matmul(h_fc2_drop, w_fc3) + b_fc3)

# train and evaluate the model

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_prob + 1e-9))
train_step = tf.train.GradientDescentOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_prob, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())

for i in range(100):
  rand_idx = np.random.randint(17778, size=100)
  for k, j in enumerate(rand_idx):
    batch_image[k] = images[j]
    batch_class[k] = np.zeros(shape=(10))
    batch_class[k, classes[j, 0]] = 1.0

  train_step.run(feed_dict={x:batch_image, y_:batch_class, keep_prob:0.5})
  train_accuracy = accuracy.eval(feed_dict={x:batch_image, y_:batch_class, keep_prob:1.0})
  train_ce = cross_entropy.eval(feed_dict={x:batch_image, y_:batch_class, keep_prob:1.0})
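For what it's worth, the per-sample one-hot loop above can be replaced by a single vectorized NumPy assignment. A minimal sketch, with made-up labels standing in for the `classes` array (assumed here to be an `(N, 1)` integer array, as in the code):

```python
import numpy as np

# Hypothetical stand-in for `classes`: an (N, 1) integer label array.
classes = np.array([[3], [0], [9], [3]])
num_classes = 10

# Build the whole one-hot batch in one fancy-indexing step:
# row k gets a 1.0 in the column given by classes[k, 0].
batch_class = np.zeros((classes.shape[0], num_classes), dtype='float')
batch_class[np.arange(classes.shape[0]), classes[:, 0]] = 1.0
```

Each row then contains exactly one 1.0, in the column of its class label.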

I would like to know whether there are any mistakes in my code, or whether I need to apply other strategies to get better test accuracy.

Thanks!

1 Answer:

Answer 0 (score: 0):

You can try the following strategies to avoid overfitting.

  1. Shuffle the input data.
  2. Use early stopping on the validation loss, with some patience.
  3. Add L1 and L2 regularization.
  4. Add dropout.
  5. Add batch normalization.
  6. If the pixels are not normalized, dividing the pixel values by 255 also helps.
  7. Perform image data augmentation.
  8. Possibly tune the hyperparameters with a grid search.
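Point 2 can be sketched in a framework-agnostic way. This is a minimal illustration (the `val_losses` list is made-up data standing in for per-epoch validation losses you would compute on a held-out set):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop: the first
    epoch after which the validation loss has failed to improve for
    `patience` consecutive epochs, or the last epoch if it never stops."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Loss improves, then plateaus: training stops `patience` epochs
# after the minimum at epoch 2.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]
stop = early_stopping_epoch(losses, patience=3)
```

In practice you would also keep a copy of the model weights from the best epoch and restore them when stopping.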

Hope this helps! Happy coding.

Thanks!