Question

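I am training a CNN on the ImageNet-2012 dataset, but the model keeps overfitting (evaluation error: top-1 49%, top-5 25%; training error: top-1 25%, top-5 8%; trained on a GTX 1080 Ti for 600k training steps, about 5 days). The architecture is based on ZF-net, with batch normalization added: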

import tensorflow as tf  # TensorFlow 1.x API

x_input = feed_key['input']
bn_training = tf.placeholder(dtype=tf.bool, shape=(), name='bn_training')
with tf.name_scope('ZF_conv1'):
    w_conv1 = tf.get_variable(name='conv1_kernel', shape=[7, 7, 3, 96], dtype=tf.float32)
    h_conv1 = tf.nn.conv2d(x_input, w_conv1, strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_bn1'):
    bn1 = tf.layers.batch_normalization(h_conv1, training=bn_training)
with tf.name_scope('ZF_relu1'):
    h_active1 = tf.nn.relu(bn1)

with tf.name_scope('ZF_pool1'):
    h_pool1 = tf.nn.max_pool(h_active1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

with tf.name_scope('ZF_conv2'):
    w_conv2 = tf.get_variable(name='conv2_kernel', shape=[5, 5, 96, 256], dtype=tf.float32)
    h_conv2 = tf.nn.conv2d(h_pool1, w_conv2, strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_bn2'):
    bn2 = tf.layers.batch_normalization(h_conv2, training=bn_training)
with tf.name_scope('ZF_relu2'):
    h_active2 = tf.nn.relu(bn2)

with tf.name_scope('ZF_pool2'):
    h_pool2 = tf.nn.max_pool(h_active2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

with tf.name_scope('ZF_conv3'):
    w_conv3 = tf.get_variable(name='conv3_kernel', shape=[3, 3, 256, 384], dtype=tf.float32)
    h_conv3 = tf.nn.conv2d(h_pool2, w_conv3, strides=[1, 1, 1, 1], padding='SAME')
with tf.name_scope('ZF_bn3'):
    bn3 = tf.layers.batch_normalization(h_conv3, training=bn_training)
with tf.name_scope('ZF_relu3'):
    h_active3 = tf.nn.relu(bn3)

with tf.name_scope('ZF_conv4'):
    w_conv4 = tf.get_variable(name='conv4_kernel', shape=[3, 3, 384, 384], dtype=tf.float32)
    h_conv4 = tf.nn.conv2d(h_active3, w_conv4, strides=[1, 1, 1, 1], padding='SAME')
with tf.name_scope('ZF_bn4'):
    bn4 = tf.layers.batch_normalization(h_conv4, training=bn_training)
with tf.name_scope('ZF_relu4'):
    h_active4 = tf.nn.relu(bn4)

with tf.name_scope('ZF_conv5'):
    w_conv5 = tf.get_variable(name='conv5_kernel', shape=[3, 3, 384, 256], dtype=tf.float32)
    h_conv5 = tf.nn.conv2d(h_active4, w_conv5, strides=[1, 1, 1, 1], padding='SAME')

with tf.name_scope('ZF_bn5'):
    bn5 = tf.layers.batch_normalization(h_conv5, training=bn_training)
with tf.name_scope('ZF_relu5'):
    h_active5 = tf.nn.relu(bn5)

feed_key['bn_training'] = bn_training

Next come two FC layers:

fc1 = tf.layers.dense(low_out_flat,
                      units=4096,
                      activation=tf.nn.relu,
                      name='zffc1')

keep_prob1 = tf.placeholder(tf.float32, name='keep_prob1')
dropout1 = tf.nn.dropout(fc1, keep_prob1, name='zfdrop1')

fc2 = tf.layers.dense(dropout1,
                      units=4096,
                      activation=tf.nn.relu,
                      name='zffc2')

keep_prob2 = tf.placeholder(tf.float32, name='keep_prob2')
dropout2 = tf.nn.dropout(fc2, keep_prob2, name='zfdrop2')

feed_key['keep_prob1'] = keep_prob1
feed_key['keep_prob2'] = keep_prob2

Finally, the cross-entropy is computed:

gt_labels = tf.placeholder(dtype=tf.int64, shape=[None])
logits = tf.layers.dense(model.last_layer, units=1000, name='imagenet_logits')

with tf.name_scope('imagenet_cross_entropy'):
    entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=gt_labels, logits=logits)
with tf.name_scope('imagenet_loss'):
    loss = tf.reduce_mean(entropy)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = training_methods.optimizer.minimize(loss=loss, global_step=training_methods.global_step)

For data preprocessing at training time, I follow the paper Visualizing and Understanding Convolutional Networks:

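Each RGB image was preprocessed by resizing the smallest dimension to 256, cropping the center 256x256 region, subtracting the per-pixel mean (across all images) and then using 10 different sub-crops of size 224x224 (corners + center with(out) horizontal flips).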
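
To make the crop geometry concrete, here is a minimal sketch of where the ten 224x224 sub-crops sit inside the 256x256 image. The function name `ten_crop_boxes` and the `(top, left, flipped)` tuple layout are hypothetical, chosen only for illustration; they are not taken from the paper or from the training code in this question.

```python
def ten_crop_boxes(image_size=256, crop_size=224):
    """Return (top, left, flipped) for the 10 sub-crops of a square image.

    Assumes a square image_size x image_size input, as produced by the
    center 256x256 crop described above.
    """
    offset = image_size - crop_size   # distance from the top-left to the far corner crop
    center = offset // 2              # top-left coordinate of the centered crop
    corners_and_center = [
        (0, 0),            # top-left corner
        (0, offset),       # top-right corner
        (offset, 0),       # bottom-left corner
        (offset, offset),  # bottom-right corner
        (center, center),  # center
    ]
    boxes = []
    # Each of the five positions is used twice: as-is and horizontally flipped.
    for flipped in (False, True):
        for top, left in corners_and_center:
            boxes.append((top, left, flipped))
    return boxes


boxes = ten_crop_boxes()
```

Each entry marks the top-left pixel of a 224x224 window, so a crop covers rows `top` to `top + 224` and columns `left` to `left + 224`.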

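(Note: I computed the image mean over the whole training dataset after first resizing the images to 224*224 rather than 256*256, so I subtract the image mean after cropping the sub-image to 224*224. I don't think this is a problem.) At test time, I simply resize the image to 224*224 (is this a problem?).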

Optimizer: Adam with an initial learning rate of 0.001 and epsilon 0.1; the dropout rate is set to 0.5. Finally, I use tf.variance_scaling_initializer() to initialize all weights.
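
For readers unfamiliar with the epsilon setting, the sketch below shows how it enters a single Adam update for one scalar weight. It uses the textbook form of Adam, which is an assumption here: library implementations (including TensorFlow's) may place epsilon slightly differently. The function name `first_adam_step` and the gradient value 0.01 are made up for illustration.

```python
import math


def first_adam_step(grad, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Size of the very first Adam update for one scalar weight (textbook form)."""
    m = (1 - beta1) * grad            # first-moment estimate after one step
    v = (1 - beta2) * grad * grad     # second-moment estimate after one step
    m_hat = m / (1 - beta1)           # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + epsilon)


# Same illustrative gradient, two epsilon settings.
step_small_eps = first_adam_step(0.01, epsilon=1e-8)
step_large_eps = first_adam_step(0.01, epsilon=0.1)
```

With a tiny epsilon the first step is close to the learning rate regardless of gradient scale; with epsilon 0.1 and a gradient of 0.01 the denominator is dominated by epsilon, so the step is roughly 11 times smaller.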

The ZF-net paper reports test error rates of top-1: 36.7% and top-5: 15.3%. My results are clearly worse than that, but I cannot find where the mistake is.

Answer 1

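The issue was that when I evaluated the model I simply resized each image to 224*224, which differs from the training-time processing (cropping 224*224 regions out of the image). As a result, the validation images come from a somewhat different distribution than the training images (a different manifold). At first I thought this was not a big deal and that the shift in distribution would not be dramatic. After I fixed it, the validation error dropped significantly.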
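
The exact change is not spelled out above, so the following is only a sketch of one way to make evaluation match training: resize the shorter side to 256, take the center 256x256 region, then take the center 224x224 crop instead of squashing the whole image to 224x224. All function names here are hypothetical, and the sketch only computes sizes and crop boxes; the actual pixel resizing would be done by whatever image library is in use.

```python
def resize_shorter_side(height, width, target=256):
    """New (height, width) after scaling so the shorter side equals target."""
    scale = target / min(height, width)
    # max() guards against rounding the shorter side below target.
    return max(target, round(height * scale)), max(target, round(width * scale))


def center_crop_box(height, width, crop):
    """(top, left, bottom, right) of a centered crop x crop window."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


def eval_crop_plan(height, width):
    """Sizes and boxes for an evaluation pipeline that mirrors training."""
    resized = resize_shorter_side(height, width, 256)
    box_256 = center_crop_box(resized[0], resized[1], 256)  # center 256x256 region
    box_224 = center_crop_box(256, 256, 224)                # center crop fed to the net
    return resized, box_256, box_224


plan = eval_crop_plan(375, 500)
```

The mean image would be subtracted after the final 224x224 crop, as in the training setup described in the question. Averaging predictions over all ten sub-crops at test time is another option, as in the preprocessing quoted in the question.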

Overfitting problem with ZF-net
