Overfitting problem with ZF-net

Time: 2018-06-05 01:38:21

Tags: python tensorflow machine-learning deep-learning

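I am training a CNN on the ImageNet-2012 dataset, but the model keeps overfitting (evaluation error: top-1 49%, top-5 25%; training error: top-1 25%, top-5 8%; trained on a GTX 1080 Ti for 600k training steps, about 5 days). The architecture is based on ZF-net, with batch normalization added: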

import tensorflow as tf  # TensorFlow 1.x graph-mode API

x_input = feed_key['input']
bn_training = tf.placeholder(dtype=tf.bool, shape=(), name='bn_training')
with tf.name_scope('ZF_conv1'):
    w_conv1 = tf.get_variable(name='conv1_kernel', shape=[7, 7, 3, 96], dtype=tf.float32)
    h_conv1 = tf.nn.conv2d(x_input, w_conv1, strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_bn1'):
    bn1 = tf.layers.batch_normalization(h_conv1, training=bn_training)
with tf.name_scope('ZF_relu1'):
    h_active1 = tf.nn.relu(bn1)

with tf.name_scope('ZF_pool1'):
    h_pool1 = tf.nn.max_pool(h_active1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

with tf.name_scope('ZF_conv2'):
    w_conv2 = tf.get_variable(name='conv2_kernel', shape=[5, 5, 96, 256], dtype=tf.float32)
    h_conv2 = tf.nn.conv2d(h_pool1, w_conv2, strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_bn2'):
    bn2 = tf.layers.batch_normalization(h_conv2, training=bn_training)
with tf.name_scope('ZF_relu2'):
    h_active2 = tf.nn.relu(bn2)

with tf.name_scope('ZF_pool2'):
    h_pool2 = tf.nn.max_pool(h_active2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

with tf.name_scope('ZF_conv3'):
    w_conv3 = tf.get_variable(name='conv3_kernel', shape=[3, 3, 256, 384], dtype=tf.float32)
    h_conv3 = tf.nn.conv2d(h_pool2, w_conv3, strides=[1, 1, 1, 1], padding='SAME')
with tf.name_scope('ZF_bn3'):
    bn3 = tf.layers.batch_normalization(h_conv3, training=bn_training)
with tf.name_scope('ZF_relu3'):
    h_active3 = tf.nn.relu(bn3)

with tf.name_scope('ZF_conv4'):
    w_conv4 = tf.get_variable(name='conv4_kernel', shape=[3, 3, 384, 384], dtype=tf.float32)
    h_conv4 = tf.nn.conv2d(h_active3, w_conv4, strides=[1, 1, 1, 1], padding='SAME')
with tf.name_scope('ZF_bn4'):
    bn4 = tf.layers.batch_normalization(h_conv4, training=bn_training)
with tf.name_scope('ZF_relu4'):
    h_active4 = tf.nn.relu(bn4)

with tf.name_scope('ZF_conv5'):
    w_conv5 = tf.get_variable(name='conv5_kernel', shape=[3, 3, 384, 256], dtype=tf.float32)
    h_conv5 = tf.nn.conv2d(h_active4, w_conv5, strides=[1, 1, 1, 1], padding='SAME')

with tf.name_scope('ZF_bn5'):
    bn5 = tf.layers.batch_normalization(h_conv5, training=bn_training)
with tf.name_scope('ZF_relu5'):
    h_active5 = tf.nn.relu(bn5)

feed_key['bn_training'] = bn_training
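
As a rough sanity check on the graph above, the spatial size after each layer can be traced by hand. This is only a sketch: it assumes a 224x224x3 input and relies on TensorFlow's SAME padding producing ceil(size / stride) outputs. The helper names `same_out` and `trace_shapes` are hypothetical and not part of the original code.

```python
import math


def same_out(size, stride):
    """Output length of a SAME-padded conv/pool: ceil(size / stride)."""
    return math.ceil(size / stride)


# (layer name, stride, output channels or None if the layer keeps them),
# copied from the strides and kernel shapes in the graph above.
LAYERS = [
    ("conv1", 2, 96),
    ("pool1", 2, None),
    ("conv2", 2, 256),
    ("pool2", 2, None),
    ("conv3", 1, 384),
    ("conv4", 1, 384),
    ("conv5", 1, 256),
]


def trace_shapes(size=224, channels=3):
    """Return (name, height, width, channels) after every layer."""
    shapes = []
    for name, stride, out_channels in LAYERS:
        size = same_out(size, stride)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


shapes = trace_shapes()
for name, h, w, c in shapes:
    print(f"{name}: {h}x{w}x{c}")

_, h, w, c = shapes[-1]
flat_units = h * w * c  # size if the conv5 output were flattened directly
print("flattened units:", flat_units)
```

Under these assumptions the conv5 output is 14x14x256, that is 50176 values per image if it were flattened directly. How `low_out_flat` is actually built is not shown in the question, so the real input to the first FC layer may differ.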

Next come two FC layers:

fc1 = tf.layers.dense(low_out_flat,
                      units=4096,
                      activation=tf.nn.relu,
                      name='zffc1')

keep_prob1 = tf.placeholder(tf.float32, name='keep_prob1')
dropout1 = tf.nn.dropout(fc1, keep_prob1, name='zfdrop1')

fc2 = tf.layers.dense(dropout1,
                      units=4096,
                      activation=tf.nn.relu,
                      name='zffc2')

keep_prob2 = tf.placeholder(tf.float32, name='keep_prob2')
dropout2 = tf.nn.dropout(fc2, keep_prob2, name='zfdrop2')

feed_key['keep_prob1'] = keep_prob1
feed_key['keep_prob2'] = keep_prob2

Finally, the cross-entropy is computed:

gt_labels = tf.placeholder(dtype=tf.int64, shape=[None])
logits = tf.layers.dense(model.last_layer, units=1000, name='imagenet_logits')

with tf.name_scope('imagenet_cross_entropy'):
    entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=gt_labels, logits=logits)
with tf.name_scope('imagenet_loss'):
    loss = tf.reduce_mean(entropy)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = training_methods.optimizer.minimize(loss=loss, global_step=training_methods.global_step)

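For data preprocessing at training time, I followed the paper Visualizing and Understanding Convolutional Networks:

"Each RGB image was preprocessed by resizing the smallest dimension to 256, cropping the center 256x256 region, subtracting the per-pixel mean (across all images) and then using 10 different sub-crops of size 224x224 (corners + center with(out) horizontal flips)."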

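(Remark: I computed the image mean over the whole training set after first resizing the images to 224 * 224 rather than 256 * 256, so I subtract the image mean after cropping the sub-images to 224 * 224. I don't think this is a problem.)

At test time, I simply resize the image to 224 * 224 (is this a problem?).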
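
The ten sub-crops described in the quoted passage can be enumerated as crop origins plus a flip flag. The following is a minimal sketch, assuming a 256x256 source image and 224x224 crops as in the paper's description; `ten_crop_boxes` is a hypothetical helper, not code from the question.

```python
def ten_crop_boxes(image_size=256, crop_size=224):
    """Return (top, left, flip) for the 10 sub-crops.

    Four corners plus the center, each taken once as-is and once with a
    horizontal flip.
    """
    far = image_size - crop_size   # origin of crops touching the far edge
    mid = far // 2                 # origin of the centered crop
    origins = [(0, 0), (0, far), (far, 0), (far, far), (mid, mid)]
    return [(top, left, flip)
            for flip in (False, True)
            for top, left in origins]


boxes = ten_crop_boxes()
for top, left, flip in boxes:
    print(f"rows {top}:{top + 224}, cols {left}:{left + 224}, flip={flip}")
```

Each tuple selects `image[top:top + 224, left:left + 224]`, optionally mirrored along the width axis. Note that every crop keeps the original aspect ratio of the image content.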

Optimizer: Adam with an initial learning rate of 0.001 and epsilon 0.1. The dropout rate is set to 0.5. Finally, I use tf.variance_scaling_initializer() to initialize all the weights.

The ZF-net paper reports test error rates of top-1: 36.7% and top-5: 15.3%. My results are listed above, but I cannot find where things go wrong.

1 Answer:

Answer 0: (score: 0)

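The problem was that when evaluating the model I simply resized the images to 224 * 224, which differs from the training-time processing (cropping the images to 224 * 224). As a result, the distribution of the validation set differs somewhat from that of the training set (a different manifold). At first I thought this was no big deal and that the shift in distribution would not be dramatic. After I fixed it, the validation error dropped significantly.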
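
The answer does not show the corrected evaluation code. One plausible way to make evaluation match training is to reuse the same geometry: resize the shorter side to 256, take the center 256x256 region, then the center 224x224 crop. The sketch below only computes that geometry; the function names are hypothetical, and using a single center crop at evaluation time is an assumption.

```python
def resize_shorter_side(height, width, target=256):
    """Size after scaling so the shorter side equals target (aspect kept)."""
    scale = target / min(height, width)
    return round(height * scale), round(width * scale)


def center_box(height, width, crop_h, crop_w):
    """(top, left, bottom, right) of a centered crop."""
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, top + crop_h, left + crop_w


def eval_plan(height, width, resize_to=256, crop=224):
    """Return the resized size and the final crop box in resized coordinates."""
    h, w = resize_shorter_side(height, width, resize_to)
    # Center 256x256 region of the resized image.
    top256, left256, _, _ = center_box(h, w, resize_to, resize_to)
    # Center 224x224 crop inside that region.
    top, left, bottom, right = center_box(resize_to, resize_to, crop, crop)
    box = (top256 + top, left256 + left, top256 + bottom, left256 + right)
    return (h, w), box


size, box = eval_plan(375, 500)
print("resize to:", size)   # aspect ratio is preserved
print("crop box:", box)     # (top, left, bottom, right)
```

For a 375x500 image, a direct resize to 224x224 squeezes the 4:3 content into a square, whereas this plan keeps the aspect ratio and crops, which is what the training crops see.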