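I'm training a CNN on the ImageNet-2012 dataset, but the model keeps overfitting. Evaluation error is 49% top-1 and 25% top-5, while training error is 25% top-1 and 8% top-5. It was trained on a GTX 1080 Ti for 600k training steps (about 5 days). The architecture is based on ZF-net, with batch normalization added: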
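import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers)

x_input = feed_key['input']
# Fed True while training and False while evaluating.
bn_training = tf.placeholder(dtype=tf.bool, shape=(), name='bn_training')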
with tf.name_scope('ZF_conv1'):
    w_conv1 = tf.get_variable(name='conv1_kernel', shape=[7, 7, 3, 96], dtype=tf.float32)
    h_conv1 = tf.nn.conv2d(x_input, w_conv1, strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_bn1'):
    bn1 = tf.layers.batch_normalization(h_conv1, training=bn_training)
with tf.name_scope('ZF_relu1'):
    h_active1 = tf.nn.relu(bn1)
with tf.name_scope('ZF_pool1'):
    h_pool1 = tf.nn.max_pool(h_active1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_conv2'):
    w_conv2 = tf.get_variable(name='conv2_kernel', shape=[5, 5, 96, 256], dtype=tf.float32)
    h_conv2 = tf.nn.conv2d(h_pool1, w_conv2, strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_bn2'):
    bn2 = tf.layers.batch_normalization(h_conv2, training=bn_training)
with tf.name_scope('ZF_relu2'):
    h_active2 = tf.nn.relu(bn2)
with tf.name_scope('ZF_pool2'):
    h_pool2 = tf.nn.max_pool(h_active2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
with tf.name_scope('ZF_conv3'):
    w_conv3 = tf.get_variable(name='conv3_kernel', shape=[3, 3, 256, 384], dtype=tf.float32)
    h_conv3 = tf.nn.conv2d(h_pool2, w_conv3, strides=[1, 1, 1, 1], padding='SAME')
with tf.name_scope('ZF_bn3'):
    bn3 = tf.layers.batch_normalization(h_conv3, training=bn_training)
with tf.name_scope('ZF_relu3'):
    h_active3 = tf.nn.relu(bn3)
with tf.name_scope('ZF_conv4'):
    w_conv4 = tf.get_variable(name='conv4_kernel', shape=[3, 3, 384, 384], dtype=tf.float32)
    h_conv4 = tf.nn.conv2d(h_active3, w_conv4, strides=[1, 1, 1, 1], padding='SAME')
with tf.name_scope('ZF_bn4'):
    bn4 = tf.layers.batch_normalization(h_conv4, training=bn_training)
with tf.name_scope('ZF_relu4'):
    h_active4 = tf.nn.relu(bn4)
with tf.name_scope('ZF_conv5'):
    w_conv5 = tf.get_variable(name='conv5_kernel', shape=[3, 3, 384, 256], dtype=tf.float32)
    h_conv5 = tf.nn.conv2d(h_active4, w_conv5, strides=[1, 1, 1, 1], padding='SAME')
with tf.name_scope('ZF_bn5'):
    bn5 = tf.layers.batch_normalization(h_conv5, training=bn_training)
with tf.name_scope('ZF_relu5'):
    h_active5 = tf.nn.relu(bn5)
feed_key['bn_training'] = bn_training
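The snippet does not show how `h_active5` becomes `low_out_flat`. As a quick size check, here is a small sketch (assuming a 224×224 input, as in the question) that traces the spatial size through the layers above, using the fact that TensorFlow's `padding='SAME'` gives an output size of ceil(input / stride). The helper names are illustrative.

```python
import math

def same_out(size, stride):
    # With padding='SAME', TensorFlow's output size is ceil(input / stride).
    return math.ceil(size / stride)

# (layer name, stride) along the spatial path of the conv stack above.
LAYERS = [
    ("conv1", 2), ("pool1", 2),
    ("conv2", 2), ("pool2", 2),
    ("conv3", 1), ("conv4", 1), ("conv5", 1),
]

def trace_shapes(input_size=224):
    """Return (layer name, spatial size) after each layer for a square input."""
    sizes = []
    size = input_size
    for name, stride in LAYERS:
        size = same_out(size, stride)
        sizes.append((name, size))
    return sizes

for name, size in trace_shapes(224):
    print(f"{name}: {size}x{size}")
# conv5 has 256 output channels, so h_active5 holds 14 * 14 * 256 = 50176 values per image.
```

In the ZF-net paper, conv5 is followed by a third 3×3 max pool with stride 2 before the FC layers; whether `low_out_flat` includes that step cannot be told from the snippet.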
This is followed by two FC layers:
# low_out_flat is defined elsewhere and not shown in the question.
fc1 = tf.layers.dense(low_out_flat,
                      units=4096,
                      activation=tf.nn.relu,
                      name='zffc1')
# Keep probabilities: should be fed 0.5 while training (dropout rate 0.5) and 1.0 while evaluating.
keep_prob1 = tf.placeholder(tf.float32, name='keep_prob1')
dropout1 = tf.nn.dropout(fc1, keep_prob1, name='zfdrop1')
fc2 = tf.layers.dense(dropout1,
                      units=4096,
                      activation=tf.nn.relu,
                      name='zffc2')
keep_prob2 = tf.placeholder(tf.float32, name='keep_prob2')
dropout2 = tf.nn.dropout(fc2, keep_prob2, name='zfdrop2')
feed_key['keep_prob1'] = keep_prob1
feed_key['keep_prob2'] = keep_prob2
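`tf.nn.dropout` in TF 1.x keeps each element with probability `keep_prob` and scales the kept elements by `1 / keep_prob` ("inverted dropout"). That is why `keep_prob1` and `keep_prob2` are fed 0.5 while training and 1.0 while evaluating; feeding 0.5 at evaluation time would add random noise to the validation outputs. A pure-Python sketch of the same scheme (the function name is illustrative):

```python
import random

def inverted_dropout(values, keep_prob, rng=None):
    """Keep each value with probability keep_prob and scale survivors by 1 / keep_prob.

    Scaling keeps the expected value of every output equal to its input,
    so no rescaling is needed at evaluation time.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if keep_prob == 1.0:
        return list(values)  # evaluation mode: identity
    if rng is None:
        rng = random.Random()
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

# Training (dropout rate 0.5): each survivor is doubled, the rest become 0.
train_out = inverted_dropout([1.0] * 10, keep_prob=0.5, rng=random.Random(0))
# Evaluation: keep_prob = 1.0 passes activations through unchanged.
eval_out = inverted_dropout([1.0] * 10, keep_prob=1.0)
```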
Finally, the cross-entropy loss is computed:
gt_labels = tf.placeholder(dtype=tf.int64, shape=[None])
# model.last_layer is the model's final hidden layer (defined elsewhere).
logits = tf.layers.dense(model.last_layer, units=1000, name='imagenet_logits')
with tf.name_scope('imagenet_cross_entropy'):
    entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=gt_labels, logits=logits)
with tf.name_scope('imagenet_loss'):
    loss = tf.reduce_mean(entropy)
# Batch-norm moving mean/variance updates are in UPDATE_OPS; run them with every train step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = training_methods.optimizer.minimize(loss=loss, global_step=training_methods.global_step)
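For a single example, `tf.nn.sparse_softmax_cross_entropy_with_logits` computes -log softmax(logits)[label], and `tf.reduce_mean` then averages over the batch. A pure-Python sketch of the same quantity, using the log-sum-exp trick for numerical stability (the names are illustrative):

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def mean_loss(batch_logits, labels):
    # Same reduction as tf.reduce_mean over the per-example entropies.
    losses = [sparse_softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)
```

With 1000 classes and uniform logits the loss is ln(1000) ≈ 6.91, a handy sanity check for the loss at the very start of training.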
For training-time data preprocessing, I follow the paper Visualizing and Understanding Convolutional Networks:
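Each RGB image was preprocessed by resizing the smallest dimension to 256, cropping the center 256x256 region, subtracting the per-pixel mean (across all images) and then using 10 different sub-crops of size 224x224 (corners + center with(out) horizontal flips).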
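The ten sub-crops in the quoted recipe are the four corners and the center of the 256×256 region, each with and without a horizontal flip. A minimal sketch, with illustrative names, that enumerates them as `(top, left, flipped)` offsets; the actual pixel cropping and flipping would be done by whatever image library the pipeline uses:

```python
def ten_crop_boxes(size=256, crop=224):
    """(top, left, flipped) for 4 corner crops + 1 center crop, each with and without flip."""
    far = size - crop   # offset of a crop touching the bottom or right edge
    mid = far // 2      # offset of the centered crop
    positions = [(0, 0), (0, far), (far, 0), (far, far), (mid, mid)]
    return [(top, left, flip) for flip in (False, True) for top, left in positions]

boxes = ten_crop_boxes()
# For 256 -> 224 the corner offsets are 0 and 32, and the center offset is 16.
```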
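(Note: I computed the mean image over the whole training set after first resizing each image to 224×224 rather than 256×256, so I subtract the mean image after cropping the 224×224 sub-image. I don't think this is the problem.) At test time, I just resize the image to 224×224. Is this a problem?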
Optimizer: Adam with an initial learning rate of 0.001 and epsilon of 0.1.
The dropout rate is set to 0.5.
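To see what epsilon = 0.1 does, here is a single Adam step in the textbook form (Algorithm 1 of Kingma and Ba). This is a sketch, not TensorFlow's code: the `tf.train.AdamOptimizer` documentation notes that it uses the formulation just before Section 2.1 of the paper, so its epsilon is the paper's "epsilon hat" and the exact numbers differ. In both forms a large epsilon puts a floor under the denominator and shrinks the steps for parameters whose gradients are small.

```python
import math

def adam_step(theta, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=0.1):
    """One Adam update (Kingma & Ba, Algorithm 1). t is the 1-based step count.

    Returns the updated (theta, m, v).
    """
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# First step from theta = 0 with gradient 1.0: tiny epsilon vs. epsilon = 0.1.
small, _, _ = adam_step(0.0, 1.0, 0.0, 0.0, t=1, eps=1e-8)
large, _, _ = adam_step(0.0, 1.0, 0.0, 0.0, t=1, eps=0.1)
```

With gradient 1.0 the first step is about 0.001 with a tiny epsilon and 0.001 / 1.1 ≈ 0.00091 with epsilon = 0.1.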
Finally, I use tf.variance_scaling_initializer() to initialize all weights.
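For reference, `tf.variance_scaling_initializer()` in TensorFlow 1.x defaults to `scale=1.0` and `mode='fan_in'`, so its target standard deviation is sqrt(1 / fan_in), where fan_in = kernel height × width × input channels; He et al. recommend a scale of 2.0 for ReLU networks. The sketch below computes only these target values for the kernels in the question (names are illustrative). The default distribution (normal vs. truncated normal, with its correction factor) has changed between TF versions, so it is not modeled. Note also that the `tf.get_variable` calls shown pass no `initializer=`; unless one is set on an enclosing `tf.variable_scope`, TF 1.x falls back to `glorot_uniform_initializer`.

```python
import math

def fan_in(kernel_shape):
    """Fan-in of a conv kernel shaped [height, width, in_channels, out_channels]."""
    kh, kw, cin, _ = kernel_shape
    return kh * kw * cin

def variance_scaling_std(kernel_shape, scale=1.0):
    # mode='fan_in': target variance = scale / fan_in (no truncation correction applied).
    return math.sqrt(scale / fan_in(kernel_shape))

CONV_KERNELS = {
    "conv1": [7, 7, 3, 96],
    "conv2": [5, 5, 96, 256],
    "conv3": [3, 3, 256, 384],
    "conv4": [3, 3, 384, 384],
    "conv5": [3, 3, 384, 256],
}

for name, shape in CONV_KERNELS.items():
    print(name, fan_in(shape),
          round(variance_scaling_std(shape), 4),        # default scale=1.0
          round(variance_scaling_std(shape, 2.0), 4))   # He et al. scale=2.0
```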
The ZF-net paper reports test error rates of 36.7% top-1 and 15.3% top-5. My results are much worse than that, but I can't find what is wrong.
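Since the comparison rests on top-1 and top-5 error, here is a minimal sketch of how these are usually computed: an example counts as an error if its true label is not among the k largest logits. The function name is illustrative; inside a TF 1.x graph, `tf.nn.in_top_k(predictions, targets, k)` gives the per-example check.

```python
def top_k_error(batch_logits, labels, k):
    """Fraction of examples whose true label is not among the k highest logits."""
    misses = 0
    for logits, label in zip(batch_logits, labels):
        top_k = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)
```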
Answer 0 (score: 0)
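The issue was that when evaluating the model I simply resized the images to 224×224, which differs from the training-time processing (cropping the images to 224×224). As a result, the probability distribution of the validation set was somewhat different from that of the training set (a different manifold). At first I didn't think this was a big deal and assumed the shift in distribution wouldn't be dramatic, but after I fixed it, the validation error dropped significantly.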
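The answer does not spell out the exact evaluation pipeline it switched to. One way to make evaluation match training, assuming a single center crop, is to resize the shorter side to 256 and take the central 224×224 region (the same window as the center training crop), then subtract the same mean image used in training. The sketch computes only the crop geometry; the actual resizing would be done with an image library, and the function names are illustrative.

```python
def resize_shorter_side(width, height, target=256):
    """New (width, height) after scaling so that the shorter side equals target."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)

def center_crop_box(width, height, crop):
    """(left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop

def eval_crop_box(width, height, short_side=256, crop=224):
    # Mirror training: resize the shorter side to 256, then take the central 224x224,
    # instead of squashing the whole image to 224x224 (which changes the aspect ratio).
    w, h = resize_shorter_side(width, height, short_side)
    return (w, h), center_crop_box(w, h, crop)

# A 500x375 image is resized to 341x256 and cropped at (58, 16, 282, 240).
size, box = eval_crop_box(500, 375)
```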