TensorFlow: batch normalization gives poor (erratic) validation loss and accuracy

Time: 2017-12-30 06:36:57

Tags: tensorflow machine-learning deep-learning conv-neural-network batch-normalization

I'm trying to use batch normalization via tf.layers.batch_normalization(), and my code looks like this:

import tensorflow as tf


def create_conv_exp_model(fingerprint_input, model_settings, is_training):

  # Dropout placeholder
  if is_training:
    dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')

  # Mode placeholder
  mode_placeholder = tf.placeholder(tf.bool, name="mode_placeholder")

  he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")

  # Input Layer
  input_frequency_size = model_settings['bins']
  input_time_size = model_settings['spectrogram_length']
  net = tf.reshape(fingerprint_input,
                   [-1, input_time_size, input_frequency_size, 1],
                   name="reshape")
  net = tf.layers.batch_normalization(net, 
                                      training=mode_placeholder,
                                      name='bn_0')

  for i in range(1, 6):
    net = tf.layers.conv2d(inputs=net,
                           filters=8*(2**i),
                           kernel_size=[5, 5],
                           padding='same',
                           kernel_initializer=he_init,
                           name="conv_%d"%i)
    net = tf.layers.batch_normalization(net,
                                        training=mode_placeholder,
                                        name='bn_%d'%i)
    with tf.name_scope("relu_%d"%i):
      net = tf.nn.relu(net)
    net = tf.layers.max_pooling2d(net, [2, 2], [2, 2], 'SAME', 
                                  name="maxpool_%d"%i)

  net_shape = net.get_shape().as_list()
  net_height = net_shape[1]
  net_width = net_shape[2]
  net = tf.layers.conv2d( inputs=net,
                          filters=1024,
                          kernel_size=[net_height, net_width],
                          strides=(net_height, net_width),
                          padding='same',
                          kernel_initializer=he_init,
                          name="conv_f")
  net = tf.layers.batch_normalization( net, 
                                        training=mode_placeholder,
                                        name='bn_f')
  with tf.name_scope("relu_f"):
    net = tf.nn.relu(net)

  net = tf.layers.conv2d( inputs=net,
                          filters=model_settings['label_count'],
                          kernel_size=[1, 1],
                          padding='same',
                          kernel_initializer=he_init,
                          name="conv_l")

  ### Squeeze
  squeezed = tf.squeeze(net, axis=[1, 2], name="squeezed")

  if is_training:
    return squeezed, dropout_prob, mode_placeholder
  else:
    return squeezed, mode_placeholder

My train step looks as follows:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_input)
  gvs = optimizer.compute_gradients(cross_entropy_mean)
  # Clip gradients to [-2, 2] and apply the clipped gradients
  capped_gvs = [(tf.clip_by_value(grad, -2., 2.), var) for grad, var in gvs]
  train_step = optimizer.apply_gradients(capped_gvs)

During training, I'm feeding the graph like this:

train_summary, train_accuracy, cross_entropy_value, _, _ = sess.run(
    [
        merged_summaries, evaluation_step, cross_entropy_mean, train_step,
        increment_global_step
    ],
    feed_dict={
        fingerprint_input: train_fingerprints,
        ground_truth_input: train_ground_truth,
        learning_rate_input: learning_rate_value,
        dropout_prob: 0.5,
        mode_placeholder: True
    })

During validation:

validation_summary, validation_accuracy, conf_matrix = sess.run(
                [merged_summaries, evaluation_step, confusion_matrix],
                feed_dict={
                    fingerprint_input: validation_fingerprints,
                    ground_truth_input: validation_ground_truth,
                    dropout_prob: 1.0,
                    mode_placeholder: False
                })

My loss and accuracy curves (orange is training, blue is validation):

[Plot of loss vs. number of iterations] [Plot of accuracy vs. number of iterations]

The validation loss (and accuracy) seems very erratic. Is my implementation of batch normalization wrong? Or is this normal with batch normalization and I should just wait for more iterations?

3 Answers:

Answer 0 (score: 1)

You need to pass is_training to tf.layers.batch_normalization(..., training=is_training), or else it will try to normalize the inference minibatches using the minibatch statistics instead of the training statistics, which is wrong.
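For illustration, here is a minimal sketch of that pattern with the TF 1.x layers API; the tensor shape and the is_training placeholder name are made up for the example:

import tensorflow as tf

# Hypothetical activation tensor and a boolean placeholder that switches
# BN between train-time and inference-time behavior.
x = tf.placeholder(tf.float32, [None, 32, 32, 8])
is_training = tf.placeholder(tf.bool, name='is_training')

# training=True  -> normalize with the current minibatch statistics
# training=False -> normalize with the accumulated moving averages
y = tf.layers.batch_normalization(x, training=is_training)

# At run time, feed True for train steps and False for validation:
#   sess.run(train_step, feed_dict={..., is_training: True})
#   sess.run(metrics,    feed_dict={..., is_training: False})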

Answer 1 (score: 0)

There are mainly two things to check.

1. Are you sure that you are using batch normalization (BN) correctly in the train op?

If you read the layer documentation:

Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. Also, be sure to add any batch_normalization ops before getting the update_ops collection. Otherwise, update_ops will be empty, and training/inference will not work properly.

For example:

x_norm = tf.layers.batch_normalization(x, training=training)

# ...
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = optimizer.minimize(loss)

2. Otherwise, try lowering the momentum in the BN.

Indeed, during training BN uses two moving averages of the mean and variance that are supposed to approximate the population statistics. The mean and variance are initialized to 0 and 1 respectively, and then, step by step, they are multiplied by the momentum value (default is 0.99) and the new value * 0.01 is added. At inference (test) time, the normalization uses these statistics. For this reason, it takes a little while for them to arrive at the "real" mean and variance of the data.
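As a toy illustration of that update rule (a plain-Python sketch, not TensorFlow's actual internals), here is how the moving mean approaches a hypothetical true batch mean of 5.0 under two momentum values; the lower momentum gets there much faster:

# Hypothetical numbers, for illustration only.
true_batch_mean = 5.0
for momentum in (0.99, 0.9):
  moving_mean = 0.0  # BN initializes the moving mean to 0
  for step in range(100):
    # moving_mean <- momentum * moving_mean + (1 - momentum) * batch_mean
    moving_mean = momentum * moving_mean + (1 - momentum) * true_batch_mean
  print(momentum, round(moving_mean, 3))
# Prints roughly 3.17 for momentum=0.99 and 5.0 for momentum=0.9:
# after 100 steps the high-momentum average is still far from the truth.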

Sources:

https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization

https://github.com/keras-team/keras/issues/7265

https://github.com/keras-team/keras/issues/3366

The original BN paper can be found here:

https://arxiv.org/abs/1502.03167

Answer 2 (score: 0)

I also observed oscillations in the validation loss when adding batch norm before the ReLU. We found that moving the batch norm after the ReLU resolved the issue.
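Applied to the conv blocks from the question, a minimal sketch of that reordering (TF 1.x API; is_training plays the role of the question's mode_placeholder) might look like:

import tensorflow as tf

def conv_block(net, filters, is_training, i):
  """One conv block with batch norm moved after the ReLU."""
  net = tf.layers.conv2d(inputs=net,
                         filters=filters,
                         kernel_size=[5, 5],
                         padding='same',
                         name="conv_%d" % i)
  net = tf.nn.relu(net)                     # activation first...
  net = tf.layers.batch_normalization(net,  # ...then batch norm
                                      training=is_training,
                                      name="bn_%d" % i)
  return net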