I am trying to use tf.layers.batch_normalization() for batch normalization. My code looks like this:
def create_conv_exp_model(fingerprint_input, model_settings, is_training):
  # Dropout placeholder
  if is_training:
    dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')

  # Mode placeholder
  mode_placeholder = tf.placeholder(tf.bool, name="mode_placeholder")

  he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")

  # Input Layer
  input_frequency_size = model_settings['bins']
  input_time_size = model_settings['spectrogram_length']
  net = tf.reshape(fingerprint_input,
                   [-1, input_time_size, input_frequency_size, 1],
                   name="reshape")
  net = tf.layers.batch_normalization(net,
                                      training=mode_placeholder,
                                      name='bn_0')

  for i in range(1, 6):
    net = tf.layers.conv2d(inputs=net,
                           filters=8 * (2 ** i),
                           kernel_size=[5, 5],
                           padding='same',
                           kernel_initializer=he_init,
                           name="conv_%d" % i)
    net = tf.layers.batch_normalization(net,
                                        training=mode_placeholder,
                                        name='bn_%d' % i)
    with tf.name_scope("relu_%d" % i):
      net = tf.nn.relu(net)
    net = tf.layers.max_pooling2d(net, [2, 2], [2, 2], 'SAME',
                                  name="maxpool_%d" % i)

  net_shape = net.get_shape().as_list()
  net_height = net_shape[1]
  net_width = net_shape[2]
  net = tf.layers.conv2d(inputs=net,
                         filters=1024,
                         kernel_size=[net_height, net_width],
                         strides=(net_height, net_width),
                         padding='same',
                         kernel_initializer=he_init,
                         name="conv_f")
  net = tf.layers.batch_normalization(net,
                                      training=mode_placeholder,
                                      name='bn_f')
  with tf.name_scope("relu_f"):
    net = tf.nn.relu(net)

  net = tf.layers.conv2d(inputs=net,
                         filters=model_settings['label_count'],
                         kernel_size=[1, 1],
                         padding='same',
                         kernel_initializer=he_init,
                         name="conv_l")

  ### Squeeze
  squeezed = tf.squeeze(net, axis=[1, 2], name="squeezed")

  if is_training:
    return squeezed, dropout_prob, mode_placeholder
  else:
    return squeezed, mode_placeholder
My train step looks like this:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_input)
  gvs = optimizer.compute_gradients(cross_entropy_mean)
  capped_gvs = [(tf.clip_by_value(grad, -2., 2.), var) for grad, var in gvs]
  # Apply the clipped gradients
  train_step = optimizer.apply_gradients(capped_gvs)
During training, I feed the graph like this:
train_summary, train_accuracy, cross_entropy_value, _, _ = sess.run(
    [
        merged_summaries, evaluation_step, cross_entropy_mean, train_step,
        increment_global_step
    ],
    feed_dict={
        fingerprint_input: train_fingerprints,
        ground_truth_input: train_ground_truth,
        learning_rate_input: learning_rate_value,
        dropout_prob: 0.5,
        mode_placeholder: True
    })
During validation,
validation_summary, validation_accuracy, conf_matrix = sess.run(
    [merged_summaries, evaluation_step, confusion_matrix],
    feed_dict={
        fingerprint_input: validation_fingerprints,
        ground_truth_input: validation_ground_truth,
        dropout_prob: 1.0,
        mode_placeholder: False
    })
My loss and accuracy curves (orange is training, blue is validation):
[Plot of loss vs. number of iterations]
[Plot of accuracy vs. number of iterations]
The validation loss (and accuracy) seems very erratic. Is my implementation of batch normalization wrong? Or is this normal with batch normalization and I should just wait for more iterations?
Answer 0 (score: 1)
You need to pass is_training to tf.layers.batch_normalization(..., training=is_training); otherwise it tries to normalize the inference mini-batches using the mini-batch statistics instead of the training statistics, which is wrong.
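As a minimal sketch of what that looks like in isolation (the names x and is_training_ph are placeholders invented for this example, not taken from the code above):

import tensorflow as tf  # TF 1.x graph-mode API

x = tf.placeholder(tf.float32, [None, 64], name="x")
is_training_ph = tf.placeholder(tf.bool, name="is_training")

# Batch statistics are used when is_training_ph is fed as True;
# the learned moving averages are used when it is fed as False.
x_norm = tf.layers.batch_normalization(x, training=is_training_ph, name="bn")

# Training step:  sess.run(train_op, feed_dict={x: batch, is_training_ph: True})
# Inference step: sess.run(x_norm, feed_dict={x: batch, is_training_ph: False})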
Answer 1 (score: 0)
There are two main things to check.
1. Are you sure you are using batch normalization (BN) correctly in the train op?
If you read the layer documentation:
Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. Also, be sure to add any batch_normalization ops before getting the update_ops collection. Otherwise, update_ops will be empty, and training/inference will not work properly.
For example:
x_norm = tf.layers.batch_normalization(x, training=training)

# ...

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = optimizer.minimize(loss)
2. Otherwise, try lowering the "momentum" in BN.
In fact, during training BN uses two moving averages of the mean and variance, which are meant to approximate the population statistics. The mean and variance are initialized to 0 and 1 respectively, and then, step by step, each is multiplied by the momentum value (default 0.99) and the new batch value multiplied by 0.01 is added. At inference (test) time, the normalization uses these statistics. Because of this, it takes a while for these moving averages to reach the "true" mean and variance of the data.
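A hedged sketch of lowering the momentum, applied to one of the question's layers (the value 0.9 is only an illustrative choice, not a recommendation from this answer):

net = tf.layers.batch_normalization(net,
                                    training=mode_placeholder,
                                    momentum=0.9,  # default is 0.99; roughly moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
                                    name='bn_0')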
Sources:
https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
https://github.com/keras-team/keras/issues/7265
https://github.com/keras-team/keras/issues/3366
The original BN paper can be found here:
Answer 2 (score: 0)
I also observed oscillations in the validation loss when adding batch norm before the ReLU. We found that moving the batch norm to after the ReLU resolved the issue.
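For illustration, a hedged sketch of that reordering applied to one block of the question's convolution loop (conv → ReLU → batch norm instead of conv → batch norm → ReLU); whether it helps is the answerer's observation, not a guarantee:

net = tf.layers.conv2d(inputs=net,
                       filters=8 * (2 ** i),
                       kernel_size=[5, 5],
                       padding='same',
                       kernel_initializer=he_init,
                       name="conv_%d" % i)
with tf.name_scope("relu_%d" % i):
  net = tf.nn.relu(net)
# Batch norm moved to after the ReLU activation.
net = tf.layers.batch_normalization(net,
                                    training=mode_placeholder,
                                    name='bn_%d' % i)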