Why doesn't batch_normalization in TensorFlow give the expected results?

Time: 2017-12-22 08:29:44

Tags: tensorflow neural-network conv-neural-network batch-normalization

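I want to see the output of a batch_normalization layer in a small example, but I am apparently doing something wrong, because the output I get is the same as the input.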

import numpy as np
import tensorflow as tf
import keras.backend as K
K.set_image_data_format('channels_last')

X = tf.placeholder(tf.float32,  shape=(None, 2, 2, 3))  #  samples are 2X2 images with 3 channels
outp =  tf.layers.batch_normalization(inputs=X,  axis=3)

x = np.random.rand(4, 2, 2, 3)  # sample set: 4 images

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    K.set_session(sess)
    a = sess.run(outp, feed_dict={X:x, K.learning_phase(): 0})
    print(a-x) # print the difference between input and normalized output

With the code above, the input and output are almost identical. Can anyone point out the problem?

1 Answer:

Answer 0 (score: 2)

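Keep in mind that batch_normalization behaves differently at training time and at test time. Here, you have never "trained" the batch normalization layer, so its moving mean is still at its initial value of 0 and its moving variance at its initial value of 1, and the output is therefore almost the same as the input (the small remaining difference comes from the epsilon added to the variance). If you use K.learning_phase(): 1, you should already see some difference (because the layer will then normalize with the mean and standard deviation of the batch); if you first train on many examples and then test on some others, you will also see normalization taking place, because the learned mean and standard deviation will no longer be 0 and 1.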

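To get a better picture of what batch normalization does, I would also suggest multiplying your input by a large number (say 100), so that there is a clear difference between the unnormalized and the normalized vectors. This will help you test what is going on.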
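The two modes described above can be imitated with a rough NumPy sketch. This is not TensorFlow's actual implementation: it assumes the layer's default epsilon of 0.001, leaves out the learned scale and offset, and the helper name batch_norm is made up for this illustration.

```python
import numpy as np

EPSILON = 1e-3  # assumed: the default epsilon of tf.layers.batch_normalization


def batch_norm(x, mean, var, eps=EPSILON):
    """Hypothetical helper: normalize x per channel (last axis) with the given statistics."""
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.RandomState(0)
x = rng.rand(4, 2, 2, 3) * 100  # 4 images, 2x2 pixels, 3 channels, scaled by 100

# Inference mode before any training: moving mean is 0, moving variance is 1.
inference_out = batch_norm(x, mean=np.zeros(3), var=np.ones(3))

# Training mode: per-channel statistics are computed from the current batch.
batch_mean = x.mean(axis=(0, 1, 2))
batch_var = x.var(axis=(0, 1, 2))
training_out = batch_norm(x, batch_mean, batch_var)

print("max |inference_out - x|:", np.abs(inference_out - x).max())
print("max |training_out - x|: ", np.abs(training_out - x).max())
print("per-channel mean in training mode:", training_out.mean(axis=(0, 1, 2)))
```

In inference mode the output is just x divided by sqrt(1 + epsilon), which is why it looks like the input. In training mode each channel ends up with a mean of roughly 0, so the difference from the scaled input is obvious.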

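Edit: In your code, it seems that the moving mean and moving variance are never updated. You need to make sure the update ops are run, as described in batch_normalization's doc. The following lines should work: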

outp =  tf.layers.batch_normalization(inputs=X,  axis=3, training=is_training, center=False, scale=False)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)

Here is my complete working code (I got rid of Keras because I don't know it well, but you should be able to add it back).

import tensorflow as tf
import numpy as np

X = tf.placeholder(tf.float32,  shape=(None, 2, 2, 3))  #  samples are 2X2 images with 3 channels
is_training = tf.placeholder(tf.bool,  shape=())  #  True: normalize with batch statistics and update the moving averages; False: normalize with the moving averages
outp =  tf.layers.batch_normalization(inputs=X,  axis=3, training=is_training, center=False, scale=False)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)

x = np.random.rand(4, 2, 2, 3) * 100  # sample set: 4 images

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    initial = sess.run(outp, feed_dict={X:x, is_training: False})
    for i in range(10000):
        a = sess.run(outp, feed_dict={X:x, is_training: True})
        if (i % 1000 == 0):
            print("Step %i: " %i, a-x) # print the difference between input and normalized output

    final = sess.run(outp, feed_dict={X: x, is_training: False})
    print("initial: ", initial)
    print("final: ", final)
    assert not np.array_equal(initial, final)
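To see why the training loop changes the inference-mode output, here is a small NumPy simulation of the moving-average updates. It is only a sketch under stated assumptions: it uses the layer's default momentum of 0.99 and epsilon of 0.001, the plain population variance of the batch, and a made-up helper name normalize. The real update ops may differ in details.

```python
import numpy as np

MOMENTUM = 0.99  # assumed: the default momentum of tf.layers.batch_normalization
EPSILON = 1e-3   # assumed: the default epsilon


def normalize(x, mean, var):
    """Hypothetical helper: inference-mode normalization with the given moving statistics."""
    return (x - mean) / np.sqrt(var + EPSILON)


rng = np.random.RandomState(0)
x = rng.rand(4, 2, 2, 3) * 100  # same shape and scale as in the answer

# Per-channel statistics of the (single, repeated) training batch.
batch_mean = x.mean(axis=(0, 1, 2))
batch_var = x.var(axis=(0, 1, 2))

# Initial values of the moving statistics.
moving_mean = np.zeros(3)
moving_var = np.ones(3)

initial = normalize(x, moving_mean, moving_var)

# Each training step nudges the moving statistics towards the batch statistics.
for step in range(10000):
    moving_mean = MOMENTUM * moving_mean + (1 - MOMENTUM) * batch_mean
    moving_var = MOMENTUM * moving_var + (1 - MOMENTUM) * batch_var

final = normalize(x, moving_mean, moving_var)

print("moving mean:", moving_mean, "batch mean:", batch_mean)
print("changed by training:", not np.array_equal(initial, final))
```

If the update ops are never run, the loop body is effectively skipped, the moving statistics stay at 0 and 1, and final equals initial. That is the situation in the question's code.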