Batch normalization question in TensorFlow

Time: 2018-11-09 22:26:03

Tags: tensorflow batch-normalization

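I am having trouble understanding how batch normalization is implemented in TensorFlow. To illustrate, I created a simple network with one input node, one hidden node, and one output node, and ran it for 1 batch with a batch size of 2. My input x is a scalar feature with 2 values (one per example in the batch), one set to 0 and the other set to 1.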

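I ran one epoch and wrote out the output of the hidden layer (before and after batch normalization), as well as the batch norm moving mean, moving variance, gamma, and beta.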

Here is my code:

import tensorflow as tf

import numpy as np

N_HIDDEN_1 = 1
N_INPUT = 1
N_OUTPUT = 1

###########################################################

# DEFINE THE Network

# Define placeholders for data that will be fed in during execution
x = tf.placeholder(tf.float32, (None, N_INPUT))
y = tf.placeholder(tf.float32, (None, N_OUTPUT))
lx = tf.placeholder(tf.float32, [])
training = tf.placeholder_with_default(False, shape=(), name='training')

# Hidden layers with relu activation
with tf.variable_scope('hidden1'):
    hidden_1 = tf.layers.dense(x, N_HIDDEN_1, activation=None, use_bias=False)
    bn_1 = tf.layers.batch_normalization(hidden_1, training=training, momentum=0.5)
    bn_1x = tf.nn.relu(bn_1)

# Output layer
with tf.variable_scope('output'):
    predx = tf.layers.dense(bn_1x, N_OUTPUT, activation=None, use_bias=False)
    pred = tf.layers.batch_normalization(predx, training=training, momentum=0.5)

###########################################################

# Define the cost function that is optimized when
# training the network and the optimizer

cost = tf.reduce_mean(tf.square(pred-y))

optimizer = tf.train.AdamOptimizer(learning_rate=lx).minimize(cost)

bout1 = tf.global_variables('hidden1/batch_normalization/moving_mean:0')
bout2 = tf.global_variables('hidden1/batch_normalization/moving_variance:0')
bout3 = tf.global_variables('hidden1/batch_normalization/gamma:0')
bout4 = tf.global_variables('hidden1/batch_normalization/beta:0')

###########################################################

# Train network

init = tf.global_variables_initializer()
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

with tf.Session() as sess:

    sess.run(init)

    # Create dummy data
    batchx = np.zeros((2,1))
    batchy = np.zeros((2,1))
    batchx[0,0]=0.0
    batchx[1,0]=1.0
    batchy[0,0]=3.0
    batchy[1,0]=4.0

    _,_ = sess.run([optimizer, extra_update_ops], feed_dict={training: True, x:batchx, y:batchy, lx: 0.001})

    print('weight of hidden layer')
    W1 = np.array(sess.run(tf.global_variables('hidden1/dense/kernel:0')))
    W1x = np.sum(W1, axis=1)
    print(W1x)

    print()
    print('output from hidden layer, batch norm layer, and relu layer')
    hid1,b1,b1x = sess.run([hidden_1, bn_1, bn_1x], feed_dict={training: False, x:batchx})
    print('hidden_1', hid1)
    print('bn_1', b1)
    print('bn_1x', b1x)

    print()
    print('batchnorm parameters')
    print('moving mean', sess.run(bout1))
    print('moving variance', sess.run(bout2))
    print('gamma', sess.run(bout3))
    print('beta', sess.run(bout4))

This is the output I get when I run the code:

weight of hidden layer [[1.404974]]

output from hidden layer, batch norm layer, and relu layer
hidden_1 [[0.      ]
          [1.404974]]

bn_1 [[-0.40697935]
      [ 1.215785  ]]

bn_1x [[0.      ]
      [1.215785]]

batchnorm parameters
moving mean [array([0.3514931], dtype=float32)]
moving variance [array([0.74709475], dtype=float32)]
gamma [array([0.999], dtype=float32)]
beta [array([-0.001], dtype=float32)]

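I am confused by the resulting batch norm parameters. In this particular case, the outputs of the hidden layer before batch norm is applied are the scalars 0 and 1.404974, but the batch norm moving mean is 0.3514931. This is with momentum = 0.5. It is not clear to me why the moving mean after 1 iteration is not exactly the average of 0 and 1.404974. I was under the impression that the momentum parameter would only take effect from the second batch onward.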

Any help would be appreciated.

1 Answer:

Answer 0 (score: 0)

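Because you ran the optimizer, it is hard to know exactly what is happening inside: the hidden_1 values you print are not the ones that were used to update the batch norm statistics; they are the values computed after the weight update.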
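One rough way to check this from the numbers printed in the question is to invert the moving-average update. The sketch below assumes the `tf.layers.batch_normalization` defaults (moving mean initialized to 0, moving variance initialized to 1) and the update rule `moving = momentum * moving + (1 - momentum) * batch_statistic`. The variable names are illustrative and do not come from the original code.

```python
import numpy as np

MOMENTUM = 0.5

# Values copied from the output printed in the question.
printed_moving_mean = 0.3514931
printed_moving_variance = 0.74709475
printed_weight_after_step = 1.404974

# Assumption: default initial values of the moving statistics.
initial_moving_mean = 0.0
initial_moving_variance = 1.0

# Invert: moving = momentum * initial + (1 - momentum) * batch_mean
batch_mean = (printed_moving_mean - MOMENTUM * initial_moving_mean) / (1.0 - MOMENTUM)

# With inputs 0 and 1 and no bias, the hidden outputs are [0, w],
# so the batch mean is w / 2 and the weight used in the forward pass is:
weight_before_step = 2.0 * batch_mean

# Cross-check against the moving variance (population variance of the batch).
hidden_before_step = np.array([0.0, weight_before_step])
batch_variance = hidden_before_step.var()
expected_moving_variance = (
    MOMENTUM * initial_moving_variance + (1.0 - MOMENTUM) * batch_variance
)

print("weight used for the statistics:", weight_before_step)
print("weight printed after the step: ", printed_weight_after_step)
print("expected moving variance:      ", expected_moving_variance)
print("printed moving variance:       ", printed_moving_variance)
```

Under these assumptions the recovered weight differs from the printed one by roughly 0.001, which is the learning rate passed to Adam, and the moving variance computed from the recovered weight agrees with the printed one. That is consistent with the statistics having been computed before the optimizer step changed the weight.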

In any case, I don't see a problem here.
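As a sanity check that the printed values are mutually consistent, the inference-mode output can be recomputed by hand. This is a minimal NumPy sketch, not the TensorFlow implementation. It assumes the default `epsilon=0.001` of `tf.layers.batch_normalization` and the usual inference formula `gamma * (x - moving_mean) / sqrt(moving_variance + epsilon) + beta`. The helper `batch_norm_inference` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

EPSILON = 1e-3  # assumed: default epsilon of tf.layers.batch_normalization

# Values copied from the output printed in the question.
hidden_1 = np.array([0.0, 1.404974])
moving_mean = 0.3514931
moving_variance = 0.74709475
gamma = 0.999
beta = -0.001


def batch_norm_inference(x, mean, variance, scale, offset, eps=EPSILON):
    """Normalize x with the stored moving statistics (training=False)."""
    return scale * (x - mean) / np.sqrt(variance + eps) + offset


bn_1 = batch_norm_inference(hidden_1, moving_mean, moving_variance, gamma, beta)
bn_1x = np.maximum(bn_1, 0.0)  # ReLU

print("bn_1 ", bn_1)
print("bn_1x", bn_1x)
```

The recomputed values should land close to the printed `bn_1` and `bn_1x`, small differences being expected because the printed gamma and beta are rounded. This suggests the layer behaves as documented when run with `training: False`.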