I'm not sure if it's only me who thinks the TensorFlow documentation is a bit weak.
I was planning to use the tf.nn.batch_normalization function to implement batch normalization, but then I came across the tf.layers.batch_normalization function, which looks like it should be the easier one to use. Its documentation, though, is really poor, if I may say so.
I've tried to figure out how to use it correctly, but that's not easy with the information provided on the page. I'm hoping someone with experience can help me (and probably many others) understand it.
Let me first share the signature:
tf.layers.batch_normalization(
    inputs,
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer=tf.zeros_initializer(),
    gamma_initializer=tf.ones_initializer(),
    moving_mean_initializer=tf.zeros_initializer(),
    moving_variance_initializer=tf.ones_initializer(),
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    training=False,
    trainable=True,
    name=None,
    reuse=None,
    renorm=False,
    renorm_clipping=None,
    renorm_momentum=0.99,
    fused=None,
    virtual_batch_size=None,
    adjustment=None
)
Q1) Beta is initialized to zero and gamma to one, but the docs don't say why. When batch normalization is used, I understand that the network's ordinary bias parameters become redundant, because the beta parameter of the batch-norm step plays the same role. From that perspective, initializing beta to zero is understandable. But why is gamma initialized to one? Is that really the most effective choice?
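For reference, here is my rough numpy sketch of the per-feature transform as I understand it from the paper (my own illustration, not TF internals):

import numpy as np

# With the default initializers gamma=1, beta=0 the layer initially
# outputs the plain standardized activations:
#   y = gamma * (x - mean) / sqrt(var + eps) + beta
x = np.array([1.0, 2.0, 3.0, 4.0])
gamma, beta, eps = 1.0, 0.0, 0.001

x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
y = gamma * x_hat + beta   # equals x_hat at initialization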
Q2) I also see a momentum parameter. The docs just say "Momentum for the moving average." I assume this parameter is used when computing the "mean" of a given mini-batch's values in the corresponding hidden layer. In other words, the mean used in batch normalization is not the mean of the current mini-batch alone, but is dominated by roughly the last 100 mini-batches (since momentum = 0.99). What is unclear is how this parameter affects execution at test time, or when I simply validate my model by computing cost and accuracy. My assumption is this: whenever I evaluate on the dev or test set, I set the parameter "training" to False, so the momentum parameter is irrelevant for that particular run, and the "mean" and "variance" are then the values accumulated during training rather than freshly computed ones. I may be wrong, but if so, I see nothing about it in the docs. Can anyone confirm that my understanding is correct? If not, I'd really appreciate a further explanation.
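Here is the mental model I'm assuming for momentum (my own sketch, not TF internals):

momentum = 0.99
moving_mean = 0.0
batch_means = [0.8, 1.1, 0.9]   # hypothetical per-mini-batch means
for batch_mean in batch_means:
    # each step keeps 99% of the old estimate, mixes in 1% of the new batch
    moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean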
Q3) I'm struggling with the meaning of the trainable parameter. I assume it refers to the beta and gamma parameters here. Why would anyone make them non-trainable?
Q4) The "reuse" parameter. What is it really for?
Q5) The adjustment parameter. Another mystery..
Q6) A kind of summary question.. This is my overall assumption that I'd like confirmed. The important parameters here are: inputs, axis, momentum, center, scale, and training. And I believe we're safe as long as training=True while training, and training=False while validating on the dev set, evaluating on the test set, or even using the model in real life. A sketch of the pattern I have in mind follows below.
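To make that assumption concrete (a sketch; the placeholder name is_training is my own):

import tensorflow as tf

is_training = tf.placeholder_with_default(False, shape=(), name="is_training")
z = tf.random_normal([7, 5])   # stand-in for some layer's pre-activation
z_bn = tf.layers.batch_normalization(z, training=is_training)

# sess.run(train_op, feed_dict={is_training: True})    # while training
# sess.run(accuracy, feed_dict={is_training: False})   # dev/test/real life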
Any feedback would be greatly appreciated.
ADDENDUM:
The confusion continues. Help!
I'm trying to use this function instead of implementing the batch normalizer manually. I have the following forward-propagation function, which iterates over the layers of the NN.
def forward_propagation_with_relu(X, num_units_in_layers, parameters,
                                  normalize_batch, training, mb_size=7):
    L = len(num_units_in_layers)
    A_temp = tf.transpose(X)
    for i in range(1, L):
        W = parameters.get("W" + str(i))
        b = parameters.get("b" + str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)
        if normalize_batch:
            if i < (L - 1):
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
                                                           training=training)
        A_temp = tf.nn.relu(Z_temp)
    return Z_temp  # This is the linear output of the last layer
The tf.layers.batch_normalization(..) function wants static dimensions, but in my case I don't have them. Since I feed in a mini-batch each time before running the optimizer, instead of training on the whole training set at once, the first dimension of X (the batch dimension) appears to be unknown.
If I write:
print(X.shape)
I get:
(?, 5)
In this situation, when I run the whole program, I get the error below. I've seen in some other posts that people say they solved the problem by using the tf.reshape function. I tried it.. the forward prop ran fine, but then it crashed in the Adam optimizer..
Here is what I get when I run the code above (without tf.reshape):
How can I solve this problem?
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-191-990fb7d7f7f6> in <module>()
24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
25 normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26 lambd, print_progress)
27
28 print(parameters)
<ipython-input-190-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
34 # Forward propagation: Build the forward propagation in the tensorflow graph
35 ZL = forward_propagation_with_relu(X_mini_batch, num_units_in_layers,
---> 36 parameters, normalize_batch, training)
37
38 with tf.name_scope("calc_cost"):
<ipython-input-187-8012e2fb6236> in forward_propagation_with_relu(X, num_units_in_layers, parameters, normalize_batch, training, mb_size)
15 with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
16 Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
---> 17 training=training)
18
19 A_temp = tf.nn.relu(Z_temp)
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in batch_normalization(inputs, axis, momentum, epsilon, center, scale, beta_initializer, gamma_initializer, moving_mean_initializer, moving_variance_initializer, beta_regularizer, gamma_regularizer, beta_constraint, gamma_constraint, training, trainable, name, reuse, renorm, renorm_clipping, renorm_momentum, fused, virtual_batch_size, adjustment)
775 _reuse=reuse,
776 _scope=name)
--> 777 return layer.apply(inputs, training=training)
778
779
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in apply(self, inputs, *args, **kwargs)
805 Output tensor(s).
806 """
--> 807 return self.__call__(inputs, *args, **kwargs)
808
809 def _add_inbound_node(self,
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in __call__(self, inputs, *args, **kwargs)
676 self._defer_regularizers = True
677 with ops.init_scope():
--> 678 self.build(input_shapes)
679 # Create any regularizers added by `build`.
680 self._maybe_create_variable_regularizers()
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in build(self, input_shape)
251 if axis_to_dim[x] is None:
252 raise ValueError('Input has undefined `axis` dimension. Input shape: ',
--> 253 input_shape)
254 self.input_spec = base.InputSpec(ndim=ndims, axes=axis_to_dim)
255
ValueError: ('Input has undefined `axis` dimension. Input shape: ', TensorShape([Dimension(6), Dimension(None)]))
This feels hopeless......
ADDENDUM (2):
I'm adding some more info:
The following simply means the input layer has 5 units, each hidden layer has 6 units, and the output layer has 2 units.
num_units_in_layers = [5,6,6,2]
Here is the updated version of the forward-prop function, which uses tf.reshape:
def forward_propagation_with_relu(X, num_units_in_layers, parameters,
                                  normalize_batch, training, mb_size=7):
    L = len(num_units_in_layers)
    print("X.shape before reshape: ", X.shape)            # ADDED LINE 1
    X = tf.reshape(X, [mb_size, num_units_in_layers[0]])  # ADDED LINE 2
    print("X.shape after reshape: ", X.shape)             # ADDED LINE 3
    A_temp = tf.transpose(X)
    for i in range(1, L):
        W = parameters.get("W" + str(i))
        b = parameters.get("b" + str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)
        if normalize_batch:
            if i < (L - 1):
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
                                                           training=training)
        A_temp = tf.nn.relu(Z_temp)
    return Z_temp  # This is the linear output of the last layer
When I do this, I can run the forward-prop function. But it seems to crash later in the execution. Here is the error I get. (Note that I print the shape of the input X before and after the reshape inside the forward-prop function.)
X.shape before reshape: (?, 5)
X.shape after reshape: (7, 5)
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1349 try:
-> 1350 return fn(*args)
1351 except errors.OpError as e:
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1328 feed_dict, fetch_list, target_list,
-> 1329 status, run_metadata)
1330
~/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
515 compat.as_text(c_api.TF_Message(self.status.status)),
--> 516 c_api.TF_GetCode(self.status.status))
517 # Delete the underlying status object from memory otherwise it stays alive
InvalidArgumentError: Incompatible shapes: [7] vs. [2]
[[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-222-990fb7d7f7f6> in <module>()
24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
25 normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26 lambd, print_progress)
27
28 print(parameters)
<ipython-input-221-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
88 cost_mini_batch,
89 accuracy_mini_batch],
---> 90 feed_dict={training: True})
91 nr_of_minibatches += 1
92 sum_minibatch_costs += minibatch_cost
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1126 if final_fetches or final_targets or (handle and feed_dict_tensor):
1127 results = self._do_run(handle, final_targets, final_fetches,
-> 1128 feed_dict_tensor, options, run_metadata)
1129 else:
1130 results = []
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1342 if handle is None:
1343 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344 options, run_metadata)
1345 else:
1346 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1361 except KeyError:
1362 pass
-> 1363 raise type(e)(node_def, op, message)
1364
1365 def _extend_graph(self):
InvalidArgumentError: Incompatible shapes: [7] vs. [2]
[[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]
Caused by op 'forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub', defined at:
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 478, in start
self.io_loop.start()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
if self.run_code(code, result):
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-222-990fb7d7f7f6>", line 26, in <module>
lambd, print_progress)
File "<ipython-input-221-59594e979129>", line 36, in nn_model
parameters, normalize_batch, training)
File "<ipython-input-218-62e4c6126c2c>", line 19, in forward_propagation_with_relu
training=training)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 777, in batch_normalization
return layer.apply(inputs, training=training)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 807, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 697, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 602, in call
lambda: self.moving_mean)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/utils.py", line 211, in smart_cond
return control_flow_ops.cond(pred, true_fn=fn1, false_fn=fn2, name=name)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
return func(*args, **kwargs)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1985, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1839, in BuildCondBranch
original_result = fn()
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 601, in <lambda>
lambda: _do_update(self.moving_mean, new_mean),
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 597, in _do_update
var, value, self.momentum, zero_debias=False)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/training/moving_averages.py", line 87, in assign_moving_average
update_delta = (variable - value) * decay
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 778, in _run_op
return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 934, in binary_op_wrapper
return func(x, y, name=name)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4819, in _sub
"Sub", x=x, y=y, name=name)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3267, in create_op
op_def=op_def)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Incompatible shapes: [7] vs. [2]
[[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]
As for the question of why the shape of X is not static.. I don't know...... Here is how I set up the dataset:
with tf.name_scope("next_train_batch"):
    filenames = tf.placeholder(tf.string, shape=[None])
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(lambda filename: tf.data.TextLineDataset(filename).skip(1).map(decode_csv))
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(minibatch_size)
    iterator = dataset.make_initializable_iterator()
    X_mini_batch, Y_mini_batch = iterator.get_next()
I have 2 csv files containing the training data.
train_path1 = "train1.csv"
train_path2 = "train2.csv"
train_input_paths = [train_path1, train_path2]
I use the initializable iterator as follows:
sess.run(iterator.initializer,
         feed_dict={filenames: train_input_paths})
During training, I continuously fetch mini-batches from the training set. When I disable batch normalization, everything works fine. If I enable batch norm, it wants a static shape for the input X (the mini-batch). I reshape it, but then it crashes during execution, as shown above.
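One workaround I've seen suggested (untested by me, and tf.contrib.data.batch_and_drop_remainder may not exist in every TF version) is to drop the last partial batch so that the batch dimension becomes static:

# Hypothetical fix: give the batch dimension a static size by dropping
# the final partial batch, instead of dataset.batch(minibatch_size):
dataset = dataset.apply(
    tf.contrib.data.batch_and_drop_remainder(minibatch_size))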
ADDENDUM (3):
I think I figured out where it crashes. It seems to crash when the optimizer runs, after the cost has been computed.
First, the sequence of commands: forward prop first, then compute the cost, then run the optimizer. The first two seem to work, but not the optimizer.
Here is how I define the optimizer:
with tf.name_scope("train"):
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # Backpropagation: define the tensorflow optimizer. Use an AdamOptimizer.
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost_mini_batch)
I have the update_ops in place so the moving averages get updated. If I'm interpreting it correctly, it crashes exactly when it tries to update the moving averages. Though I may be misreading the error message, too.
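A debugging sketch I could run to check that interpretation (the exact variable names depend on the scopes actually used):

for v in tf.global_variables():
    if "moving_mean" in v.name or "moving_variance" in v.name:
        print(v.name, v.shape)
# With axis=-1 on a (6, 7) tensor, these come out with shape (7,), i.e.
# one statistic per training example rather than per feature, which
# already looks suspicious.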
ADDENDUM (4):
I tried normalizing over the dimension whose size is known, and it works! But that's not the dimension I want to normalize over, which is now confusing. Let me elaborate:
Number of units in the input layer: 5. Number of units in layer 1 (the first hidden layer): 6. So weight1 is a (6,5) matrix. Assume the mini-batch size is 7. In my case the shape of A[0] (or X_mini_batch) is (7,5), where 7 is the # of training examples in the mini-batch and 5 is the # of units in the input layer.
When computing Z[1]... Z[1] = weight1 * A[0].transpose ... the shape of Z[1] is then a (6,7) matrix, where each column holds the 6 features for one training example.
The question is: what do we want to normalize in Z[1]? What makes sense to me is to normalize each feature across all the given training examples. That means I need to normalize each row, because in each row I have the same feature taking different values across the example columns. And since Z[1] has shape (6,7), setting axis=0 should refer to normalizing within each row. In my case 7 is the unknown number, so that no longer hurts. Based on this logic, it works! But I'm confused about whether axis=0 really refers to each row here... Let me show the axis issue with another example, which has bothered me for quite a while..
A code example unrelated to this topic:
cc = tf.constant([[1.,2.,3.],
                  [4.,5.,6.]])
with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(cc, axis=0)))
    print(sess.run(tf.reduce_mean(cc, axis=1)))
This gives the following output:
[2.5 3.5 4.5]
[2. 5.]
When I set axis to 0, it gives the mean of each column. With axis=1, it gives the mean of each row. (Note that cc.shape gives (2,3).)
Now the million-dollar question: in a 2-D matrix, when I want to address each row, is the axis 0 or 1?
ADDENDUM (5):
I think I have the right answer now. Let me summarize my understanding of axis here. Hopefully I get it now...
Here is a representation of the Z[1] matrix with shape (6,7):
t_ex: training example
f: feature
t_ex1 t_ex2 t_ex3 t_ex4 t_ex5 t_ex6 t_ex7
f1 f1 f1 f1 f1 f1 f1
f2 f2 f2 f2 f2 f2 f2
f3 f3 f3 f3 f3 f3 f3
f4 f4 f4 f4 f4 f4 f4
f5 f5 f5 f5 f5 f5 f5
f6 f6 f6 f6 f6 f6 f6
In this mini-batch above, there are 7 training examples, and each training example has 6 features (because there are 6 units in layer 1). When we say "tf.layers.batch_normalization(.., axis=0)", we mean that each feature must be normalized row-wise, to remove, say, the high variance among the f1 values in the first row.
In other words, we do NOT normalize f1, f2, f3, f4, f5, f6 against each other. We normalize the f1:s against each other, the f2:s against each other, and so on...
Answer (score: 3):
Q1) Initializing gamma to 1 and beta to 0 means the normalized input is used directly. Since there is no prior information about what the variance of a layer's output should be, assuming a standard Gaussian is a fair default.
Q2) During the training phase (training=True), the batch is normalized with its own mean and var, under the assumption that the training data is randomly sampled. During testing (training=False), since the test data can be arbitrarily sampled, we cannot use its mean and var. So, as you said, we use the moving-average estimates from the last "100" training iterations.
Q3) Yes, trainable refers to beta and gamma. There are cases where you need to set trainable=False, e.g. if the parameters are updated with some new method, or if the batch_norm layer is pre-trained and needs to be frozen.
Q4) You have probably noticed the reuse parameter in other tf.layers functions as well. In general, if you want to call a layer more than once (e.g. once for training and once for validation) without TensorFlow thinking you are creating a new layer, you need to set reuse=True. I prefer with tf.variable_scope(..., reuse=tf.AUTO_REUSE): to achieve the same thing.
Q5) I'm not sure about this one. I guess it's for users who want to design new tricks for adjusting the scale and bias.
Q6) Yes, you are right.