Question

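I implemented a model that relies on 3D convolutions (for a task similar to action recognition), and I want to use batch normalization (see [Ioffe & Szegedy 2015]). I could not find any tutorial focusing on 3D convs, so I am writing a short one here that I would like to review with you.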

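The code below refers to TensorFlow r0.12 and instantiates variables explicitly. By that I mean I am not using tf.contrib.learn, apart from the tf.contrib.layers.batch_norm() function. I do this both to better understand how things work under the hood and to have more implementation freedom (e.g., variable summaries).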

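I will work up to the 3D convolution case gradually: first an example for a fully connected layer, then a 2D convolution, and finally the 3D case. While going through the code, it would be great if you could check that everything is done correctly. The code runs, but I am not 100% sure about the way I apply batch normalization. I end this post with a more detailed question.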

import tensorflow as tf

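# This flag switches batch normalization between training mode (normalize with
# the current mini-batch statistics and update the moving averages) and
# inference mode (normalize with the stored moving averages).
# Note: by default tf.contrib.layers.batch_norm adds the moving-average update
# ops to tf.GraphKeys.UPDATE_OPS, so they must be run together with the train op
# (or pass updates_collections=None to force in-place updates).
training = tf.placeholder_with_default(True, shape=())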

Fully connected (FC) case

# Input.
INPUT_SIZE = 512
u = tf.placeholder(tf.float32, shape=(None, INPUT_SIZE))

# FC params: weights only, no bias as per [Ioffe & Szegedy 2015].
FC_OUTPUT_LAYER_SIZE = 1024
w = tf.Variable(tf.truncated_normal(
    [INPUT_SIZE, FC_OUTPUT_LAYER_SIZE], dtype=tf.float32, stddev=1e-1))

# Layer output with no activation function (yet).
fc = tf.matmul(u, w)

# Batch normalization.
fc_bn = tf.contrib.layers.batch_norm(
    fc,
    center=True,
    scale=True,
    is_training=training,
    scope='fc-batch_norm')

# Activation function.
fc_bn_relu = tf.nn.relu(fc_bn)
print(fc_bn_relu)  # Tensor("Relu:0", shape=(?, 1024), dtype=float32)
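To make explicit what the FC case computes in training mode, here is a minimal NumPy sketch (NumPy is my own choice for illustration and is not used in the question; epsilon=1e-3 is an assumption). Each of the 1024 features gets its own mean and variance, computed over the batch axis only, and its own gamma/beta pair.

```python
import numpy as np

def batch_norm_fc(x, gamma, beta, eps=1e-3):
    """Training-mode batch norm for a (batch, features) tensor.

    Statistics are computed per feature over the batch axis only.
    """
    mean = x.mean(axis=0)                # shape: (features,)
    var = x.var(axis=0)                  # biased variance, shape: (features,)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta          # gamma/beta broadcast over the batch

rng = np.random.default_rng(0)
fc = rng.normal(loc=5.0, scale=3.0, size=(32, 1024))  # stand-in for tf.matmul(u, w)
gamma = np.ones(1024)                    # one scale per output feature
beta = np.zeros(1024)                    # one shift per output feature
fc_bn = batch_norm_fc(fc, gamma, beta)
fc_bn_relu = np.maximum(fc_bn, 0.0)      # ReLU applied after BN, as in the question
print(fc_bn.shape)  # (32, 1024)
```

After normalization every feature column has (close to) zero mean and unit variance across the batch, before gamma and beta rescale it.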

2D convolutional (CNN) layer case

# Input: 640x480 RGB images (whitened input, hence tf.float32).
INPUT_HEIGHT = 480
INPUT_WIDTH = 640
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))

# CNN params: weights only, no bias as per [Ioffe & Szegedy 2015].
CNN_FILTER_HEIGHT = 3  # Space dimension.
CNN_FILTER_WIDTH = 3  # Space dimension.
CNN_FILTERS = 128
w = tf.Variable(tf.truncated_normal(
    [CNN_FILTER_HEIGHT, CNN_FILTER_WIDTH, INPUT_CHANNELS, CNN_FILTERS],
    dtype=tf.float32, stddev=1e-1))

# Layer output with no activation function (yet).
CNN_LAYER_STRIDE_VERTICAL = 1
CNN_LAYER_STRIDE_HORIZONTAL = 1
CNN_LAYER_PADDING = 'SAME'
cnn = tf.nn.conv2d(
    input=u, filter=w,
    strides=[1, CNN_LAYER_STRIDE_VERTICAL, CNN_LAYER_STRIDE_HORIZONTAL, 1],
    padding=CNN_LAYER_PADDING)

# Batch normalization.
cnn_bn = tf.contrib.layers.batch_norm(
    cnn,
    data_format='NHWC',  # Matching the "cnn" tensor which has shape (?, 480, 640, 128).
    center=True,
    scale=True,
    is_training=training,
    scope='cnn-batch_norm')

# Activation function.
cnn_bn_relu = tf.nn.relu(cnn_bn)
print(cnn_bn_relu)  # Tensor("Relu_1:0", shape=(?, 480, 640, 128), dtype=float32)
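For the 2D conv case, the difference from the FC case is only in the reduction axes. The NumPy sketch below (an illustration under the assumption of NHWC layout and epsilon=1e-3, with a much smaller tensor than the real (?, 480, 640, 128) one) averages over the batch and both spatial axes, so each channel has a single mean/variance and a single gamma/beta pair shared by every location.

```python
import numpy as np

def batch_norm_2d_conv(x, gamma, beta, eps=1e-3):
    """Training-mode batch norm for an NHWC conv output.

    Mean and variance are computed per channel over the batch AND both
    spatial axes, so every location of a feature map is normalized the same way.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)   # shape: (1, 1, 1, C)
    var = x.var(axis=(0, 1, 2), keepdims=True)     # shape: (1, 1, 1, C)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta                     # gamma, beta: shape (C,)

rng = np.random.default_rng(0)
# Small stand-in for the (?, 480, 640, 128) "cnn" tensor.
N, H, W, C = 4, 12, 16, 8
cnn = rng.normal(loc=2.0, scale=4.0, size=(N, H, W, C))
gamma = np.ones(C)   # one scale per feature map
beta = np.zeros(C)   # one shift per feature map
cnn_bn = batch_norm_2d_conv(cnn, gamma, beta)
cnn_bn_relu = np.maximum(cnn_bn, 0.0)
```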

3D convolutional (CNN3D) layer case

# Input: sequence of 9 160x120 RGB images (whitened input, hence tf.float32).
INPUT_SEQ_LENGTH = 9
INPUT_HEIGHT = 120
INPUT_WIDTH = 160
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_SEQ_LENGTH, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))

# CNN params: weights only, no bias as per [Ioffe & Szegedy 2015].
CNN3D_FILTER_LENGTH = 3  # Time dimension.
CNN3D_FILTER_HEIGHT = 3  # Space dimension.
CNN3D_FILTER_WIDTH = 3  # Space dimension.
CNN3D_FILTERS = 96
w = tf.Variable(tf.truncated_normal(
    [CNN3D_FILTER_LENGTH, CNN3D_FILTER_HEIGHT, CNN3D_FILTER_WIDTH, INPUT_CHANNELS, CNN3D_FILTERS],
    dtype=tf.float32, stddev=1e-1))

# Layer output with no activation function (yet).
CNN3D_LAYER_STRIDE_TEMPORAL = 1
CNN3D_LAYER_STRIDE_VERTICAL = 1
CNN3D_LAYER_STRIDE_HORIZONTAL = 1
CNN3D_LAYER_PADDING = 'SAME'
cnn3d = tf.nn.conv3d(
    input=u, filter=w,
    strides=[1, CNN3D_LAYER_STRIDE_TEMPORAL, CNN3D_LAYER_STRIDE_VERTICAL, CNN3D_LAYER_STRIDE_HORIZONTAL, 1],
    padding=CNN3D_LAYER_PADDING)

# Batch normalization.
cnn3d_bn = tf.contrib.layers.batch_norm(
    cnn3d,
    data_format='NHWC',  # Channels-last, matching the "cnn3d" tensor which has shape (?, 9, 120, 160, 96).
    center=True,
    scale=True,
    is_training=training,
    scope='cnn3d-batch_norm')

# Activation function.
cnn3d_bn_relu = tf.nn.relu(cnn3d_bn)
print(cnn3d_bn_relu)  # Tensor("Relu_2:0", shape=(?, 9, 120, 160, 96), dtype=float32)
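The same idea extends to the 3D case: with channels last, the statistics are shared over the batch, time, height and width axes. The NumPy sketch below (again an illustration, assuming NDHWC layout, epsilon=1e-3, and a reduced spatial size) shows that this leaves exactly one gamma/beta pair per output channel, i.e. 96 + 96 parameters for this layer.

```python
import numpy as np

def batch_norm_3d_conv(x, gamma, beta, eps=1e-3):
    """Training-mode batch norm for an NDHWC conv3d output.

    Statistics are shared across the batch, time, height and width axes,
    leaving one mean/variance (and one gamma/beta pair) per channel.
    """
    axes = (0, 1, 2, 3)
    mean = x.mean(axis=axes, keepdims=True)    # shape: (1, 1, 1, 1, C)
    var = x.var(axis=axes, keepdims=True)      # shape: (1, 1, 1, 1, C)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Small stand-in for the (?, 9, 120, 160, 96) "cnn3d" tensor.
N, D, H, W, C = 2, 9, 6, 8, 96
cnn3d = rng.normal(loc=-1.0, scale=2.0, size=(N, D, H, W, C))
gamma = np.ones(C)
beta = np.zeros(C)
cnn3d_bn = batch_norm_3d_conv(cnn3d, gamma, beta)
print(gamma.size + beta.size)  # 192: one (gamma, beta) pair per feature map
```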

What I would like to make sure of is whether the code above implements batch normalization exactly as described in [Ioffe & Szegedy 2015] at the end of Sec. 3.2:

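For convolutional layers, we additionally want the normalization to obey the convolutional property, so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a mini-batch, over all locations. [...] Alg. 2 is modified similarly, so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.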
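The last sentence of that quote can be checked numerically. At inference time the statistics are fixed (in tf.contrib.layers.batch_norm these are its moving mean/variance variables), so BN collapses to y = a * x + b with one (a, b) pair per channel. The NumPy sketch below uses made-up running statistics and epsilon=1e-3; it is an illustration, not the library code.

```python
import numpy as np

def bn_inference(x, moving_mean, moving_var, gamma, beta, eps=1e-3):
    """Inference-mode batch norm for a channels-last tensor (any rank).

    Uses fixed per-channel statistics, so it reduces to y = a * x + b,
    with the same (a, b) for every activation of a given feature map.
    Returns the output together with the per-channel a and b.
    """
    a = gamma / np.sqrt(moving_var + eps)   # per-channel scale, shape (C,)
    b = beta - a * moving_mean              # per-channel shift, shape (C,)
    return a * x + b, a, b

rng = np.random.default_rng(0)
C = 4
x = rng.normal(size=(2, 3, 5, 5, C))        # NDHWC activations
moving_mean = rng.normal(size=C)            # hypothetical running statistics
moving_var = rng.uniform(0.5, 2.0, size=C)
gamma = rng.uniform(0.5, 1.5, size=C)
beta = rng.normal(size=C)

y, a, b = bn_inference(x, moving_mean, moving_var, gamma, beta)
# Same result as the textbook formula gamma * (x - mean) / sqrt(var + eps) + beta.
y_ref = gamma * (x - moving_mean) / np.sqrt(moving_var + 1e-3) + beta
```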

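UPDATE: I guess the code above is also correct for the 3D conv case. In fact, when I define my model and print all the trainable variables, I see the expected number of beta and gamma variables. For instance: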

Tensor("conv3a/conv3d_weights/read:0", shape=(3, 3, 3, 128, 256), dtype=float32)
Tensor("BatchNorm_2/beta/read:0", shape=(256,), dtype=float32)
Tensor("BatchNorm_2/gamma/read:0", shape=(256,), dtype=float32)

This looks right to me, since BN learns one beta/gamma pair per feature map (256 in total).

[Ioffe & Szegedy 2015]: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Answer 1

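This is a great post about 3D batchnorm; it often goes unnoticed that batchnorm can be applied to any tensor of rank greater than 1. Your code is correct, but I can't help adding a few important notes on this: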

A "standard" 2D batchnorm (accepting a 4D tensor) can be faster in TensorFlow than 3D or higher, because it supports the fused_batch_norm implementation, which applies one kernel operation:

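Fused batch norm combines the multiple operations needed to do batch normalization into a single kernel. Batch norm is an expensive process that for some models makes up a large percentage of the operation time. Using fused batch norm can result in a 12%-30% speedup.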

There is also an issue on GitHub to support 3D filters, but there hasn't been any recent activity and at this point the issue is closed unresolved.
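Although the original paper prescribes using batchnorm before the ReLU activation (and that's what you did in the code above), there is evidence that it may be better to use batchnorm after the activation. Here's a comment on the Keras GitHub by Francois Chollet: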

... I can guarantee that recent code written by Christian [Szegedy] applies relu before BN. It is still occasionally a topic of debate, though.
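For anyone interested in applying the idea of normalization in practice, there have been recent research developments of this idea, namely weight normalization and layer normalization, which fix certain disadvantages of the original batchnorm; for example, they work better with LSTMs and recurrent networks.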
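Regarding the fused-kernel point above: since 3D batchnorm reduces over every axis except the channels, one possible workaround (my own assumption, not something taken from the GitHub issue) is to fold the depth axis into the height axis, turning an NDHWC tensor into an NHWC one, apply 2D batchnorm (where the fused implementation may apply), and reshape back. The NumPy sketch below only checks that the statistics, and hence the outputs, are identical; it does not call TensorFlow or measure any speedup.

```python
import numpy as np

def batch_norm_channels_last(x, eps=1e-3):
    """Training-mode batch norm (gamma=1, beta=0) over all axes but the last."""
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
N, D, H, W, C = 2, 9, 12, 16, 8
x5d = rng.normal(size=(N, D, H, W, C))

# Direct 3D batch norm on the 5D tensor.
y_direct = batch_norm_channels_last(x5d)

# Fold depth into height: (N, D, H, W, C) -> (N, D * H, W, C), normalize as a
# 4D NHWC tensor, then restore the original 5D shape.
x4d = x5d.reshape(N, D * H, W, C)
y_folded = batch_norm_channels_last(x4d).reshape(N, D, H, W, C)
```

The reshape only merges adjacent non-channel axes, so each channel is still normalized over exactly the same set of activations.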

Batch normalization with 3D convolutions in TensorFlow
