Question

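What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?

In my opinion, 'VALID' means there will be no zero padding outside the edges when we do max pool.

According to A guide to convolution arithmetic for deep learning, there is no padding in the pooling operator, i.e. it just uses tensorflow's 'VALID'. But what is 'SAME' padding for max pool in tensorflow?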

Answer 1

If you like ascii art:

"VALID" = without padding:

   inputs:         1  2  3  4  5  6  7  8  9  10 11 (12 13)
                  |________________|                dropped
                                 |_________________|

"SAME" = with zero padding:

               pad|                                      |pad
   inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0
               |________________|
                              |_________________|
                                             |________________|

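In this example:

Input width = 13
Filter width = 6
Stride = 5

Notes:

"VALID" only ever drops the right-most columns (or bottom-most rows).
"SAME" tries to pad evenly left and right, but if the number of columns to be added is odd, it adds the extra column to the right, as is the case in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).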
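The window positions drawn in the ascii art can be reproduced with a few lines of plain Python. This is only a sketch: `window_spans` is a hypothetical helper (not a TensorFlow function), it uses 1-based input indices to match the drawing, and it assumes the usual rule that any odd leftover padding goes to the right.

```python
import math

def window_spans(in_width, filter_width, stride, padding):
    """Return (start, end) input indices (1-based, inclusive) for each window.

    Indices below 1 or above in_width fall in the zero-padded region.
    """
    if padding == "VALID":
        n_out = (in_width - filter_width) // stride + 1
        pad_left = 0
    else:  # "SAME"
        n_out = math.ceil(in_width / stride)
        total_pad = max((n_out - 1) * stride + filter_width - in_width, 0)
        pad_left = total_pad // 2  # any odd leftover goes to the right
    return [(i * stride + 1 - pad_left, i * stride + filter_width - pad_left)
            for i in range(n_out)]

valid = window_spans(13, 6, 5, "VALID")
same = window_spans(13, 6, 5, "SAME")
print(valid)  # [(1, 6), (6, 11)]
print(same)   # [(0, 5), (5, 10), (10, 15)]
```

With "VALID" the last window ends at input 11, so inputs 12 and 13 are never read. With "SAME" the first window starts one cell before the input and the last one ends two cells after it.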

Answer 2

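I'll give an example to make it clearer:

x: input image of shape [2, 3], 1 channel
valid_pad: max pool with 2x2 kernel, stride 2 and VALID padding.
same_pad: max pool with 2x2 kernel, stride 2 and SAME padding (this is the classic way to go)

The output shapes are:

valid_pad: here, no padding so the output shape is [1, 1]
same_pad: here, we pad the image to the shape [2, 4] (with -inf and then apply max pool), so the output shape is [1, 2]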

import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

x = tf.reshape(x, [1, 2, 3, 1])  # give a shape accepted by tf.nn.max_pool

valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

valid_pad.get_shape() == [1, 1, 1, 1]  # valid_pad is [5.]
same_pad.get_shape() == [1, 1, 2, 1]   # same_pad is  [5., 6.]
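To see where the 5 and the 6 come from without running TensorFlow, here is a rough pure-Python sketch of the same pooling. `max_pool_2d` is a hypothetical helper written for this illustration. It pads with -inf as described above and puts any odd leftover padding on the bottom/right, which is an assumption about the convention rather than a copy of TensorFlow's implementation.

```python
import math

def max_pool_2d(x, k, s, padding):
    """Max-pool a 2-D list of floats with a k x k window and stride s."""
    h, w = len(x), len(x[0])
    if padding == "SAME":
        out_h, out_w = math.ceil(h / s), math.ceil(w / s)
        pad_h = max((out_h - 1) * s + k - h, 0)
        pad_w = max((out_w - 1) * s + k - w, 0)
    else:  # "VALID"
        out_h, out_w = (h - k) // s + 1, (w - k) // s + 1
        pad_h = pad_w = 0
    top, left = pad_h // 2, pad_w // 2  # extra row/column ends up bottom/right
    neg_inf = float("-inf")
    # -inf can never win a max, so padded cells do not leak into the output.
    padded = [[neg_inf] * (w + pad_w) for _ in range(h + pad_h)]
    for r in range(h):
        for c in range(w):
            padded[r + top][c + left] = x[r][c]
    return [[max(padded[i * s + dr][j * s + dc]
                 for dr in range(k) for dc in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

x = [[1., 2., 3.],
     [4., 5., 6.]]
valid_pad = max_pool_2d(x, 2, 2, "VALID")
same_pad = max_pool_2d(x, 2, 2, "SAME")
print(valid_pad)  # [[5.0]]
print(same_pad)   # [[5.0, 6.0]]
```

The second "SAME" window covers the last real column plus one padded column, which is why 6 shows up in the output.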

Answer 3

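When stride is 1 (more typical with convolution than pooling), we can think of the following distinction:

"SAME": output size is the same as input size. This requires the filter window to slip outside the input map, hence the need to pad.
"VALID": the filter window stays at valid positions inside the input map, so the output size shrinks by filter_size - 1. No padding occurs.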

Answer 4

The TensorFlow Convolution example gives an overview of the difference between SAME and VALID:

For the SAME padding, the output height and width are computed as:

out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))

And

For the VALID padding, the output height and width are computed as:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
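The two formulas can be wrapped in one small function for quick checks. This is a minimal sketch for a single spatial dimension; `output_size` is a name made up here, not a TensorFlow API.

```python
import math

def output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension, per the formulas above."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(output_size(13, 6, 5, "SAME"))   # 3
print(output_size(13, 6, 5, "VALID"))  # 2
print(output_size(10, 3, 1, "SAME"))   # 10
print(output_size(10, 3, 1, "VALID"))  # 8
```

Note that the filter size does not appear in the SAME branch at all: with SAME padding only the stride decides the output size.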

Answer 5

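Padding is an operation that increases the size of the input data. In the case of 1-dimensional data you just append/prepend a constant to the array; in 2-dim you surround the matrix with these constants. In n-dim you surround your n-dim hypercube with the constant. In most cases this constant is zero, and it is called zero-padding.

Here is an example of zero-padding with p=1 applied to a 2-d tensor:

You can use arbitrary padding for your kernel, but some padding values are used more frequently than others:

VALID padding. The easiest case, it means no padding at all. Just leave your data as it was.
SAME padding, sometimes called HALF padding. It is called SAME because for a convolution with stride = 1 (or for pooling) it should produce output of the same size as the input. It is called HALF because for a kernel of size k
FULL padding is the maximum padding which does not result in a convolution over just padded elements. For a kernel of size k, this padding is equal to k - 1.

To use arbitrary padding in TF, you can use tf.pad()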
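As a stand-in for the missing picture, zero-padding with p=1 on a 2-d array can be sketched in plain Python. `zero_pad_2d` is a hypothetical helper used only for illustration. In TensorFlow the analogous call would be tf.pad with a per-dimension list of [before, after] amounts such as [[1, 1], [1, 1]].

```python
def zero_pad_2d(x, p, value=0):
    """Surround a 2-D list with p rows/columns of `value` on every side."""
    width = len(x[0]) + 2 * p
    border = [[value] * width for _ in range(p)]
    middle = [[value] * p + list(row) + [value] * p for row in x]
    # Copy the border rows so top and bottom do not share list objects.
    return border + middle + [list(r) for r in border]

x = [[1, 2],
     [3, 4]]
for row in zero_pad_2d(x, 1):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```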

Answer 6

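Complementing YvesgereY's great answer, I found this visualization extremely helpful:

Padding "valid" is the first figure. The filter window stays inside the image.

Padding "same" is the third figure. The output is the same size.

Found it on this article.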

Answer 7

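Quick Explanation

VALID: Don't apply any padding, i.e., assume that all dimensions are valid so that the input image gets fully covered by the filter and stride you specified.

SAME: Apply padding to the input (if needed) so that the input image gets fully covered by the filter and stride you specified. For stride 1, this will ensure that the output image size is the same as the input.

Notes

This applies to conv layers as well as max pool layers in the same way.

The term "valid" is a bit of a misnomer because things don't become "invalid" if you drop part of the image. Sometimes you might even want that. This should probably have been called NO_PADDING instead.

The term "same" is a misnomer too because it only makes sense for a stride of 1, when the output dimension is the same as the input dimension. For a stride of 2, output dimensions will be half, for example. This should probably have been called AUTO_PADDING instead.

In SAME (i.e. auto-pad mode), Tensorflow will try to spread padding evenly on both left and right.

In VALID (i.e. no padding mode), Tensorflow will drop right and/or bottom cells if your filter and stride don't fully cover the input image.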

Answer 8

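There are three choices of padding: valid (no padding), same (or half), full. You can find explanations (in Theano) here: http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

Valid or no padding:

The valid padding involves no zero padding, so it covers only the valid input, not including artificially generated zeros. The length of the output is ((the length of input) - (k-1)) for the kernel size k if the stride s = 1.

Same or half padding:

The same padding makes the size of the outputs the same as that of the inputs when s = 1. If s = 1, the number of zeros padded is (k-1).

Full padding:

The full padding means that the kernel runs over the whole input, so at the ends the kernel may meet only one input and zeros elsewhere. The number of zeros padded is 2(k-1) if s = 1. The length of the output is ((the length of input) + (k-1)) if s = 1.

Therefore, the number of paddings: (valid) <= (same) <= (full)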
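The stride-1 arithmetic for the three modes fits in a few lines. This is a sketch under the stated assumption s = 1; `padded_output_length` is an illustrative name, not part of Theano or TensorFlow.

```python
def padded_output_length(n, k, mode):
    """Stride-1 output length and total zeros added, for input length n and kernel size k."""
    total_pad = {"valid": 0, "same": k - 1, "full": 2 * (k - 1)}[mode]
    # A stride-1 kernel of size k shortens whatever it slides over by k - 1.
    return n + total_pad - (k - 1), total_pad

for mode in ("valid", "same", "full"):
    print(mode, padded_output_length(10, 3, mode))
# valid (8, 0)
# same (10, 2)
# full (12, 4)
```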

Answer 9

I quote this answer from the official tensorflow docs https://www.tensorflow.org/api_guides/python/nn#Convolution For the 'SAME' padding, the output height and width are computed as:

out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))

and the padding on the top and left is computed as:

pad_along_height = max((out_height - 1) * strides[1] +
                    filter_height - in_height, 0)
pad_along_width = max((out_width - 1) * strides[2] +
                   filter_width - in_width, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left

For the 'VALID' padding, the output height and width are computed as:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))

and the padding values are always zero.
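For a single dimension, the quoted 'SAME' rules can be checked with a short function. This is a sketch only: `same_padding` is a hypothetical name, and `pad_before`/`pad_after` stand for top/bottom or left/right.

```python
import math

def same_padding(in_size, filter_size, stride):
    """Return (out_size, pad_before, pad_after) along one dimension for 'SAME'."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2            # top or left
    pad_after = pad_total - pad_before     # bottom or right gets the odd cell
    return out_size, pad_before, pad_after

print(same_padding(13, 6, 5))  # (3, 1, 2)
print(same_padding(3, 2, 2))   # (2, 0, 1)
print(same_padding(4, 2, 2))   # (2, 0, 0)
```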

Answer 10

Based on the explanation here and following up on Tristan's answer, I usually use these quick functions for sanity checks.

import math

# a function to help us stay clean: split the total padding into per-side amounts
def getPaddings(pad_along_height, pad_along_width):
    # when the total is odd, the bottom/right side gets the extra cell
    pad_top = pad_along_height // 2
    pad_bottom = pad_along_height - pad_top
    pad_left = pad_along_width // 2
    pad_right = pad_along_width - pad_left
    return pad_top, pad_bottom, pad_left, pad_right

# strides [image index, y, x, depth]
# padding 'SAME' or 'VALID'
# bottom and right sides always get the one additional padded pixel (if padding is odd)
def getOutputDim(inputWidth, inputHeight, filterWidth, filterHeight, strides, padding):
    if padding == 'SAME':
        out_height = math.ceil(inputHeight / strides[1])
        out_width = math.ceil(inputWidth / strides[2])
        # clamp at zero: the raw value can go negative when the stride exceeds the filter size
        pad_along_height = max((out_height - 1) * strides[1] + filterHeight - inputHeight, 0)
        pad_along_width = max((out_width - 1) * strides[2] + filterWidth - inputWidth, 0)
        # now get padding
        pad_top, pad_bottom, pad_left, pad_right = getPaddings(pad_along_height, pad_along_width)
        print('output height', out_height)
        print('output width', out_width)
        print('total pad along height', pad_along_height)
        print('total pad along width', pad_along_width)
        print('pad at top', pad_top)
        print('pad at bottom', pad_bottom)
        print('pad at left', pad_left)
        print('pad at right', pad_right)

    elif padding == 'VALID':
        out_height = math.ceil((inputHeight - filterHeight + 1) / strides[1])
        out_width = math.ceil((inputWidth - filterWidth + 1) / strides[2])
        print('output height', out_height)
        print('output width', out_width)
        print('no padding')


# use like so
getOutputDim(80, 80, 4, 4, [1, 1, 1, 1], 'SAME')

Answer 11

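Padding on/off. Determines the effective size of your input.

VALID: No padding. Convolution etc. ops are only performed at locations that are "valid", i.e. not too close to the borders of your tensor.
With a kernel of 3x3 and an image of 10x10, you would be performing convolution on the 8x8 area inside the borders.

SAME: Padding is provided. Whenever your operation references a neighborhood (no matter how big), zero values are provided when that neighborhood extends outside the original tensor, so that the operation also works on border values.
With a kernel of 3x3 and an image of 10x10, you would be performing convolution on the full 10x10 area.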
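The 3x3-on-10x10 case can be tried out with a toy box filter in plain Python. This is a sketch, not how TensorFlow computes convolutions: `box_sum` is a hypothetical helper that just sums each neighbourhood, and its SAME branch assumes an odd kernel size so the padding is symmetric.

```python
def box_sum(image, k, padding):
    """Sum over every k x k neighbourhood (stride 1) of a 2-D list."""
    p = (k - 1) // 2 if padding == "SAME" else 0  # assumes odd k for SAME
    h, w = len(image), len(image[0])
    # Zero-filled canvas with the image copied into the middle.
    padded = [[0] * (w + 2 * p) for _ in range(h + 2 * p)]
    for r in range(h):
        for c in range(w):
            padded[r + p][c + p] = image[r][c]
    out_h, out_w = h + 2 * p - k + 1, w + 2 * p - k + 1
    return [[sum(padded[i + dr][j + dc] for dr in range(k) for dc in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1] * 10 for _ in range(10)]
valid = box_sum(image, 3, "VALID")
same = box_sum(image, 3, "SAME")
print(len(valid), len(valid[0]))            # 8 8
print(len(same), len(same[0]))              # 10 10
print(same[0][0], same[0][5], same[5][5])   # 4 6 9
```

The smaller sums at the corner (4) and edge (6) show the zeros that SAME supplies where the neighbourhood extends past the image.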

Answer 12

VALID padding: the amount of padding is zero, i.e. no padding is applied. Hope there is no confusion.

x = tf.constant([[1., 2., 3.], [4., 5., 6.],[ 7., 8., 9.], [ 7., 8., 9.]])
x = tf.reshape(x, [1, 4, 3, 1])
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
print (valid_pad.get_shape()) # output-->(1, 2, 1, 1)

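SAME padding: This is a bit tricky to understand at first, because we have to consider two conditions separately, as mentioned in the official docs.

Let's take the input as $n_i$, the output as $n_o$, the total padding as $p_i$, the stride as $s$ and the kernel size as $k$ (only a single dimension is considered).

Case 01: $n_i \mod s = 0$: $p_i = \max(k - s, 0)$

Case 02: $n_i \mod s \neq 0$: $p_i = \max(k - (n_i \mod s), 0)$

$p_i$ is the minimum total padding that lets the windows cover the whole input. Once $p_i$ is known, $n_o$ follows from $n_o = \lfloor (n_i - k + p_i)/s \rfloor + 1$.

Let's work through this example:

x = tf.constant([[1., 2., 3.], [4., 5., 6.],[ 7., 8., 9.], [ 7., 8., 9.]])
x = tf.reshape(x, [1, 4, 3, 1])
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
print (same_pad.get_shape()) # --> output (1, 2, 2, 1)

Here x has width 3 and height 4. Taking the horizontal direction (3):

$n_i = 3, k = 2, s = 2, p_i = 2 - (3 \mod 2) = 1, n_o = \lfloor \frac{3-2+1}{2} \rfloor + 1 = 2$

Taking the vertical direction (4):

$n_i = 4, k = 2, s = 2, p_i = 2 - 2 = 0, n_o = \lfloor \frac{4-2+0}{2} \rfloor + 1 = 2$

Hope this helps to understand how SAME padding actually works in TF.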
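The two cases are the same quantity as the single expression max((n_o - 1) * s + k - n_i, 0) with n_o = ceil(n_i / s) quoted in an earlier answer. The brute-force check below is a sketch that compares them over a small range; the function names are made up for this illustration.

```python
import math

def pad_by_cases(n_i, k, s):
    """Total SAME padding from the two-case rule."""
    if n_i % s == 0:
        return max(k - s, 0)
    return max(k - (n_i % s), 0)

def pad_by_output(n_i, k, s):
    """Total SAME padding derived from the output size ceil(n_i / s)."""
    n_o = math.ceil(n_i / s)
    return max((n_o - 1) * s + k - n_i, 0)

# Collect every (input, kernel, stride) triple where the two rules disagree.
mismatches = [(n_i, k, s)
              for n_i in range(1, 40)
              for k in range(1, 8)
              for s in range(1, 8)
              if pad_by_cases(n_i, k, s) != pad_by_output(n_i, k, s)]
print(mismatches)             # []
print(pad_by_cases(3, 2, 2))  # 1
print(pad_by_cases(4, 2, 2))  # 0
```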

Answer 13

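To sum up, 'valid' padding means no padding. The output size of the convolutional layer shrinks depending on the input size and kernel size.

On the contrary, 'same' padding means using padding. When the stride is set to 1, the output size of the convolutional layer is kept the same as the input size by appending a certain number of '0-border' around the input data when calculating the convolution.

Hope this intuitive description helps.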

Answer 14

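Here, W and H are the width and height of the input, F is the filter dimension, and P is the padding size (i.e., the number of rows or columns to be padded).

For SAME padding:

For VALID padding: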

Answer 15

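Tensorflow 2.0 Compatible Answer: Detailed explanations of "Valid" and "Same" padding have been provided above.

However, for the benefit of the community, I will specify the different pooling functions and their respective commands in Tensorflow 2.x (>= 2.0).

Functions in 1.x: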

tf.nn.max_pool

tf.keras.layers.MaxPool2D

Average Pooling => tf.nn.avg_pool, tf.keras.layers.AveragePooling2D

Functions in 2.x:

tf.nn.max_pool if used in 2.x, and tf.compat.v1.nn.max_pool_v2 or tf.compat.v2.nn.max_pool if migrated from 1.x to 2.x.

tf.keras.layers.MaxPool2D if used in 2.x, and tf.compat.v1.keras.layers.MaxPool2D or tf.compat.v1.keras.layers.MaxPooling2D or tf.compat.v2.keras.layers.MaxPool2D or tf.compat.v2.keras.layers.MaxPooling2D if migrated from 1.x to 2.x.

Average Pooling => tf.nn.avg_pool2d or tf.keras.layers.AveragePooling2D if used in TF 2.x, and tf.compat.v1.nn.avg_pool_v2 or tf.compat.v2.nn.avg_pool or tf.compat.v1.keras.layers.AveragePooling2D or tf.compat.v1.keras.layers.AvgPool2D or tf.compat.v2.keras.layers.AveragePooling2D or tf.compat.v2.keras.layers.AvgPool2D if migrated from 1.x to 2.x.

For more information about migration from Tensorflow 1.x to 2.x, please refer to this Migration Guide.
