Question

我不太清楚tf.nn.separable_conv2d到底做了什么。似乎pointwise_filter是生成下一层的一个像素时不同特征的缩放因子。但我不确定我的解释是否正确。这种方法有什么参考，有什么好处？

tf.nn.separable_conv2d生成与tf.nn.conv2d相同的形状。我想我可以用tf.nn.separable_conv2d替换tf.nn.conv2d。但是使用tf.nn.separable_conv2d时的结果似乎非常糟糕。网络很早就停止了学习。对于MNIST数据集，准确度只是随机猜测~10％。

我想当我将pointwise_filter值设置为全1.0并使其无法训练时，我会得到与tf.nn.conv2d相同的东西。但不是真的......仍然有~10％的准确度。

但是当tf.nn.conv2d与相同的超参数一起使用时，准确度可以达到99％。为什么呢？

此外，它还需要channel_multiplier * in_channels＆lt; out_channels。为什么？ channel_multiplier在这里扮演什么角色？

感谢。

编辑：

我之前使用的channel_multiplier为1.0。也许这是一个糟糕的选择。将其更改为2.0后，精度会变得更好。但是channel_multiplier的作用是什么？为什么1.0不是一个好的价值？

Answer 1

tf.nn.separable_conv2d()实现了所谓的可分离卷积＆＃39;在slide 26 and onwards of this talk上描述。

我们的想法是，不是在图像的所有通道上联合卷积，而是在每个通道上运行单独的2D卷积，深度为channel_multiplier。 in_channels * channel_multiplier个中间通道连接在一起，并使用1x1卷积映射到out_channels。

它通常是降低预定卷中早期卷积的参数复杂度的有效方法，并且可以大大加快训练速度。 channel_multiplier控制这种复杂性，RGB输入通常为4到8。对于灰度输入，使用它没有多大意义。

Answer 2

在多个输入通道上执行的常规2D卷积中，滤波器与输入一样深，并允许我们自由混合通道以在输出中生成每个元素。深度卷积不能做到这一点-每个通道都保持独立-因此有深度的名称。这是一个有助于解释其工作原理的图表[1]：

如果您查看官方文档，将会发现：

output[b, i, j, k] = sum_{di, dj, q, r} input[b, strides[1] * i + di, strides[2] * j + dj, q] * depthwise_filter[di, dj, q, r] * pointwise_filter[0, 0, q * channel_multiplier + r, k]

和tensorflow中的示例代码进行测试：

import tensorflow as tf
import numpy as np

width = 8
height = 8
batch_size = 100
filter_height = 3
filter_width = 3
in_channels = 3
channel_multiplier = 1
out_channels = 3


input_tensor = tf.get_variable(shape=(batch_size, height, width, in_channels), name="input")
depthwise_filter = tf.get_variable(shape=(filter_height, filter_width, in_channels, channel_multiplier), name="deptwise_filter")
pointwise_filter = tf.get_variable(shape=[1, 1, channel_multiplier * in_channels, out_channels], name="pointwise_filter")

output = tf.nn.separable_conv2d(
    input_tensor,
    depthwise_filter,
    pointwise_filter,
    strides=[1,1,1,1],
    padding='SAME',
)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    output_value = sess.run(output, feed_dict={input_tensor: np.random.rand(batch_size, width, height, in_channels),
                                               depthwise_filter: np.random.rand(filter_height, filter_width, in_channels, channel_multiplier),
                                               pointwise_filter: np.random.rand(1, 1, channel_multiplier * in_channels, out_channels)})
    print(np.shape(output_value))

信用：

[1] https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/

[2] https://www.tensorflow.org/api_docs/python/tf/nn/separable_conv2d

Answer 3

回答问题的最后部分：

此外，它还需要channel_multiplier * in_channels＆lt; out_channels。为什么？

我不知道为什么这个约束最初被放入，但是它已经在TF的当前主分支中被删除，并且应该使其达到1.3版本。这个想法可能就像是“如果你在减少逐点步骤中减少通道的数量，那么你可能已经选择了一个较小的通道乘数并节省了计算”。我猜这种推理是有缺陷的，因为逐点步骤可以组合来自不同depthwise_filters的值，或者可能是因为人们可能希望稍微减小维度，而不是通过完整因子。

Tensorflow：tf.nn.separable_conv2d做什么？

3 个答案: