Question

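I am trying out a recent arXiv paper called "Factorized CNN",

which mainly argues that spatially separated convolution (depthwise convolution), combined with a channel-wise linear projection (1x1 conv), can speed up the convolution operation.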
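To make the two stages concrete, here is a minimal pure-Python sketch of what a depthwise convolution followed by a 1x1 projection computes. It is only an illustration, not the paper's or TensorFlow's implementation: the function names are hypothetical, and it assumes an odd kernel size, stride 1, zero ("SAME") padding, and a channel multiplier of 1.

```python
def depthwise_conv2d(image, filters):
    """Filter each channel independently with its own k x k kernel.

    image:   H x W x C nested lists
    filters: k x k x C nested lists (one k x k kernel per channel)
    Assumes odd k, stride 1, zero padding, channel multiplier 1.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    k = len(filters)
    pad = k // 2
    out = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                acc = 0.0
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        # Positions outside the image count as zeros.
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += image[yy][xx][c] * filters[dy][dx][c]
                out[y][x][c] = acc
    return out


def pointwise_conv2d(image, weights):
    """1x1 convolution: mix channels at each pixel.

    weights: C_in x C_out nested lists
    """
    c_in, c_out = len(weights), len(weights[0])
    return [
        [
            [sum(pixel[i] * weights[i][o] for i in range(c_in)) for o in range(c_out)]
            for pixel in row
        ]
        for row in image
    ]
```

The depthwise stage never mixes channels and the pointwise stage never looks at neighbouring pixels, which is where the saving over a full 3x3 convolution comes from.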

This is the figure for their conv layer architecture.

I found that I can implement this architecture either with tf.nn.depthwise_conv2d followed by a 1x1 convolution, or with tf.nn.separable_conv2d.

Below is my implementation:


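import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable)

# `tensor` is assumed to be an existing feature map with 64 channels
#conv filter for depthwise convolution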
depthwise_filter = tf.get_variable("depth_conv_w", [3,3,64,1], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/9/32)))
#conv filter for linear channel projection
pointwise_filter = tf.get_variable("point_conv_w", [1,1,64,64], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/1/64)))
conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0))
#depthwise convolution, with multiplier 1
conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter, [1,1,1,1], padding='SAME'))
#linear channel projection with 1x1 convolution
conv_tensor = tf.nn.bias_add(tf.nn.conv2d(conv_tensor, pointwise_filter, [1,1,1,1], padding='VALID'), conv_b)
#residual
tensor = tf.add(tensor, conv_tensor)


This should be 9 times faster than the original 3x3x64 -> 64 channel convolution.
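As a rough check on that estimate, the multiply-adds per output pixel can be counted directly. This is a back-of-the-envelope sketch with hypothetical helper names; it ignores the bias, the ReLU and memory traffic, so it is a theoretical operation count rather than a runtime prediction.

```python
def standard_conv_cost(k, c_in, c_out):
    # Full k x k convolution: every output channel sees every input channel.
    return k * k * c_in * c_out


def factorized_conv_cost(k, c_in, c_out):
    # Depthwise k x k (one kernel per channel) plus a 1x1 projection.
    return k * k * c_in + c_in * c_out


standard = standard_conv_cost(3, 64, 64)      # 3*3*64*64
factorized = factorized_conv_cost(3, 64, 64)  # 3*3*64 + 64*64
ratio = standard / factorized
print(standard, factorized, round(ratio, 2))  # prints: 36864 4672 7.89
```

Under these assumptions, the 9x figure matches the spatial stage alone. Once the 1x1 projection is counted, the theoretical saving is closer to 7.9x, and most of the remaining work sits in the 1x1 convolution.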

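However, I do not see any performance improvement.

I have to assume that either I am doing something wrong, or there is a problem with TensorFlow's implementation.

Since there are few examples that use depthwise_conv2d, I am leaving this question here.

Is this slow speed normal? Or is there a mistake somewhere?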

Answer 1

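The current implementation of depthwise conv2d does not fully exploit the parallel power of the GPU, so you will have to wait for a faster implementation in the future. In Caffe, for example, a faster third-party implementation of this kernel exists.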

Answer 2

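Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds. However, training depthwise convolution layers with GPUs is slow in current deep learning frameworks because their implementations cannot fully utilize the GPU capacity.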

https://arxiv.org/pdf/1803.09926.pdf

tf.nn.depthwise_conv2d is too slow. Is this normal?
