Question

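I'm reimplementing MobileNet, but I've found that the depthwise convolution is no faster than Conv2D (I haven't included the 1x1 pointwise convolution yet). Here is the test code, run on Colab: https://colab.research.google.com/drive/1nBuYrmmH5kM0jbtIZdsuiG6uJbU6mpA7?usp=sharing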

import tensorflow as tf
import time
x = tf.random.normal((2, 64, 64, 3))
conv = tf.keras.layers.Conv2D(16, 3, strides=1, padding='same')
dw = tf.keras.layers.DepthwiseConv2D(3, padding='same')
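# The first call to each layer also creates its weights, so these two timings include build overhead.
start = time.time()
conv(x)
print('conv2d:', time.time() - start)    # approximately 0.0036s
start = time.time()
dw(x)
print('dw:', time.time() - start)    # approximately 0.0034s
# IPython/Colab magics: repeated calls on layers that are already built.
%timeit conv(x)    # 1000 loops, best of 3: 225 µs per loop
%timeit dw(x)    # 1000 loops, best of 3: 352 µs per loop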

I also tried it on a CPU-only laptop and saw similar results. Why is DepthwiseConv2D slower than Conv2D? Am I doing something wrong?

Answer 1

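Although more memory efficient, depthwise 2D convolutions can indeed be slower than regular 2D convolutions.

Gholami et al. (SqueezeNext: Hardware-Aware Neural Network Design) state:

The reason for this is the inefficiency of depthwise-separable convolution in terms of hardware performance, which is due to its poor arithmetic intensity (ratio of compute to memory operations).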
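To make the arithmetic-intensity argument concrete, here is a rough back-of-the-envelope sketch for the two layers in the question. It is a simplified model, not a profile of what TensorFlow actually does: it assumes batch size 1, stride 1 with 'same' padding, no bias, and that every input, weight, and output element is moved exactly once. The helper names `conv2d_cost` and `depthwise_cost` are made up for this illustration.

```python
def conv2d_cost(h, w, c_in, c_out, k):
    """Return (MACs, elements moved) for a regular k x k convolution.

    Assumes stride 1, 'same' padding, batch size 1, no bias.
    """
    macs = h * w * k * k * c_in * c_out
    # input + weights + output, each counted once
    elements = h * w * c_in + k * k * c_in * c_out + h * w * c_out
    return macs, elements


def depthwise_cost(h, w, c_in, k):
    """Return (MACs, elements moved) for a k x k depthwise convolution.

    Same assumptions as above, with depth multiplier 1.
    """
    macs = h * w * k * k * c_in
    # the output has the same number of channels as the input
    elements = h * w * c_in + k * k * c_in + h * w * c_in
    return macs, elements


# Shapes from the question: 64x64x3 input, 3x3 kernel, 16 filters for Conv2D.
conv_macs, conv_elems = conv2d_cost(64, 64, 3, 16, 3)
dw_macs, dw_elems = depthwise_cost(64, 64, 3, 3)

conv_intensity = conv_macs / conv_elems
dw_intensity = dw_macs / dw_elems

print(f"conv2d:    {conv_macs} MACs, {conv_elems} elements, intensity {conv_intensity:.1f}")
print(f"depthwise: {dw_macs} MACs, {dw_elems} elements, intensity {dw_intensity:.1f}")
```

Under these assumptions the depthwise layer does 16 times fewer multiply-accumulates but also far fewer of them per element moved, so it has less arithmetic to hide memory traffic behind. With tensors as small as the ones in the question, fixed per-call overhead probably also accounts for much of the measured time, so the numbers above should be read as an illustration of the quoted point rather than a full explanation of the timings.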
