I have benchmarked separable_conv2d against normal conv2d as implemented in TF, and it seems that only depthwise_conv2d on its own is faster than normal conv2d; the performance of the full separable convolution is clearly poor.
The separable_conv2d described in MobileNet has 1/9 the FLOPs of a normal convolution when kernel_size=3. Considering Memory Access Cost, the separable one cannot be expected to be 9x faster than the normal one, but in my experiment the separable one is much slower.
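As a sanity check on the 1/9 claim, the FLOPs ratio can be worked out in plain Python (the formula is the standard one from the MobileNet paper; `flops_ratio` is a helper name of my own):

```python
# Standard conv FLOPs:   k*k * C_in * C_out * H * W
# Separable conv FLOPs: (k*k * C_in + C_in * C_out) * H * W
# Ratio (separable / standard) = 1/C_out + 1/k**2.

def flops_ratio(k, c_out):
    """Separable-to-standard FLOPs ratio for kernel size k and c_out output channels."""
    return 1.0 / c_out + 1.0 / k ** 2

print(flops_ratio(3, 128))  # ~0.119, approaching 1/9 ~ 0.111 as c_out grows
```

So the 1/9 figure is an asymptotic bound for large output channel counts, and it counts arithmetic only, not memory traffic.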
I modeled my experiment on this question: separable_conv2d is too slow. In that experiment separable_conv2d seems to be faster than the normal one when depth_multiplier = 1, but when I implement it with tf.nn as follows:
import os
import time

import tensorflow as tf

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

IMAGE_SIZE = 512
REPEAT = 100
KERNEL_SIZE = 3
data_format = 'NCHW'
# CHANNELS_BATCH_SIZE = 2048  # channels * batch_size
def normal_layers(inputs, nfilter, name=''):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        shape = inputs.shape.as_list()
        in_channels = shape[1]
        filter = tf.get_variable(initializer=tf.initializers.random_normal,
                                 shape=[KERNEL_SIZE, KERNEL_SIZE,
                                        in_channels, nfilter],
                                 name='weight')
        conv = tf.nn.conv2d(input=inputs, filter=filter,
                            strides=[1, 1, 1, 1], padding='SAME',
                            data_format=data_format, name='conv')
        return conv
def sep_layers(inputs, nfilter, name=''):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        shape = inputs.shape.as_list()
        in_channels = shape[1]
        dw_filter = tf.get_variable(initializer=tf.initializers.random_normal,
                                    shape=[KERNEL_SIZE, KERNEL_SIZE,
                                           in_channels, 1],
                                    name='dw_weight')
        pw_filter = tf.get_variable(initializer=tf.initializers.random_normal,
                                    shape=[1, 1, in_channels, nfilter],
                                    name='pw_weight')
        conv = tf.nn.depthwise_conv2d_native(input=inputs,
                                             filter=dw_filter,
                                             strides=[1, 1, 1, 1],
                                             padding='SAME',
                                             data_format=data_format)
        conv = tf.nn.conv2d(input=conv,
                            filter=pw_filter,
                            strides=[1, 1, 1, 1],
                            padding='SAME',
                            data_format=data_format)
        return conv
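The REPEAT = 100 timing loop itself is not shown above; a minimal sketch of the pattern I mean (the `run_step` callable stands in for a `sess.run(...)` call on the conv op, and the warm-up iterations are an addition to exclude one-off graph/kernel setup cost, which the original measurement may not have had):

```python
import time

def time_layer(run_step, repeat=100, warmup=10):
    """Return the wall-clock time of `repeat` calls to run_step, after a warm-up."""
    for _ in range(warmup):
        run_step()  # excluded from the measurement
    start = time.time()
    for _ in range(repeat):
        run_step()
    return time.time() - start

# usage with a trivial placeholder instead of sess.run:
elapsed = time_layer(lambda: sum(range(1000)))
print('%.6fs' % elapsed)
```

Without a warm-up, the first iteration can absorb CUDA kernel autotuning and allocation cost and skew a 100-iteration total.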
Each layer is run 100 times. Unlike the linked question, I set batch_size to a constant 10 and vary channels over [32, 64, 128]; the input shape is [batch_size, channels, img_size, img_size]. The durations are as follows:
Channels: 32
Normal Conv 0.7769527435302734s, Sep Conv 1.4197885990142822s
Channels: 64
Normal Conv 0.8963277339935303s, Sep Conv 1.5703468322753906s
Channels: 128
Normal Conv 0.9741833209991455s, Sep Conv 1.665834665298462s
With batch_size held constant and only channels changing, the time cost of both the normal and the separable convolution increases gradually.
When batch_size * channels is instead held constant, i.e. the input shape is [CHANNELS_BATCH_SIZE // channels, channels, img_size, img_size]:
Channels: 32
Normal Conv 0.871959924697876s, Sep Conv 1.569300651550293s
Channels: 64
Normal Conv 0.909860372543335s, Sep Conv 1.604109525680542s
Channels: 128
Normal Conv 0.9196009635925293s, Sep Conv 1.6144189834594727s
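For concreteness, the batch sizes in this second setup follow from CHANNELS_BATCH_SIZE = 2048, so the total number of feature-map elements per step stays fixed while channels vary:

```python
CHANNELS_BATCH_SIZE = 2048  # channels * batch_size, held constant

for channels in [32, 64, 128]:
    batch_size = CHANNELS_BATCH_SIZE // channels
    print('channels=%3d  batch_size=%3d  product=%d'
          % (channels, batch_size, batch_size * channels))
# channels= 32  batch_size= 64  product=2048
# channels= 64  batch_size= 32  product=2048
# channels=128  batch_size= 16  product=2048
```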
What confuses me is that these results differ from those in the link above: the time cost of sep_conv2d shows no obvious change across configurations.
My question is: why is separable_conv2d so much slower than normal conv2d in my experiments?
Any help would be appreciated. Thanks in advance.