TensorFlow is 2.5x slower than PyTorch on the VGG16 architecture

Date: 2017-01-24 16:03:53

Tags: tensorflow

So far I have been trying to get into TensorFlow and have been enjoying it.

Today I upgraded to CUDA 8, cuDNN 5.1 and TensorFlow 0.12.1, on a Maxwell Titan X GPU.

Using the following short snippet that loads a pretrained VGG16:

import numpy as np
import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim import nets

tf.reset_default_graph()
input_images = tf.placeholder(tf.float32, [None, 224, 224, 3], 'image')
preds = nets.vgg.vgg_16(input_images, is_training=False)[0]
saver = tf.train.Saver()

config = tf.ConfigProto(log_device_placement=True,
                        gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction = 0.5))
sess = tf.InteractiveSession(config=config)
saver.restore(sess, './vgg_16.ckpt')

_in = np.random.randn(16, 224, 224, 3).astype(np.float32)

I then timed the forward pass:

%timeit sess.run(preds, feed_dict={input_images: _in})
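Outside of IPython, the same measurement can be reproduced with a plain timing loop. The sketch below is generic: the `run_once` callable is a hypothetical stand-in for the `sess.run(...)` call above. Note that a few warm-up iterations matter, since the first runs pay one-off costs such as graph setup and cuDNN algorithm selection:

```python
import time

def time_forward(run_once, warmup=3, iters=20):
    """Time a forward-pass callable, ignoring one-off startup costs.

    run_once: zero-argument callable, e.g. a stand-in for
              lambda: sess.run(preds, feed_dict={input_images: _in}).
    Returns the mean wall-clock time per call in milliseconds.
    """
    for _ in range(warmup):
        run_once()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) / iters * 1000.0
```

With the session above this would be called as `time_forward(lambda: sess.run(preds, feed_dict={input_images: _in}))`.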

I get 160 ms per batch (forward pass only), which, according to this benchmark (also slower than MatConvNet), appears to be about 2.5x slower than the corresponding configuration in Torch.

The operations seem to be correctly assigned to the GPU, and the CUDA libraries are found correctly. What am I missing?

Edit: cuDNN and CUDA are found correctly:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 11.92GiB
Free memory: 11.81GiB

Feeding does not seem to be the problem, since replacing input_images with tf.random_uniform((16, 224, 224, 3), maxval=255) does not change the timing.

Edit 2: I compared against a PyTorch version running on the same machine, and got (for a batch of 16x224x224x3):

  • ResNet-50: PyTorch 48 ms vs TF 58 ms (OK)
  • VGG16: PyTorch 65 ms vs TF 160 ms (not OK)
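For reference, the "2.5x" in the title follows from the VGG16 row, while the ResNet-50 gap is only about 1.2x:

```python
# Slowdown ratios from the measurements above (ms per batch of 16)
vgg16_ratio = 160 / 65     # TF vs PyTorch, VGG16
resnet50_ratio = 58 / 48   # TF vs PyTorch, ResNet-50
print(round(vgg16_ratio, 1), round(resnet50_ratio, 1))  # → 2.5 1.2
```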

1 Answer:

Answer 0 (score: 0)

Tested recently on CUDA 9.0, TensorFlow 1.9 and PyTorch 0.4.1; for the same operations, the difference is now negligible.

See the proper timing here.