So far I've been trying to get into TensorFlow and have been liking it.
Today I upgraded to CUDA 8, cuDNN 5.1 and TensorFlow 0.12.1, on a Maxwell Titan X GPU.
Using the following short code that loads a pretrained VGG16:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim import nets

tf.reset_default_graph()
input_images = tf.placeholder(tf.float32, [None, 224, 224, 3], 'image')
preds = nets.vgg.vgg_16(input_images, is_training=False)[0]
saver = tf.train.Saver()
config = tf.ConfigProto(log_device_placement=True,
                        gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5))
sess = tf.InteractiveSession(config=config)
saver.restore(sess, './vgg_16.ckpt')
_in = np.random.randn(16, 224, 224, 3).astype(np.float32)
Then I time a forward pass:
%timeit sess.run(preds, feed_dict={input_images: _in})
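The same measurement can be made without %timeit using a small warm-up-aware harness (a framework-agnostic sketch; `run_step` is an assumed wrapper around one `sess.run` call, not part of the code above):

```python
import time

def time_forward(run_step, warmup=3, iters=20):
    """Average wall-clock time of run_step() in ms, after a warm-up phase."""
    for _ in range(warmup):
        run_step()  # warm-up: excludes one-time allocation/autotune cost
    start = time.perf_counter()
    for _ in range(iters):
        run_step()
    return (time.perf_counter() - start) / iters * 1000.0
```

Here `run_step` would be `lambda: sess.run(preds, feed_dict={input_images: _in})`.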
I get 160 ms per batch (forward pass only), which according to this benchmark seems about 2.5x slower than the corresponding setup in Torch (and also slower than MatConvNet).
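To put the number in context, 160 ms for a batch of 16 works out as follows (simple arithmetic, using the 2.5x figure from the benchmark comparison):

```python
batch_size = 16
tf_ms_per_batch = 160.0

throughput = batch_size / (tf_ms_per_batch / 1000.0)  # images per second
implied_torch_ms = tf_ms_per_batch / 2.5              # batch time implied by the 2.5x gap

print(throughput)        # 100.0 images/s
print(implied_torch_ms)  # 64.0 ms per batch
```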
The ops seem to be placed on the GPU correctly, and the CUDA libraries are found correctly. What am I missing?
Edit: cuDNN and CUDA are found correctly:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 11.92GiB
Free memory: 11.81GiB
Since replacing input_images with tf.random_uniform((16, 224, 224, 3), maxval=255) does not change the timing, feeding does not seem to be the problem.
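The same check can also be made on the host side; a NumPy stand-in for that in-graph tensor (a sketch matching only the shape and value range of tf.random_uniform) is:

```python
import numpy as np

# host-side stand-in for tf.random_uniform((16, 224, 224, 3), maxval=255)
batch = np.random.uniform(0.0, 255.0, size=(16, 224, 224, 3)).astype(np.float32)
```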
Edit 2: I compared against a PyTorch version running on the same machine, and I got (for a 16x224x224x3 batch):
Answer 0 (score: 0)
Tested recently on CUDA 9.0, TensorFlow 1.9 and PyTorch 0.4.1; for the same operation the difference is now negligible.