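I am trying to make sure my new computer is set up correctly, and I noticed that its GeForce 1080 GPU significantly underperforms the 980 Ti in my old system when running TensorFlow jobs. Since a system is more than just its GPU, I wrote a small benchmark to isolate GPU matrix multiplication performance in TensorFlow. The results confirm that the new GPU is noticeably slower. I assume this has to do with the installed software, but I have already checked the obvious things: same python3, same cudnn, same numpy. What could be causing this strange performance gap?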
Benchmark script:
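import tensorflow as tf
import time

sess = tf.Session()
A = tf.random_uniform((1000, 1000))
# Build a graph that chains 1000 matrix multiplications
for i in range(int(1e3)):
    A = tf.matmul(A, A)
# Time a single evaluation of the whole chain
cur_time = time.clock()
sess.run(A)
print(time.clock() - cur_time)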
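One caveat about the script above: it times only the very first `sess.run` call, which may include one-time setup costs in addition to the matrix multiplications, and on Unix `time.clock()` reports processor time, not wall-clock time. A minimal sketch of a timing helper that separates warm-up runs from measured runs is shown here. The helper name `benchmark` and its parameters are hypothetical, and the workload in the demo is a plain-Python stand-in so the sketch runs without TensorFlow.

```python
import time


def benchmark(run_once, warmup=1, repeats=5):
    """Return per-call wall-clock times in seconds, excluding warm-up calls.

    `run_once` is any zero-argument callable (hypothetical helper, not a
    TensorFlow API).
    """
    for _ in range(warmup):
        run_once()  # one-time setup costs land here, not in the timings
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # wall-clock timer
        run_once()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU or TensorFlow.
    timings = benchmark(lambda: sum(i * i for i in range(100000)))
    print("min %.6f s, max %.6f s" % (min(timings), max(timings)))
```

With the script above, the call would presumably look like `benchmark(lambda: sess.run(A))`. If the gap between the two machines shrinks once the first run is excluded, the difference is more likely in startup work than in steady-state GPU throughput.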
Old system (980 Ti):
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.19
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.39GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
time elapsed: 0.81484
New system (1080):
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.898
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.57GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
time elapsed: 1.2753620000000003
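To put the two timings in perspective, they can be converted into an approximate throughput. The sketch below assumes the usual estimate of about 2·n³ floating-point operations for an n×n by n×n matrix product, and it takes the reported times at face value even though they may include startup overhead. The function names are hypothetical.

```python
def matmul_chain_flops(n, num_matmuls):
    # Assumption: an n x n by n x n product costs about 2 * n^3
    # floating-point operations (n multiplies and roughly n adds
    # for each of the n^2 output elements).
    return 2 * n ** 3 * num_matmuls


def tflops(flops, seconds):
    # Convert an operation count and a duration into TFLOP/s.
    return flops / seconds / 1e12


old_time = 0.81484             # reported on the 980 Ti system
new_time = 1.2753620000000003  # reported on the 1080 system

total = matmul_chain_flops(1000, 1000)
old_rate = tflops(total, old_time)
new_rate = tflops(total, new_time)

print("980 Ti: %.2f TFLOP/s" % old_rate)
print("1080:   %.2f TFLOP/s" % new_rate)
print("slowdown on new system: %.2fx" % (new_time / old_time))
```

Under these assumptions the old system reaches roughly 2.45 TFLOP/s and the new one roughly 1.57 TFLOP/s, so the new system takes about 1.57 times as long for the same work.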