My TensorFlow seems to be using the GPU, but doesn't show the usual "successfully opened CUDA library" messages

Time: 2017-04-27 06:41:13

Tags: python tensorflow google-cloud-platform gpu

I am running TensorFlow on a Google Cloud instance with a single GPU. I believe it is using the GPU, based on the following messages:

2017-04-27 06:24:23.173402: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173558: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173607: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173646: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.173700: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 06:24:23.341713: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-04-27 06:24:23.342735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:04.0
Total memory: 11.17GiB
Free memory: 11.09GiB
2017-04-27 06:24:23.342994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-04-27 06:24:23.343049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-04-27 06:24:23.343103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)
2017-04-27 06:24:24.069732: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 18226 get requests, put_count=10104 evicted_count=1000 eviction_rate=0.0989707 and unsatisfied allocation rate=0.50598
2017-04-27 06:24:24.069915: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
2017-04-27 06:24:24.566246: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 18363 get requests, put_count=10456 evicted_count=1000 eviction_rate=0.0956389 and unsatisfied allocation rate=0.486304
2017-04-27 06:24:24.566429: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
batch 0: global_norm = 15739501.000
2017-04-27 06:24:25.017334: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1023 get requests, put_count=2058 evicted_count=1000 eviction_rate=0.485909 and unsatisfied allocation rate=0.00879765
2017-04-27 06:24:25.017506: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 493 to 542
2017-04-27 06:24:25.480102: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3475 get requests, put_count=6528 evicted_count=3000 eviction_rate=0.459559 and unsatisfied allocation rate=0.00028777
batch 1: global_norm = 14174161.000
2017-04-27 06:24:25.945373: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4492 get requests, put_count=8551 evicted_count=4000 eviction_rate=0.467782 and unsatisfied allocation rate=0
2017-04-27 06:24:26.412995: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6726 get requests, put_count=12791 evicted_count=6000 eviction_rate=0.46908 and unsatisfied allocation rate=0
batch 2: global_norm = 33107152.000
2017-04-27 06:24:26.882972: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 7952 get requests, put_count=15024 evicted_count=7000 eviction_rate=0.465921 and unsatisfied allocation rate=0
batch 3: global_norm = 15763463.000
2017-04-27 06:24:27.600348: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1256 get requests, put_count=2343 evicted_count=1000 eviction_rate=0.426803 and unsatisfied allocation rate=0
2017-04-27 06:24:28.072395: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3494 get requests, put_count=6589 evicted_count=3000 eviction_rate=0.455304 and unsatisfied allocation rate=0
batch 4: global_norm = 21566338.000
2017-04-27 06:24:28.549896: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 5730 get requests, put_count=10835 evicted_count=5000 eviction_rate=0.461467 and unsatisfied allocation rate=0
2017-04-27 06:24:29.028344: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 7757 get requests, put_count=14872 evicted_count=7000 eviction_rate=0.470683 and unsatisfied allocation rate=0
batch 5: global_norm = 21483036.000
2017-04-27 06:24:29.768236: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1277 get requests, put_count=2417 evicted_count=1000 eviction_rate=0.413736 and unsatisfied allocation rate=0
batch 6: global_norm = 11463346.000
2017-04-27 06:24:30.257765: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4525 get requests, put_count=8679 evicted_count=4000 eviction_rate=0.460883 and unsatisfied allocation rate=0
2017-04-27 06:24:30.752543: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6977 get requests, put_count=13146 evicted_count=6000 eviction_rate=0.456413 and unsatisfied allocation rate=0
batch 7: global_norm = 11743794.000
2017-04-27 06:24:31.522024: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2314 get requests, put_count=4518 evicted_count=2000 eviction_rate=0.442674 and unsatisfied allocation rate=0
batch 8: global_norm = 7594899.500
2017-04-27 06:24:32.030184: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 5775 get requests, put_count=11000 evicted_count=5000 eviction_rate=0.454545 and unsatisfied allocation rate=0
batch 9: global_norm = 12924121.000
2017-04-27 06:24:32.832804: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2349 get requests, put_count=4621 evicted_count=2000 eviction_rate=0.432807 and unsatisfied allocation rate=0
batch 10: global_norm = 7920631.000
2017-04-27 06:24:33.656608: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 27653 get requests, put_count=28155 evicted_count=6000 eviction_rate=0.213106 and unsatisfied allocation rate=0.209634
2017-04-27 06:24:33.656770: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 3296 to 3625
batch 11: global_norm = 7384579.000
2017-04-27 06:24:34.519065: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 27351 get requests, put_count=27607 evicted_count=5000 eviction_rate=0.181113 and unsatisfied allocation rate=0.186684
2017-04-27 06:24:34.519240: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 3987 to 4385
batch 12: global_norm = 9704661.000
2017-04-27 06:24:35.432504: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1618 get requests, put_count=3100 evicted_count=1000 eviction_rate=0.322581 and unsatisfied allocation rate=0
batch 13: global_norm = 10564804.000
2017-04-27 06:24:36.776085: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1675 get requests, put_count=3316 evicted_count=1000 eviction_rate=0.301568 and unsatisfied allocation rate=0

Ignore the 'batch n: global_norm = ...' lines; they come from my own code. Near the top, I see

2017-04-27 06:24:23.343103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0)

According to How to tell if tensorflow is using gpu acceleration from inside python shell?, this line indicates that my TensorFlow is using the GPU.

On the other hand, however, I do not see the usual messages saying the CUDA libraries were successfully opened. For example, based on the link above, I would expect to see messages like these:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

However, I don't see them when I run my code on the Google Cloud instance. Does this mean my TensorFlow is using the GPU but not fully exploiting the CUDA libraries (whatever that would mean)? Incidentally, computation on Google Cloud is clearly much faster than on my local machine, which runs TensorFlow without a GPU: for a recurrent neural network with 500 units, a single epoch takes 91 s on Google Cloud versus 1077 s on my local PC. That seems like evidence that Google Cloud is using the GPU, but I wonder whether, if it is currently not using the CUDA libraries, it could be faster still.
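As the linked question suggests, device placement can also be checked directly from inside Python with `device_lib.list_local_devices()`. A minimal sketch (the `device_lib` call is the approach from that question's answers; the small helper `gpu_device_names` is my own addition, not part of TensorFlow):

```python
def gpu_device_names(devices):
    """Names of the GPU devices in a device_lib.list_local_devices() result."""
    return [d.name for d in devices if d.device_type == "GPU"]

try:
    # device_lib is the check suggested in the linked question.
    from tensorflow.python.client import device_lib
    print("GPU devices visible to TensorFlow:",
          gpu_device_names(device_lib.list_local_devices()))
except ImportError:
    print("TensorFlow is not installed in this environment.")
```

If the GPU is actually registered, this should list something like `/gpu:0`, matching the "Creating TensorFlow device (/gpu:0)" line in the log above.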

0 Answers:

There are no answers yet.