TensorFlow stuck at tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281

Time: 2017-11-10 15:05:28

Tags: tensorflow gpu tensorflow-gpu

I am running TensorFlow code, and it always gets stuck at this point:

tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281

I tried different memory configurations, but none of them helped.

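The code never fails, but it never progresses past that point either, so I end up cancelling the job. I am running it on a Tesla K40m, with 4 CPUs, each with 16G of memory.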

Here is the full output:

2017-11-10 17:00:15.091618: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-10 17:00:15.091997: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-10 17:00:15.092010: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-10 17:00:17.609926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:0d:00.0
Total memory: 11.17GiB
Free memory: 11.09GiB
2017-11-10 17:00:17.609969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-11-10 17:00:17.609979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-11-10 17:00:17.609994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:0d:00.0)
2017-11-10 17:00:39.678955: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1787 get requests, put_count=1575 evicted_count=1000 eviction_rate=0.634921 and unsatisfied allocation rate=0.734191
2017-11-10 17:00:39.679428: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
2017-11-10 17:01:25.550744: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2416 get requests, put_count=2444 evicted_count=1000 eviction_rate=0.409165 and unsatisfied allocation rate=0.411838
2017-11-10 17:01:25.551299: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
slurmstepd: error: *** JOB 5538559 ON dgpu501-26-r CANCELLED AT 2017-11-10T17:47:10 ***
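
To make the PoolAllocator lines in the output above easier to read, here is a minimal Python sketch that pulls the counters out of the two statistics lines and recomputes the eviction rate. The helper name parse_pool_stats and the regular expression are my own, written only for illustration. The relation eviction_rate = evicted_count / put_count is an assumption inferred from the logged numbers, not something taken from the TensorFlow source.

```python
import re

# The two PoolAllocator statistics lines from the output above
# (timestamps and source paths trimmed).
LOG_LINES = [
    "PoolAllocator: After 1787 get requests, put_count=1575 evicted_count=1000 "
    "eviction_rate=0.634921 and unsatisfied allocation rate=0.734191",
    "PoolAllocator: After 2416 get requests, put_count=2444 evicted_count=1000 "
    "eviction_rate=0.409165 and unsatisfied allocation rate=0.411838",
]

# Hypothetical pattern written for this log format only.
PATTERN = re.compile(
    r"After (?P<gets>\d+) get requests, "
    r"put_count=(?P<puts>\d+) "
    r"evicted_count=(?P<evicted>\d+) "
    r"eviction_rate=(?P<rate>[\d.]+)"
)


def parse_pool_stats(line):
    """Return the counters from one PoolAllocator line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "gets": int(m["gets"]),
        "puts": int(m["puts"]),
        "evicted": int(m["evicted"]),
        "logged_rate": float(m["rate"]),
    }


stats = [parse_pool_stats(line) for line in LOG_LINES]

for s in stats:
    # Assumption: the logged eviction_rate is evicted_count / put_count.
    recomputed = s["evicted"] / s["puts"]
    print(f"gets={s['gets']} logged={s['logged_rate']:.6f} recomputed={recomputed:.6f}")
```

For both lines the recomputed value agrees with the logged eviction_rate to six decimal places, so the statistics look internally consistent. The sketch says nothing about why the job stops making progress after the last line.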

Any suggestions?

0 answers:

No answers