I'm running a TensorFlow job that always stalls right after raising the pool size and never makes progress past that point.
Here is the output:
2017-11-13 19:01:12.841317: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-13 19:01:12.841715: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-13 19:01:12.841729: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-13 19:01:17.941982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla K20m
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:09:00.0
Total memory: 4.63GiB
Free memory: 4.56GiB
2017-11-13 19:01:18.135538: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x6e48240 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-13 19:01:18.136394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: Tesla K20m
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:0a:00.0
Total memory: 4.63GiB
Free memory: 4.56GiB
2017-11-13 19:01:18.324134: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x6e4a680 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-13 19:01:18.325028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 2 with properties:
name: Tesla K20m
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:0d:00.0
Total memory: 4.63GiB
Free memory: 4.56GiB
2017-11-13 19:01:18.519043: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x60ae510 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-13 19:01:18.519928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 3 with properties:
name: Tesla K20m
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:0e:00.0
Total memory: 4.63GiB
Free memory: 4.56GiB
2017-11-13 19:01:18.521497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 2 3
2017-11-13 19:01:18.521514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y Y Y Y
2017-11-13 19:01:18.521523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: Y Y Y Y
2017-11-13 19:01:18.521530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 2: Y Y Y Y
2017-11-13 19:01:18.521538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 3: Y Y Y Y
2017-11-13 19:01:18.521556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:09:00.0)
2017-11-13 19:01:18.521566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K20m, pci bus id: 0000:0a:00.0)
2017-11-13 19:01:18.521580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K20m, pci bus id: 0000:0d:00.0)
2017-11-13 19:01:18.521589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K20m, pci bus id: 0000:0e:00.0)
2017-11-13 19:01:24.197527: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2731 get requests, put_count=2675 evicted_count=1000 eviction_rate=0.373832 and unsatisfied allocation rate=0.423288
2017-11-13 19:01:24.197943: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
When I check GPU utilization, the memory is barely used:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m Off | 00000000:09:00.0 Off | 0 |
| N/A 42C P0 93W / 225W | 646MiB / 4742MiB | 30% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m Off | 00000000:0A:00.0 Off | 0 |
| N/A 33C P0 43W / 225W | 72MiB / 4742MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20m Off | 00000000:0D:00.0 Off | 0 |
| N/A 35C P0 45W / 225W | 72MiB / 4742MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20m Off | 00000000:0E:00.0 Off | 0 |
| N/A 33C P0 43W / 225W | 72MiB / 4742MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K20m Off | 00000000:28:00.0 Off | 0 |
| N/A 35C P0 45W / 225W | 0MiB / 4742MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K20m Off | 00000000:2B:00.0 Off | 0 |
| N/A 37C P0 45W / 225W | 0MiB / 4742MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K20m Off | 00000000:30:00.0 Off | 0 |
| N/A 38C P0 45W / 225W | 0MiB / 4742MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K20m Off | 00000000:33:00.0 Off | 0 |
| N/A 32C P0 46W / 225W | 0MiB / 4742MiB | 96% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1554 C python 635MiB |
| 1 1554 C python 61MiB |
| 2 1554 C python 61MiB |
| 3 1554 C python 61MiB |
+-----------------------------------------------------------------------------+
Here is my session config:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
config = tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_options)
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
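For reference, one diagnostic I have not tried yet would be to expose only the GPUs the job should use and turn on device-placement logging, to see which op the session hangs on. A minimal sketch (the GPU indices here are assumptions, not part of my actual setup):

```python
import os

# Expose only the GPUs this job should use; TensorFlow renumbers
# the visible devices as /gpu:0 .. /gpu:3.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import tensorflow as tf

config = tf.ConfigProto(
    allow_soft_placement=True,   # fall back to CPU if an op has no GPU kernel
    log_device_placement=True,   # print where every op is actually placed
)
config.gpu_options.allow_growth = True  # allocate memory on demand, not up front

with tf.Session(config=config) as sess:
    ...  # run the graph; placement is printed as ops execute
```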
What am I missing? The machine also has plenty of CPU resources to go with it, i.e.
8 CPUs with 30 GB of memory each.
I can't see or understand why this is happening.