TensorFlow memory allocation error: Dst tensor is not initialized, despite plenty of free GPU memory

Time: 2018-04-11 14:28:52

Tags: python tensorflow machine-learning neural-network

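When I start training a feed-forward 2-layer ANN with TensorFlow on a GPU in a new cluster, I get a Dst tensor is not initialized error. This error apparently occurs when there is not enough memory to handle the batch size, but the log says the allocator ran out of memory while trying to allocate only 315.8KiB. That should be nowhere near enough to cause a memory error, since nothing else is using memory on the GPU.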

First, nvidia-smi shows no memory pressure, and no other process is using up the GPU memory on the cluster:

Wed Apr 11 10:22:49 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:05:00.0  On |                  N/A |
| 22%   51C    P8    17W / 250W |     65MiB / 12204MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:06:00.0 Off |                  N/A |
| 22%   49C    P8    16W / 250W |     12MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 00000000:09:00.0 Off |                  N/A |
| 22%   48C    P8    16W / 250W |     12MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 22%   39C    P8    16W / 250W |     12MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2120      G   /usr/bin/X                                    53MiB |
+-----------------------------------------------------------------------------+

The traceback follows:

W tensorflow/core/common_runtime/bfc_allocator.cc:274] *******************************************************xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 315.8KiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:983] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Internal: Dst tensor is not initialized.
         [[Node: zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [40416,2] values: [0 0][0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/raid/home/vbalupuri/GDClassifier/scripts/workbench.py", line 52, in <module>
    batch_size=0.5
  File "/raid/home/vbalupuri/GDClassifier/scripts/workbench.py", line 15, in train_classifier
    mlp = ann.Trainer(ann.TwoLayerAnn(dr.xdim[0], dr.ydim[0], 0.1))
  File "gd_classifier/ann/train.py", line 14, in __init__
    self.sess.run(self.init)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
         [[Node: zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [40416,2] values: [0 0][0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

InternalError (see above for traceback): Dst tensor is not initialized.
         [[Node: zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [40416,2] values: [0 0][0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
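
As a sanity check on the log above, the 315.8KiB figure appears to be the size of the zeros_2 constant itself. The sketch below assumes DT_FLOAT means a 4-byte float32 and takes the shape [40416,2] from the Node line. It only does the arithmetic and does not touch TensorFlow:

```python
# Shape and dtype are read from the "Node: zeros_2" line in the log.
shape = (40416, 2)
bytes_per_float32 = 4  # assumption: DT_FLOAT is a 32-bit float

# Total number of elements in the tensor.
num_elements = 1
for dim in shape:
    num_elements *= dim

size_bytes = num_elements * bytes_per_float32
size_kib = size_bytes / 1024.0

print("zeros_2 needs %d bytes (%.2f KiB)" % (size_bytes, size_kib))
```

This gives 323328 bytes, i.e. 315.75 KiB, which matches the 315.8KiB in the allocator warning. So the failing allocation seems to be this one small constant during self.sess.run(self.init), not a training batch.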

0 answers:

No answers yet