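When I start training a feedforward 2-layer ANN with tensorflow on a GPU in a new cluster, I get a Dst tensor is not initialized error. This error apparently comes up when there is not enough memory to handle the batch size, but the log says the allocator ran out of memory while trying to allocate only 315.8KiB. That should not be enough to cause a memory error when no other process is hogging memory on the GPU.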
First, nvidia-smi shows no memory problems and no other processes using up the memory on the cluster's GPUs:
Wed Apr 11 10:22:49 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 00000000:05:00.0 On | N/A |
| 22% 51C P8 17W / 250W | 65MiB / 12204MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 00000000:06:00.0 Off | N/A |
| 22% 49C P8 16W / 250W | 12MiB / 12207MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... Off | 00000000:09:00.0 Off | N/A |
| 22% 48C P8 16W / 250W | 12MiB / 12207MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... Off | 00000000:0A:00.0 Off | N/A |
| 22% 39C P8 16W / 250W | 12MiB / 12207MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2120 G /usr/bin/X 53MiB |
+-----------------------------------------------------------------------------+
The traceback is as follows:
W tensorflow/core/common_runtime/bfc_allocator.cc:274] *******************************************************xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 315.8KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:983] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [40416,2] values: [0 0][0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/raid/home/vbalupuri/GDClassifier/scripts/workbench.py", line 52, in <module>
batch_size=0.5
File "/raid/home/vbalupuri/GDClassifier/scripts/workbench.py", line 15, in train_classifier
mlp = ann.Trainer(ann.TwoLayerAnn(dr.xdim[0], dr.ydim[0], 0.1))
File "gd_classifier/ann/train.py", line 14, in __init__
self.sess.run(self.init)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [40416,2] values: [0 0][0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [40416,2] values: [0 0][0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
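As a sanity check, the size of the failed allocation can be compared against the zeros_2 constant named in the traceback. This is a minimal sketch, assuming DT_FLOAT is stored as 4 bytes per element; the variable names are my own and are not taken from the training code.

```python
# Shape and dtype come from the zeros_2 node in the traceback:
# Const[dtype=DT_FLOAT, shape: [40416, 2]]
rows, cols = 40416, 2
bytes_per_float32 = 4  # assumption: DT_FLOAT is a 32-bit float

# Total size of the constant in bytes and in KiB
tensor_bytes = rows * cols * bytes_per_float32
tensor_kib = tensor_bytes / 1024.0

print("%d bytes = %.2f KiB" % (tensor_bytes, tensor_kib))
```

This works out to 315.75 KiB, which is consistent with the 315.8KiB in the allocator warning, so the failing allocation appears to be this one small constant rather than a large batch.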