I got this CUDA_ERROR_LAUNCH_FAILED error during training:
2019-03-17 10:22:48.713784: E tensorflow/stream_executor/cuda/cuda_driver.cc:1011] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure ::
2019-03-17 10:22:48.794915: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795193: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795459: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795726: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795993: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.796259: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.796526: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.796791: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.797057: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.797322: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.797590: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
Traceback (most recent call last):
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
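There are several existing questions about this error in TensorFlow scripts, but most of them report it occurring as soon as the script starts. In my case the error occurred after more than a day of successful training. Another question reports that it was caused by the session having been suspended, which is not the case here.
Do you know what might be going on, or how I could investigate this?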
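One possible starting point for the investigation, offered as a sketch rather than a confirmed fix: CUDA kernel launches are asynchronous by default, so the failure may be reported at a later synchronization point rather than at the operation that caused it. Setting the CUDA environment variable `CUDA_LAUNCH_BLOCKING=1` before TensorFlow initializes the GPU makes launches synchronous, which may bring the reported error closer to the failing operation. The helper name `enable_blocking_cuda_launches` is hypothetical, and whether this helps with this particular failure is an assumption.

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Request synchronous CUDA kernel launches for this process.

    Hypothetical helper: it only sets the CUDA_LAUNCH_BLOCKING environment
    variable. It must run before the process creates a CUDA context, i.e.
    before TensorFlow is imported or a session is created.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_cuda_launches()

# Import TensorFlow only after the variable is set. The import is left
# commented out so this sketch also runs on a machine without TensorFlow.
# import tensorflow as tf
```

Synchronous launches usually slow training down noticeably, so for an error that only shows up after more than a day this is probably best combined with regular checkpoints, so that a debug run can resume close to the point of failure.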
Configuration: