CUDA_ERROR_LAUNCH_FAILED raised during training

Time: 2019-03-18 12:10:14

Tags: python tensorflow cuda

I got this CUDA_ERROR_LAUNCH_FAILED error during training:

2019-03-17 10:22:48.713784: E tensorflow/stream_executor/cuda/cuda_driver.cc:1011] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure :: 
2019-03-17 10:22:48.794915: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795193: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795459: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795726: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.795993: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.796259: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.796526: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.796791: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.797057: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.797322: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-17 10:22:48.797590: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 00000179420B8DC0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
Traceback (most recent call last):
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "C:\Users\9973879\AppData\Local\Continuum\miniconda3\envs\tf\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

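There are several questions about this error occurring in TensorFlow scripts, but most of them report the error happening when the script starts. In my case, the error occurred after more than a day of successful training. Another question reports that it was caused by the session being suspended, which is not the case here.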

Do you have any idea what could be happening, or how to investigate this?

Configuration:

  • Windows 10
  • CUDA 10.0.130
  • TF 1.13.1
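
As a starting point for the investigation, here is a minimal sketch that summarizes the error lines from a saved copy of the log above. It only parses text and does not call TensorFlow. The function name `summarize_cuda_errors` and the regular expression are my own assumptions, written against the log format shown in this question, and may need adjusting for other versions.

```python
import re
from collections import Counter

# Assumed line format, taken from the log in this question:
# 2019-03-17 10:22:48.713784: E tensorflow/.../cuda_driver.cc:1011] message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+?):(?P<line>\d+)\] (?P<msg>.*)$"
)


def summarize_cuda_errors(log_text):
    """Return (first E-level line as a dict, Counter of source files).

    Hypothetical helper: 'first' is None when no E-level line is found.
    """
    first = None
    sources = Counter()
    for raw in log_text.splitlines():
        m = LOG_LINE.match(raw.strip())
        if m is None or m.group("level") != "E":
            continue  # skip traceback lines and non-error log levels
        if first is None:
            first = m.groupdict()  # remember the earliest error only
        sources[m.group("source")] += 1
    return first, sources
```

On the log above, the earliest error line is the `cuda_driver.cc:1011` synchronization failure. The `event.cc:34` lines that follow report the same error code for the same context, so they are probably consequences of the first failure rather than separate problems.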

0 answers:

No answers