I am trying to use TensorFlow on a shared host that is also running other experiments. Sometimes I run into tensorflow.python.framework.errors_impl.InternalError, so I tried to catch the exception:
try:
    with tf.Session() as sess:
        ...
except tensorflow.python.framework.errors_impl.InternalError as e:
    ...
Unfortunately, this approach does not work, because I then get:
tensorflow.python.framework.errors_impl.InternalError: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_VALUE: invalid argument
...
During handling of the above exception, another exception occurred:
...
NameError: name 'tensorflow' is not defined
How can I catch InternalError and rerun the experiment after a short delay?
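One possible direction, as a rough sketch rather than a confirmed fix: the NameError suggests the module was imported as `import tensorflow as tf`, so the name `tensorflow` is never bound and the except clause itself fails when Python evaluates it. The same exception class is exposed publicly as `tf.errors.InternalError`. The helper below is hypothetical (the name `run_with_retry` and its parameters are assumptions made for illustration) and is written without a TensorFlow dependency so it can be run on its own.

```python
import time


def run_with_retry(run, retry_on, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call run() until it succeeds, retrying only on the given exception type(s).

    Hypothetical helper: the name and all parameters are illustrative assumptions.
    The last failure is re-raised once the attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay_seconds)  # wait before trying again


# Assumed usage with TensorFlow 1.x (not executed here):
#
#   import tensorflow as tf
#
#   def experiment():
#       with tf.Session() as sess:
#           ...
#
#   run_with_retry(experiment, retry_on=tf.errors.InternalError, delay_seconds=60)
```

Whether a retry actually helps depends on why the GPU context could not be created; if another process holds the device for a long time, a longer delay or more attempts may be needed.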