When I try to open a TensorFlow session, I get the following error:
2017-09-24 10:49:20.526121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.342
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.87GiB
2017-09-24 10:49:20.599629: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x3dcf7e0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-09-24 10:49:20.599947: E tensorflow/core/common_runtime/direct_session.cc:171] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/python-envs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1486, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/user/python-envs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 621, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/user/python-envs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
I have two GPUs in my system. One is used for display and the other for compute:
GPU0 (display) : Nvidia NVS 310
GPU1 (compute) : Nvidia Geforce GTX 970
Graphics Driver: 384.66
CUDA version : 8
cuDNN version : v6 for CUDA 8 (April 27, 2017)
Operating Sys. : Ubuntu 16.04
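Has anyone else run into this problem? How can I debug or fix it?
Note: I did try to open an issue on GitHub, but before I could finish I was asked to look for earlier questions on SO, or to ask there.
Thanks!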
Answer 0 (score: 0)
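It seems that TensorFlow tries to grab every available GPU for computation, as described in the GitHub issue mentioned below. Setting the environment variable CUDA_VISIBLE_DEVICES to the device I want to use for compute did the trick.
A possibly related GitHub issue: "Segmentation fault when GPUs are already used".
On Ubuntu, the device IDs can be checked by running the nvidia-smi utility.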
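For anyone who would rather set CUDA_VISIBLE_DEVICES from inside the script than in the shell, here is a minimal sketch. The helper name `restrict_visible_gpus` and the device ID `1` are placeholders of mine, not something from the original setup, so substitute the ID of your own compute card. Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` is an extra step I am assuming is wanted here: by default CUDA may enumerate GPUs in a different order than nvidia-smi, and this setting makes the two agree.

```python
import os


def restrict_visible_gpus(device_ids, use_pci_bus_order=True):
    """Expose only the given GPU IDs to CUDA (hypothetical helper)."""
    if use_pci_bus_order:
        # Enumerate GPUs by PCI bus ID so the indices match nvidia-smi.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Comma-separated list of the device IDs CUDA is allowed to see.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)


# Placeholder ID: replace 1 with the ID of the compute GPU on your machine.
restrict_visible_gpus([1])

# Import TensorFlow and create the session only after the variables are set,
# because they are read when CUDA is initialized:
# import tensorflow as tf
# sess = tf.Session()
```

Inside the process, the remaining visible device is renumbered starting from 0, so TensorFlow should report it as device 0 regardless of the ID passed in.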