要测试我的tensorflow安装,我使用tensorflow存储库中提供的mnist示例,但是当我执行convolutional.py脚本时,我有这个输出:
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.2405
pciBusID 0000:03:00.0
Total memory: 5.93GiB
Free memory: 5.83GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x29020c0
E tensorflow/core/common_runtime/direct_session.cc:137] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
Traceback (most recent call last):
File "convolutional.py", line 339, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "convolutional.py", line 284, in main
with tf.Session() as sess:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1187, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 552, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
我的第一个想法是,我可能在cuda安装方面遇到了问题,但我使用了为nvidia提供的一个示例进行了测试。在这种情况下,我使用了这个例子:
NVIDIA_CUDA-8.0_Samples / 6_Advanced / C ++ 11_cuda
输出是这样的:
GPU Device 0: "GeForce GTX 980 Ti" with compute capability 5.2
Read 3223503 byte corpus from ./warandpeace.txt
counted 107310 instances of 'x', 'y', 'z', or 'w' in "./warandpeace.txt"
然后我的结论是cuda安装正确。但我不知道这里发生了什么。如果有人可以帮助我,我将不胜感激。
有关更多信息,这是我的gpu配置:
Tue Jan 31 19:42:10 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 560 Ti Off | 0000:01:00.0 N/A | N/A |
| 25% 45C P0 N/A / N/A | 463MiB / 958MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 980 Ti Off | 0000:03:00.0 Off | N/A |
| 0% 31C P8 13W / 280W | 1MiB / 6077MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
修改
这两个nvidia卡具有相同的物理ID是正常的吗?
sudo lshw -C "display"
*-display
description: VGA compatible controller
product: GM200 [GeForce GTX 980 Ti]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:03:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:50 memory:f9000000-f9ffffff memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:d000(size=128) memory:fa000000-fa07ffff
*-display
description: VGA compatible controller
product: GF114 [GeForce GTX 560 Ti]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:45 memory:f6000000-f7ffffff memory:c8000000-cfffffff memory:d0000000-d3ffffff ioport:e000(size=128) memory:f8000000-f807ffff
答案 0 :(得分:4)
您所显示的输出中的重点是:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.2405
pciBusID 0000:03:00.0
Total memory: 5.93GiB
Free memory: 5.83GiB
即。您想要的计算设备枚举为设备0和
E tensorflow/core/common_runtime/direct_session.cc:137] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
即。产生错误的计算设备被枚举为设备1.设备1是您的显示GPU,不能用于Tensorflow中的计算。如果您将该设备标记为nvidia-smi
禁止的计算,或使用CUDA_VISIBLE_DEVICES
环境变量仅使您的计算设备对CUDA可见,则该错误可能会消失。
答案 1 :(得分:0)
当我尝试运行属于the image recognition tutorial的classify_image.py
脚本时,我遇到了类似的错误。由于我已经运行了一些Python会话(elpy),我在其中运行了一些TensorFlow代码,因此GPU被分配在那里,因此我无法使用我尝试从shell运行的脚本。
退出现有的Python会话解决了错误。