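I am trying to run TensorFlow code on a GPU that I access remotely over SSH. I connect from the Windows CMD via SSH and get a Linux terminal on the server. I want the code to run on the server's GPU rather than its CPU, so I installed tensorflow-gpu. I am using a conda environment to run Python. Now, when I start Python, import tensorflow, and create a session, I get the following error. How can I fix this?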
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-06-05 13:26:45.280912: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-05 13:26:45.309892: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3399505000 Hz
2019-06-05 13:26:45.311731: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ffc55f7cc0 executing computations on platform Host. Devices:
2019-06-05 13:26:45.311780: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-06-05 13:26:45.315413: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-06-05 13:26:46.043595: W tensorflow/compiler/xla/service/platform_util.cc:256] unable to create StreamExecutor for CUDA:3: failed initializing StreamExecutor for CUDA device ordinal 3: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 11719409664
2019-06-05 13:26:46.044237: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ffc84a7710 executing computations on platform CUDA. Devices:
2019-06-05 13:26:46.044297: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-06-05 13:26:46.044308: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-06-05 13:26:46.044322: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/data_dump_1/group_9/anaconda3/envs/pradyumnaenv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1570, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/media/data_dump_1/group_9/anaconda3/envs/pradyumnaenv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 693, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid device ordinal value (3). Valid range is [0, 2].
while setting up XLA_GPU_JIT device number 3
>>> sess=tf.Session()
2019-06-05 13:27:09.128360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:0a:00.0
2019-06-05 13:27:09.130001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:0b:00.0
2019-06-05 13:27:09.131300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:41:00.0
2019-06-05 13:27:09.132226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:42:00.0
2019-06-05 13:27:09.133262: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-05 13:27:09.135097: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-05 13:27:09.385582: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-05 13:27:09.486764: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-05 13:27:09.489289: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-05 13:27:10.140611: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-05 13:27:10.145133: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-05 13:27:10.159615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1, 2, 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/data_dump_1/group_9/anaconda3/envs/pradyumnaenv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1570, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/media/data_dump_1/group_9/anaconda3/envs/pradyumnaenv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 693, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid device ordinal value (3). Valid range is [0, 2].
while setting up XLA_GPU_JIT device number 3
>>> exit()
Here are the GPU details:
(pradyumnaenv) cse563@falcon:~$ nvidia-smi
Wed Jun 5 15:53:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0A:00.0 Off |                  N/A |
|  0%   44C    P8    19W / 250W |  10791MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 22%   53C    P8    21W / 250W |  10791MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:41:00.0 Off |                  N/A |
| 31%   57C    P8    22W / 250W |    677MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:42:00.0 Off |                  N/A |
|100%   91C    P2   132W / 250W |  11105MiB / 11176MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     78117      C   python                                     10777MiB |
|    1     47128      C   python                                     10777MiB |
|    2     79606      C   ...t_nagpal/miniconda3/envs/dnn/bin/python   667MiB |
|    3     83393      C   python                                     11095MiB |
+-----------------------------------------------------------------------------+
Answer 0 (score: 0)
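It looks like the GPU being accessed is ID 3, which fails to initialize with CUDA_ERROR_OUT_OF_MEMORY in your log. The other GPU IDs (0 to 2), however, appear to be available.
You can add the following lines so that tensorflow-gpu uses one of those GPUs, numbered in PCI bus ID order.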
import os
# Set these before importing tensorflow so they take effect when CUDA initializes.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # or "1" or "2"
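
Rather than hard-coding the ID, one option is to pick whichever GPU currently has the most free memory. The following is only a rough sketch under some assumptions: the helper names (`parse_gpu_memory`, `pick_freest_gpu`, `query_gpu_memory`, `select_gpu`) are made up for illustration, `nvidia-smi` is assumed to be on the PATH, and its `--query-gpu` CSV output is assumed to list GPUs with the same indices that `CUDA_DEVICE_ORDER=PCI_BUS_ID` produces.

```python
import os
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used, memory.total' rows (values in MiB) into tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def pick_freest_gpu(rows):
    """Return the index of the GPU with the most free memory (total - used)."""
    return max(rows, key=lambda row: row[2] - row[1])[0]


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage as header-less, unit-less CSV."""
    return subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )


def select_gpu(csv_text):
    """Expose only the freest GPU to CUDA; call this before importing tensorflow."""
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    gpu = pick_freest_gpu(parse_gpu_memory(csv_text))
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

On the server you would call `select_gpu(query_gpu_memory())` first and only then `import tensorflow as tf`. Fed the memory figures from the nvidia-smi output in the question, this sketch would choose GPU 2, the only one with most of its memory unused at that moment. Note that once a single GPU is exposed this way, TensorFlow sees it as device 0 inside the process.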