Cannot run tensorflow on GPU

Time: 2017-10-31 17:07:38

Tags: python tensorflow gpu

I want to run my tensorflow code on my GPU, but it does not work. I have installed CUDA and cuDNN, and my GPU is compatible.

I took this example from the Tensorflow tutorial for GPU on the official website:

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

This is my output:

Device mapping: no known devices.
2017-10-31 16:15:40.298845: I tensorflow/core/common_runtime/direct_session.cc:300] Device mapping:

MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.895802: I tensorflow/core/common_runtime/simple_placer.cc:872] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.895910: I tensorflow/core/common_runtime/simple_placer.cc:872] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.895961: I tensorflow/core/common_runtime/simple_placer.cc:872] a_1: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.896006: I tensorflow/core/common_runtime/simple_placer.cc:872] a: (Const)/job:localhost/replica:0/task:0/cpu:0
[[ 22.  28.]
 [ 49.  64.]]

It is not running on my GPU. I tried to force it to run on the GPU using:

with tf.device('/gpu:0'):
...

It gives a bunch of errors:

Traceback (most recent call last):
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1297, in _run_fn
    self._extend_graph()
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1358, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/abhor/anaconda3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'MatMul_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_2, b_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'MatMul_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_2, b_1)]]

Caused by op 'MatMul_1', defined at:
  File "<stdin>", line 4, in <module>
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1844, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1289, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'MatMul_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_2, b_1)]]

I can see that in some of these lines it says only the CPU is available.

Below are my graphics card details and CUDA version.

Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P0    N/A /  N/A |    274MiB /  2002MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+

Output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

I don't know how to check cuDNN, but I installed it as described in the official documentation, so I guess it should be working too.
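
(In case it is relevant: one rough way to check the cuDNN version, assuming it was copied into the default CUDA directory, is to read the version macros from the header. This is only a sketch; adjust the path if cudnn.h is somewhere else.)

# Sketch: print the cuDNN version macros from the header.
# Assumes the default location /usr/local/cuda/include/cudnn.h.
with open('/usr/local/cuda/include/cudnn.h') as f:
    for line in f:
        if line.startswith(('#define CUDNN_MAJOR',
                            '#define CUDNN_MINOR',
                            '#define CUDNN_PATCHLEVEL')):
            print(line.rstrip())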

Edit: output of pip3 list | grep tensorflow:

tensorflow-gpu (1.3.0)
tensorflow-tensorboard (0.1.8)

3 Answers:

Answer 0 (score: 1)

Try:

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
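
For context, allow_soft_placement=True tells TensorFlow to fall back to whatever device is available (here the CPU) when an op is pinned to a device that does not exist, so the InvalidArgumentError above goes away; it does not by itself make a GPU appear. A minimal sketch combining it with the example from the question (same tensors, reused here only for illustration):

import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
with tf.device('/gpu:0'):  # silently falls back to the CPU if no GPU is visible
    c = tf.matmul(a, b)

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
print(sess.run(c))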

Answer 1 (score: 0)

In general, my suggestion is to use conda environments. That way you can create a fresh new environment and try to install tensorflow, or any other tool, from scratch without reinstalling the whole operating system. As an added benefit, you can keep several environments on your PC.

Answer 2 (score: 0)

Actually, in your situation tensorflow cannot find a CUDA GPU.

Refer to the list of devices in your output:

Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]

This means that no GPU was found. You can use the code from How to get current available GPUs in tensorflow? to list the GPUs that tensorflow can actually find:

from tensorflow.python.client import device_lib

def get_available_gpus():
    # List every device visible to this TensorFlow build and keep only the GPUs.
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

You have to make sure it actually returns the GPU(s) it found, so that tensorflow can use a GPU device.
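
A quick way to use it (just a sketch; the exact device string can differ slightly between tensorflow versions):

# On a working GPU setup this prints something like ['/gpu:0'];
# on the setup in the question it will print [], matching the error above.
print(get_available_gpus())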

There are many possible reasons why a GPU is not found, including but not limited to the CUDA installation/setup, the tensorflow version, and the GPU model, in particular its compute capability. You have to check which GPU models your tensorflow version supports, and check the compute capability of your GPU (for NVIDIA GPUs).
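
If that list comes back empty, two further sanity checks (API names as in the tensorflow 1.x releases; a sketch, not a full diagnosis) can help narrow down whether the problem is the installed wheel or the CUDA runtime:

import tensorflow as tf

# False means the installed wheel was built without CUDA support
# (e.g. plain "tensorflow" instead of "tensorflow-gpu").
print(tf.test.is_built_with_cuda())

# An empty string means no GPU device was detected at runtime,
# which usually points at the CUDA/cuDNN setup or the driver.
print(tf.test.gpu_device_name())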