Cannot assign a device for operation 'Variable_4/Adam_1'

Posted: 2019-09-17 20:55:37

Tags: python-2.7 tensorflow

I'm trying to run the script train.py, cloned from the following GitHub repository:

https://github.com/xiaojunxu/dnn-binary-code-similarity

After installing all of the repository's requirements (requirements.txt), I run train.py and get the following error, and I can't find a solution:

2019-09-17 20:43:51.186970: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
  Traceback (most recent call last):
    File "train.py", line 124, in <module>
      gnn.init(LOAD_PATH, LOG_PATH)
    File "/ws/Gemini/graphnnSiamese.py", line 120, in init
      sess.run(tf.global_variables_initializer())
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
      run_metadata_ptr)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
      feed_dict_tensor, options, run_metadata)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
      options, run_metadata)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
      raise type(e)(node_def, op, message)
  tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
     [[Node: Variable_4/Adam_1 = VariableV2[_class=["loc:@Variable_4"], container="", dtype=DT_FLOAT, shape=[64], shared_name="", _device="/device:GPU:0"]()]]

  Caused by op u'Variable_4/Adam_1', defined at:
    File "train.py", line 122, in <module>
      lr = LEARNING_RATE
    File "/ws/Gemini/graphnnSiamese.py", line 93, in __init__
      optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 353, in minimize
      name=name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 474, in apply_gradients
      self._create_slots([_get_variable_for(v) for v in var_list])
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 137, in _create_slots
      self._zeros_slot(v, "v", self._name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 796, in _zeros_slot
      named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
      colocate_with_primary=colocate_with_primary)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 148, in create_slot_with_initializer
      dtype)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 67, in _create_slot_var
      validate_shape=validate_shape)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1203, in get_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1092, in get_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
      use_resource=use_resource, constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 805, in _get_single_variable
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 213, in __init__
      constraint=constraint)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 309, in _init_from_args
      name=name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 133, in variable_op_v2
      shared_name=shared_name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 927, in _variable_v2
      shared_name=shared_name, name=name)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
      op_def=op_def)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
      op_def=op_def)
    File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
      self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

  InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
     [[Node: Variable_4/Adam_1 = VariableV2[_class=["loc:@Variable_4"], container="", dtype=DT_FLOAT, shape=[64], shared_name="", _device="/device:GPU:0"]()]]

One suggestion I found was to try setting the following to "0":

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

But that didn't work for me.
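Why that suggestion cannot help here: CUDA_VISIBLE_DEVICES only filters which of the GPUs CUDA has already detected are exposed to the process, and it has to be set before TensorFlow initializes CUDA (in practice, before import tensorflow). If TensorFlow sees no GPU at all, as in the error above, setting it to "0" changes nothing. The sketch below approximates how the value is interpreted: unset means every GPU stays visible, while an empty string or a leading invalid entry such as -1 hides them all. The helper name visible_gpu_ids is hypothetical.

```python
import os


def visible_gpu_ids(env=None):
    """Approximate how the CUDA runtime interprets CUDA_VISIBLE_DEVICES.

    Returns None if the variable is unset (every GPU stays visible),
    otherwise the list of entries CUDA will expose, in order.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    ids = []
    for item in value.split(","):
        item = item.strip()
        # CUDA stops at the first invalid entry; an empty string or a
        # negative index such as -1 therefore hides everything after it.
        # (An index beyond the installed GPU count is also invalid, but
        # that cannot be detected here without querying the hardware.)
        if not item or item.startswith("-"):
            break
        ids.append(item)
    return ids


# Must run before "import tensorflow"; it only narrows what CUDA already sees.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(visible_gpu_ids())  # ['0']
```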

I would be grateful if anyone could help me solve this. Thanks.

1 Answer:

Answer 0 (score: 0)

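Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device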
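The error message itself already says what TensorFlow detected: the bracketed list after "available devices are" contains only CPU:0, so the problem is that TensorFlow does not see a GPU, not that the device string is wrong. A minimal sketch for pulling that list out and checking it for a GPU entry; devices_from_error and has_gpu are hypothetical helpers, and the comma separator between multiple devices is an assumption about the message format.

```python
import re

# The bracketed device list at the end of TensorFlow's placement error.
_AVAILABLE = re.compile(r"available devices are \[(.*?)\]")


def devices_from_error(message):
    """Return the device names listed after 'available devices are [...]'."""
    match = _AVAILABLE.search(message)
    if match is None:
        return []
    # Assumption: several devices would be separated by commas.
    return [d.strip() for d in match.group(1).split(",") if d.strip()]


def has_gpu(devices):
    """True if any device name refers to a GPU (XLA_GPU entries are ignored)."""
    return any("/device:GPU:" in d for d in devices)


error = ("Cannot assign a device for operation 'Variable_4/Adam_1': Operation was "
         "explicitly assigned to /device:GPU:0 but available devices are "
         "[ /job:localhost/replica:0/task:0/device:CPU:0 ].")
devices = devices_from_error(error)
print(devices, has_gpu(devices))
# ['/job:localhost/replica:0/task:0/device:CPU:0'] False
```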

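Did you install tensorflow or tensorflow-gpu? If you want to use the GPU, it has to be the latter (in TensorFlow 1.x, the plain tensorflow pip package is CPU-only).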
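One way to see which TensorFlow distribution is actually installed, sketched with importlib.metadata. That module is in the standard library from Python 3.8, so this assumes a Python 3 environment rather than the 2.7 one in the question; there, pip list shows the same information. The helper name is hypothetical. Having both tensorflow and tensorflow-gpu installed at the same time is a common source of trouble, because both install into the same tensorflow import package.

```python
from importlib import metadata  # standard library in Python 3.8+

# Distribution names TensorFlow has been published under on PyPI.
CANDIDATES = ("tensorflow", "tensorflow-gpu", "tensorflow-cpu")


def installed_tensorflow_packages(candidates=CANDIDATES):
    """Map each installed TensorFlow distribution name to its version string."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


if __name__ == "__main__":
    packages = installed_tensorflow_packages()
    print(packages or "no TensorFlow distribution found")
```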

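Run the following to check whether TensorFlow can see the GPU (this function is available in TensorFlow 2.1 and later):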

tf.config.list_physical_devices('GPU') 

After running it, you should get output similar to mine:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
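Because the traceback in the question comes from TensorFlow 1.x (tf.train.AdamOptimizer, session.py), where tf.config.list_physical_devices is not available, here is a version-agnostic sketch. It assumes device_lib from tensorflow.python.client as the fallback for older releases; that is an internal module, but it is present in both 1.x and 2.x. list_gpus is a hypothetical name. An empty list means TensorFlow is installed but sees no GPU, which is exactly the situation that produces this error.

```python
def list_gpus():
    """Names of the GPUs TensorFlow can use, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        # Newer releases: the call used in the answer.
        return [d.name for d in config.list_physical_devices("GPU")]
    # Older releases (1.x, 2.0): fall back to the internal device library.
    from tensorflow.python.client import device_lib
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif not gpus:
        print("TensorFlow sees no GPU, so ops pinned to /device:GPU:0 cannot be placed")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```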

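This could also be a version-compatibility problem. First, check that your NVIDIA driver is installed by running nvidia-smi; you should get something like this: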

Wed Jun 10 15:13:03 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0    36W / 250W |   1573MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
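The two numbers worth noting in that header are the driver version and "CUDA Version". The latter is the newest CUDA version the driver supports, not necessarily the toolkit you have installed. A small sketch that runs nvidia-smi and extracts both; the regular expression assumes the header layout shown above, and the function names are hypothetical. A None result from query_nvidia_smi means the tool is not on PATH or printed no such header, which usually points at the driver.

```python
import re
import shutil
import subprocess

# Header line, e.g. "| NVIDIA-SMI 440.82  Driver Version: 418.67  CUDA Version: 10.1 |"
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi(text):
    """Return (driver_version, max_supported_cuda) as strings, or None if not found."""
    match = _HEADER.search(text)
    return (match.group(1), match.group(2)) if match else None


def query_nvidia_smi():
    """Run nvidia-smi and parse its header; None if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None  # usually means the NVIDIA driver is not installed
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return parse_nvidia_smi(result.stdout)


print(parse_nvidia_smi(
    "| NVIDIA-SMI 440.82       Driver Version: 418.67       CUDA Version: 10.1     |"))
# ('418.67', '10.1')
print(query_nvidia_smi())
```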

Then check which CUDA version you have with nvcc --version. For example:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
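To compare the toolkit with the driver, a sketch that extracts the release from nvcc --version output and applies a rough rule of thumb: the toolkit should not be newer than the "CUDA Version" that nvidia-smi reports. CUDA 11 and later relax this somewhat through minor-version compatibility, so treat a False as a hint rather than a verdict. The helper names are hypothetical.

```python
import re

_RELEASE = re.compile(r"release\s+(\d+)\.(\d+)")


def parse_nvcc_release(text):
    """Toolkit version from 'nvcc --version' output as (major, minor), or None."""
    match = _RELEASE.search(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_supported_by_driver(toolkit, driver_cuda):
    """Rule of thumb: the toolkit should not be newer than nvidia-smi's "CUDA Version".

    toolkit is a (major, minor) tuple; driver_cuda is a string such as "10.1".
    """
    major, minor = (driver_cuda.split(".") + ["0"])[:2]
    return toolkit <= (int(major), int(minor))


nvcc_output = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243"""
toolkit = parse_nvcc_release(nvcc_output)
print(toolkit, toolkit_supported_by_driver(toolkit, "10.1"))  # (10, 1) True
```

With the sample outputs in this answer, the toolkit is 10.1 and the driver reports CUDA 10.1, so the check passes.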

Finally, check that you have installed compatible versions of Python, TensorFlow and CUDA. For most people, using this as a reference seems to work.
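The compatibility check can be sketched as a lookup against a table like TensorFlow's "tested build configurations". The rows below are a hand-transcribed excerpt for Linux GPU builds and are only illustrative; verify them against the official table before relying on them. GPU builds of this era generally load one specific CUDA runtime (a 10.0 build looks for the 10.0 libraries), which is why the sketch requires an exact major.minor match. expected_cuda and cuda_matches are hypothetical names.

```python
# Hand-transcribed excerpt of TensorFlow's tested GPU build configurations (Linux).
# Illustrative only: check the official table before relying on these rows.
TESTED_CUDA = {
    "1.12": (9, 0),
    "1.13": (10, 0),
    "1.14": (10, 0),
    "1.15": (10, 0),
    "2.0": (10, 0),
    "2.1": (10, 1),
    "2.2": (10, 1),
    "2.3": (10, 1),
}


def expected_cuda(tf_version):
    """CUDA (major, minor) a TensorFlow release was tested with, or None if not listed."""
    key = ".".join(tf_version.split(".")[:2])
    return TESTED_CUDA.get(key)


def cuda_matches(tf_version, toolkit):
    """True/False when the excerpt covers tf_version, None when it does not."""
    expected = expected_cuda(tf_version)
    return None if expected is None else expected == toolkit


print(cuda_matches("1.15.0", (10, 1)))  # False: 1.15 was built against CUDA 10.0
print(cuda_matches("2.1.0", (10, 1)))   # True
```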

Don't forget to reboot after installing the driver!