I am trying to run the script "train.py", cloned from the following GitHub repository:
https://github.com/xiaojunxu/dnn-binary-code-similarity
After installing all of the repository's requirements (requirements.txt), I run "train.py" and get the following error, which I cannot find a solution for:
2019-09-17 20:43:51.186970: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Traceback (most recent call last):
  File "train.py", line 124, in <module>
    gnn.init(LOAD_PATH, LOG_PATH)
  File "/ws/Gemini/graphnnSiamese.py", line 120, in init
    sess.run(tf.global_variables_initializer())
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: Variable_4/Adam_1 = VariableV2[_class=["loc:@Variable_4"], container="", dtype=DT_FLOAT, shape=[64], shared_name="", _device="/device:GPU:0"]()]]
Caused by op u'Variable_4/Adam_1', defined at:
  File "train.py", line 122, in <module>
    lr = LEARNING_RATE
  File "/ws/Gemini/graphnnSiamese.py", line 93, in __init__
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 353, in minimize
    name=name)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 474, in apply_gradients
    self._create_slots([_get_variable_for(v) for v in var_list])
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 137, in _create_slots
    self._zeros_slot(v, "v", self._name)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 796, in _zeros_slot
    named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 148, in create_slot_with_initializer
    dtype)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 67, in _create_slot_var
    validate_shape=validate_shape)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1203, in get_variable
    constraint=constraint)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1092, in get_variable
    constraint=constraint)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
    constraint=constraint)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 805, in _get_single_variable
    constraint=constraint)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 213, in __init__
    constraint=constraint)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 309, in _init_from_args
    name=name)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 133, in variable_op_v2
    shared_name=shared_name)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 927, in _variable_v2
    shared_name=shared_name, name=name)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: Variable_4/Adam_1 = VariableV2[_class=["loc:@Variable_4"], container="", dtype=DT_FLOAT, shape=[64], shared_name="", _device="/device:GPU:0"]()]]
I found a suggestion to set the following to "0":
os.environ["CUDA_VISIBLE_DEVICES"]= "0"
But that did not work for me.
I would be grateful if anyone could help me solve this. Thanks.
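A simplified model of how CUDA filters devices through `CUDA_VISIBLE_DEVICES` helps explain why that suggestion could not work here. This is an illustrative sketch only: `visible_gpu_indices` is a hypothetical helper, it handles only comma-separated numeric indices (not GPU UUIDs), and it assumes the documented CUDA behavior that enumeration stops at the first invalid index. The variable can only select among GPUs that already exist, so when CUDA reports no GPU, "0" points at nothing and TensorFlow is still left with only the CPU.

```python
from typing import List, Optional


def visible_gpu_indices(env_value: Optional[str], physical_gpu_count: int) -> List[int]:
    """Simplified model of CUDA_VISIBLE_DEVICES filtering (numeric indices only).

    env_value: the variable's value, or None if it is unset.
    physical_gpu_count: how many GPUs the driver actually reports.
    Returns the physical GPU indices that would be exposed to the process.
    """
    if env_value is None:
        # Unset: every physical GPU is visible.
        return list(range(physical_gpu_count))
    visible: List[int] = []
    for token in env_value.split(","):
        token = token.strip()
        # Stop at the first entry that is not an index of an existing GPU.
        if not token.isdigit() or int(token) >= physical_gpu_count:
            break
        visible.append(int(token))
    return visible


# No GPU present: "0" refers to a device that does not exist, so nothing is visible.
print(visible_gpu_indices("0", 0))  # []
# Two GPUs present: "0" exposes only the first one.
print(visible_gpu_indices("0", 2))  # [0]
```

In a real script the variable also has to be set before TensorFlow initializes CUDA; putting it before the `import tensorflow` line is the safe choice.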
Answer:
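Cannot assign a device for operation 'Variable_4/Adam_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.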
Did you install tensorflow or tensorflow-gpu? If you want to use the GPU, you need the latter.
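To see which of the two packages is actually installed, something like the following works on Python 3.8+ (the traceback shows Python 2.7, where `pip show tensorflow tensorflow-gpu` gives the same information). `installed_tf_distributions` is a hypothetical helper; its `lookup` parameter exists only so it can be exercised without TensorFlow installed. Note that this package split applies to TensorFlow 1.x, which the traceback indicates (`tf.train.AdamOptimizer`, Python 2.7); from TensorFlow 2.1 on, the plain `tensorflow` package also includes GPU support.

```python
from importlib import metadata
from typing import Callable, Dict, Optional


def installed_tf_distributions(
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, Optional[str]]:
    """Map each TensorFlow distribution name to its installed version, or None."""
    found: Dict[str, Optional[str]] = {}
    for name in ("tensorflow", "tensorflow-gpu"):
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # this distribution is not installed
    return found


if __name__ == "__main__":
    print(installed_tf_distributions())
```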
Run the following to check whether TensorFlow can see a GPU (this call exists in TensorFlow 2.1 and later):
tf.config.list_physical_devices('GPU')
You should get output similar to mine:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
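Because `tf.config.list_physical_devices` does not exist in older releases such as the 1.x version in the traceback, a version-tolerant check could look like the sketch below. `list_gpu_names` is a hypothetical helper: it tries the TensorFlow 2.1+ API first, then the `tf.config.experimental` variant found in some late 1.x and 2.0 releases, and finally falls back to `device_lib.list_local_devices()` from TensorFlow's internal `tensorflow.python.client` module. An empty list means TensorFlow sees no GPU, which is exactly the situation the error message describes.

```python
from typing import List, Optional


def list_gpu_names() -> Optional[List[str]]:
    """Names of the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        # TensorFlow 2.1 and later.
        return [d.name for d in config.list_physical_devices("GPU")]
    experimental = getattr(config, "experimental", None)
    if experimental is not None and hasattr(experimental, "list_physical_devices"):
        # Some late 1.x and 2.0 releases only expose the experimental variant.
        return [d.name for d in experimental.list_physical_devices("GPU")]

    # Older 1.x releases: fall back to TensorFlow's device library.
    from tensorflow.python.client import device_lib
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = list_gpu_names()
    if gpus is None:
        print("TensorFlow is not installed")
    elif not gpus:
        print("TensorFlow sees no GPU, so ops pinned to /device:GPU:0 cannot be placed")
    else:
        print("Visible GPUs:", gpus)
```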
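This could also be a version-compatibility issue. First, check whether your NVIDIA driver is installed by running nvidia-smi; you should get something like the following: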
Wed Jun 10 15:13:03 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0    36W / 250W |   1573MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
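If you prefer to check this from a script rather than by eye, the version fields can be pulled out of the banner line. `parse_nvidia_smi_header` is a hypothetical helper that assumes the banner format shown above. Keep in mind that the "CUDA Version" nvidia-smi reports is the highest CUDA version the installed driver supports, not necessarily the toolkit you have installed.

```python
import re
from typing import Dict

# Regexes for the three version fields in the nvidia-smi banner line.
_FIELDS = {
    "smi": r"NVIDIA-SMI\s+([0-9.]+)",
    "driver": r"Driver Version:\s+([0-9.]+)",
    "cuda": r"CUDA Version:\s+([0-9.]+)",
}


def parse_nvidia_smi_header(text: str) -> Dict[str, str]:
    """Extract the version fields from nvidia-smi output; missing fields are omitted."""
    found = {}
    for key, pattern in _FIELDS.items():
        match = re.search(pattern, text)
        if match:
            found[key] = match.group(1)
    return found


banner = "| NVIDIA-SMI 440.82       Driver Version: 418.67       CUDA Version: 10.1     |"
print(parse_nvidia_smi_header(banner))
# {'smi': '440.82', 'driver': '418.67', 'cuda': '10.1'}
```

On a live system, the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` (Python 3.7+).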
Then check which CUDA version you have with nvcc --version. For example:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
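The toolkit release reported by nvcc can then be compared with the maximum CUDA version from nvidia-smi; in general the driver has to support at least the toolkit's version. A minimal sketch, where `nvcc_release` and `driver_supports` are hypothetical helpers that assume the `release X.Y` wording shown above:

```python
import re
from typing import Optional, Tuple


def nvcc_release(text: str) -> Optional[str]:
    """Return the toolkit release (e.g. '10.1') from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def _version_tuple(version: str) -> Tuple[int, ...]:
    # "10.1" -> (10, 1), so versions compare numerically rather than as strings.
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver_max_cuda: str, toolkit_release: str) -> bool:
    """True if the driver's maximum CUDA version is at least the toolkit release."""
    return _version_tuple(driver_max_cuda) >= _version_tuple(toolkit_release)


nvcc_output = "Cuda compilation tools, release 10.1, V10.1.243"
toolkit = nvcc_release(nvcc_output)
print(toolkit)                           # 10.1
print(driver_supports("10.1", toolkit))  # True
```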
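Finally, check that the versions of Python, TensorFlow, and CUDA you have installed are compatible with each other. Using this as a reference seems to work for most people.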
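One way to make that check explicit is to keep a small table of tested combinations and look up the installed TensorFlow version. The entries below are a partial excerpt of TensorFlow's published tested GPU build configurations for Linux, included only as an example; `TESTED_CUDA`, `expected_cuda`, and `check_compatibility` are hypothetical names, and the official table (which also lists the matching cuDNN and Python versions) should be treated as the source of truth.

```python
from typing import Dict, Optional

# Partial, illustrative excerpt: TensorFlow major.minor -> CUDA version it was built against.
# Verify against the official tested build configurations before relying on it.
TESTED_CUDA: Dict[str, str] = {
    "1.13": "10.0",
    "1.14": "10.0",
    "1.15": "10.0",
    "2.0": "10.0",
    "2.1": "10.1",
    "2.2": "10.1",
    "2.3": "10.1",
}


def expected_cuda(tf_version: str) -> Optional[str]:
    """Look up the CUDA version for a TensorFlow release such as '2.1.0'."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CUDA.get(major_minor)


def check_compatibility(tf_version: str, installed_cuda: str) -> str:
    """Describe whether the installed CUDA matches the table entry."""
    expected = expected_cuda(tf_version)
    if expected is None:
        return f"TensorFlow {tf_version}: not in this excerpt, consult the official table"
    if expected == installed_cuda:
        return f"TensorFlow {tf_version} expects CUDA {expected}: OK"
    return f"TensorFlow {tf_version} expects CUDA {expected}, but {installed_cuda} is installed"


print(check_compatibility("2.1.0", "10.1"))
```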
Don't forget to reboot after installing the drivers!