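I am trying to run the GitHub repository "Face-Aging-CAAE" (https://github.com/ZZUTK/Face-Aging-CAAE). The code runs on my CPU (taking about 3 days), but on the GPU it terminates while executing session.run(), without any error output.
This is the output when the code runs on the GPU; it stops at the "init model" step: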
In [1]: runfile('/media/.../face-aging-caae/Face-Aging-CAAE-master/main.py', wdir='/media/.../face-aging-caae/Face-Aging-CAAE-master')
Namespace(dataset='UTKFace', epoch=50, is_train=True, savedir='save', testdir='None', use_init_model=True, use_trained_model=True)
Building graph ...
WARNING:tensorflow:From /home/.../anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Training Mode
Loading pre-trained model ...
FAILED >_<!
Loading init model ...
INFO:tensorflow:Restoring parameters from init_model/model-init
In [1]:
The code exits while executing this block in "FaceAging.py":
# update
_, _, _, EG_err, Ez_err, Dz_err, Dzp_err, Gi_err, DiG_err, Di_err, TV = self.session.run(
    fetches=[
        self.EG_optimizer,
        self.D_z_optimizer,
        self.D_img_optimizer,
        self.EG_loss,
        self.E_z_loss,
        self.D_z_loss_z,
        self.D_z_loss_prior,
        self.G_img_loss,
        self.D_img_loss_G,
        self.D_img_loss_input,
        self.tv_loss
    ],
    feed_dict={
        self.input_image: batch_images,
        self.age: batch_label_age,
        self.gender: batch_label_gender,
        self.z_prior: batch_z_prior
    }
)
System:
The GPU works in this environment with other simple code that I have tested.
I tried to run the code explicitly on the GPU:
with tf.device('/gpu:0'):
    tf.app.run()
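But it gave the error below (the error disappears after enabling "allow soft placement", and the code then goes back to the previous behavior):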
InvalidArgumentError: Cannot assign a device for operation 'global_step': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
AssignAdd: CPU
Const: GPU CPU
Assign: CPU
VariableV2: CPU
Identity: CPU
Colocation members and user-requested devices:
global_step (VariableV2) /device:GPU:0
global_step/read (Identity) /device:GPU:0
global_step/Assign (Assign) /device:GPU:0
opt/Adam/value (Const) /device:GPU:0
opt/Adam (AssignAdd) /device:GPU:0
Registered kernels:
device='CPU'
device='GPU'; dtype in [DT_INT64]
device='GPU'; dtype in [DT_DOUBLE]
device='GPU'; dtype in [DT_FLOAT]
device='GPU'; dtype in [DT_HALF]
[[Node: global_step = VariableV2[container="", dtype=DT_INT32, shape=[], shared_name="", _device="/device:GPU:0"]()]]
Caused by op u'global_step', defined at:
File "/home/.../anaconda3/envs/py27/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/.../anaconda3/envs/py27/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/spyder_kernels/console/__main__.py", line 11, in <module>
start.main()
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/spyder_kernels/console/start.py", line 310, in main
kernel.start()
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 499, in start
self.io_loop.start()
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tornado/ioloop.py", line 1073, in start
handler_func(fd_obj, events)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 456, in _handle_events
self._handle_recv()
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 486, in _handle_recv
self._run_callback(callback, msg)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 438, in _run_callback
callback(*args, **kwargs)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2714, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2824, in run_ast_nodes
if self.run_code(code, result):
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2878, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-83c713e248d3>", line 1, in <module>
runfile('/home/.../face-aging-caae/Face-Aging-CAAE-master/main.py', wdir='/home/.../face-aging-caae/Face-Aging-CAAE-master')
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 102, in execfile
builtins.execfile(filename, *where)
File "/home/.../face-aging-caae/Face-Aging-CAAE-master/main.py", line 70, in <module>
tf.app.run()
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/.../face-aging-caae/Face-Aging-CAAE-master/main.py", line 59, in main
use_init_model=FLAGS.use_init_model
File "FaceAging.py", line 208, in train
self.EG_global_step = tf.Variable(0, trainable=False, name='global_step')
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 235, in __init__
constraint=constraint)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 365, in _init_from_args
name=name)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 135, in variable_op_v2
shared_name=shared_name)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 1131, in variable_v2
shared_name=shared_name, name=name)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/home/.../anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'global_step': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
AssignAdd: CPU
Const: GPU CPU
Assign: CPU
VariableV2: CPU
Identity: CPU
Colocation members and user-requested devices:
global_step (VariableV2) /device:GPU:0
global_step/read (Identity) /device:GPU:0
global_step/Assign (Assign) /device:GPU:0
opt/Adam/value (Const) /device:GPU:0
opt/Adam (AssignAdd) /device:GPU:0
Registered kernels:
device='CPU'
device='GPU'; dtype in [DT_INT64]
device='GPU'; dtype in [DT_DOUBLE]
device='GPU'; dtype in [DT_FLOAT]
device='GPU'; dtype in [DT_HALF]
[[Node: global_step = VariableV2[container="", dtype=DT_INT32, shape=[], shared_name="", _device="/device:GPU:0"]()]]
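The "allow soft placement" setting mentioned above can be sketched as follows. The log shows that `global_step` has dtype DT_INT32, while the registered GPU kernels cover only DT_INT64, DT_DOUBLE, DT_FLOAT and DT_HALF, so forcing the whole program onto `/gpu:0` cannot work for that variable. This is a minimal sketch, not the repository's code: it assumes a TensorFlow build that provides the `tf.compat.v1` alias (1.13 or later; on older 1.x releases the same names are available directly under `tf`), and the `allow_growth` line is an optional extra that may help on a GPU with little memory.

```python
import tensorflow as tf

tf1 = tf.compat.v1            # assumption: TF >= 1.13; on older 1.x use `tf` directly
tf1.disable_eager_execution()  # needed only on TF 2.x to get graph/session behavior

# Let TensorFlow fall back to the CPU for ops that have no GPU kernel.
config = tf1.ConfigProto(allow_soft_placement=True)
# Optional: allocate GPU memory on demand instead of reserving most of it up front.
config.gpu_options.allow_growth = True

with tf.device('/gpu:0'):
    # An int32 counter like the one in the error message.
    global_step = tf1.Variable(0, trainable=False, name='global_step')
    increment = tf1.assign_add(global_step, 1)

with tf1.Session(config=config) as sess:
    sess.run(tf1.global_variables_initializer())
    step = sess.run(increment)
    print(step)
```

With soft placement enabled the counter is placed on a device that supports it, while ops that do have GPU kernels can still run on the GPU. This only removes the placement error; it does not address the silent termination described earlier.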
I am a TensorFlow beginner. Also, if there is anything I should consider when running this kind of code on a weak GPU, please let me know.
Thanks.
Answer 0 (score: 0)
Using VSCode, this message appeared:
Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) Aborted (core dumped)
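An abort in native code like this one does not raise a Python exception, which would explain why the IPython console simply returned to the `In [1]:` prompt without a traceback. As a minimal sketch (Python 3.5+, not part of the repository; `run_and_report` is a hypothetical helper), a script can be launched in a child process so that its stderr and exit status stay visible:

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd in a child process; return (status description, captured stderr)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        status = "killed by " + signal.Signals(-proc.returncode).name
    else:
        status = "exit code %d" % proc.returncode
    return status, proc.stderr


if __name__ == "__main__":
    # Stand-in child that aborts the way a failed native check would.
    status, stderr = run_and_report([sys.executable, "-c", "import os; os.abort()"])
    print(status)
```

Replacing the stand-in command with `[sys.executable, "main.py"]` (run from the repository directory) would show whether the training script ends with a normal exit code or with a signal, along with whatever the native libraries wrote to stderr.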
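I checked the compatibility of my cuDNN 7.3 with this version of TensorFlow against this table: https://www.tensorflow.org/install/source#tested_build_configurations
I downgraded cuDNN to 7.0.5, and the code ran without problems (it took 7 h 7 min).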
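To confirm which cuDNN version is actually installed before comparing it with the table, one option is to read the version macros from the cuDNN header. This is a sketch under assumptions: cuDNN 7 defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` in `cudnn.h`, the header location varies between installations (the path below is only a placeholder), and `parse_cudnn_version` is a hypothetical helper.

```python
import os
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None if not found."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % name, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Assumed location; adjust to wherever cuDNN is installed on your system.
    header_path = "/usr/local/cuda/include/cudnn.h"
    if os.path.exists(header_path):
        with open(header_path) as header_file:
            print(parse_cudnn_version(header_file.read()))
    else:
        print("cudnn.h not found at", header_path)
```

The resulting tuple can then be compared by hand with the cuDNN column of the tested build configurations table for the installed TensorFlow release.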