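Does anyone know what is going on in this output? I'm having trouble pinning down the problem from it, but it looks like it might be related to the Adam optimizer?
I'm using tf-nightly-gpu with CUDA 9.0 and cuDNN 7.0 on a Windows desktop with a GTX 1080.
The same code runs (very slowly) on my Linux laptop with tf-gpu 1.7.0rc0 and the same CUDA and cuDNN versions. The GPU there is a Quadro K1000M.
Here is the error output (warning: long):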
(tensorflow) C:\Users\Grant\Documents\GitHub\MLAPM>python bdgru_anomaly.py
Using TensorFlow backend.
Loading data...
wave1 1000
wave2 1000
wave3 50
Length of Data 1000
Creating train data...
Mean of train data : -0.00142787279724
Train data shape : (600, 100)
X shape: (600, 99)
y shape: (600,)
Creating test data...
Mean of test data : 0.0165107915353
Test data shape : (400, 100)
Shape X_train (3049, 99)
Shape X_test (400, 99)
Data Loaded. Compiling...
Compilation Time : 0.018500089645385742
Training...
Train on 2896 samples, validate on 153 samples
Epoch 1/1
2018-03-22 21:54:52.332961: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2018-03-22 21:54:52.341683: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-22 21:55:26.674106: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-22 21:55:26.680619: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2018-03-22 21:55:26.685158: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2018-03-22 21:55:26.691199: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6372 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-03-22 21:55:33.481279: E C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\executor.cc:644] Executor failed to create kernel. Not found: No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
. Registered: device='CPU'; T in [DT_INT64]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_BOOL]
[[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]
Traceback (most recent call last):
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_call
return fn(*args)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1313, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1421, in _call_tf_sessionrun
status, run_metadata)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
. Registered: device='CPU'; T in [DT_INT64]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_BOOL]
[[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "bdgru_anomaly.py", line 216, in <module>
run_network()
File "bdgru_anomaly.py", line 172, in run_network
batch_size=batch_size, epochs=epochs, validation_split=0.05)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\models.py", line 963, in fit
validation_steps=validation_steps)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1705, in fit
validation_steps=validation_steps)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1235, in _fit_loop
outs = f(ins_batch)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 2478, in __call__
**self.session_kwargs)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 906, in run
run_metadata_ptr)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1141, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_run
run_metadata)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
. Registered: device='CPU'; T in [DT_INT64]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_BOOL]
[[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]
Caused by op 'training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv', defined at:
File "bdgru_anomaly.py", line 216, in <module>
run_network()
File "bdgru_anomaly.py", line 172, in run_network
batch_size=batch_size, epochs=epochs, validation_split=0.05)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\models.py", line 963, in fit
validation_steps=validation_steps)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1682, in fit
self._make_train_function()
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 990, in _make_train_function
loss=self.total_loss)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\optimizers.py", line 445, in get_updates
grads = self.get_gradients(loss, params)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\optimizers.py", line 78, in get_gradients
grads = K.gradients(loss, params)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 2515, in gradients
return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 488, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 625, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 379, in _MaybeCompile
return grad_fn() # Exit early
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 625, in <lambda>
lambda: grad_fn(op, *out_grads))
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_grad.py", line 132, in _MeanGrad
math_ops.reduce_prod(input_shape), math_ops.reduce_prod(output_shape))
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_grad.py", line 35, in _safe_shape_div
return x // math_ops.maximum(y, 1)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 974, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1186, in floordiv
return gen_math_ops.floor_div(x, y, name=name)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 3003, in floor_div
"FloorDiv", x=x, y=y, name=name)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3305, in create_op
op_def=op_def)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1669, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op 'loss/dense_5_loss/Mean_3', defined at:
File "bdgru_anomaly.py", line 216, in <module>
run_network()
File "bdgru_anomaly.py", line 166, in run_network
model = build_model()
File "bdgru_anomaly.py", line 148, in build_model
model.compile(loss="mse", optimizer="adam")
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\models.py", line 824, in compile
**kwargs)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 830, in compile
sample_weight, mask)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 447, in weighted
return K.mean(score_array)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 1367, in mean
return tf.reduce_mean(x, axis, keepdims)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1557, in reduce_mean
name=name))
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4717, in mean
name=name)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3305, in create_op
op_def=op_def)
File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1669, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
. Registered: device='CPU'; T in [DT_INT64]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_BOOL]
[[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]
Answer 0 (score: 0)
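I tracked this issue down. It looks like the nightly builds currently have a problem for Windows users running on the GPU.
It is being discussed (and will hopefully be fixed by the release) in this GitHub issue: https://github.com/tensorflow/tensorflow/issues/17752
A temporary fix is to run pip uninstall tf-nightly-gpu, followed by pip install tensorflow==1.6.0.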
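After swapping packages, it can help to confirm which TensorFlow distribution the active environment actually ends up with, since a leftover nightly build next to a stable one is easy to miss. Below is a minimal sketch for that check. It assumes Python 3.8 or newer (for importlib.metadata, which the Python 3.6 environment in the question does not have), and the helper name installed_tensorflow_builds is made up for illustration. Also keep in mind that in the 1.x series the tensorflow package is the CPU build, while tensorflow-gpu is the separate GPU package.

```python
from importlib import metadata

# Distribution names taken from the question and answer above; the helper
# itself is illustrative and not part of TensorFlow or pip.
CANDIDATES = ("tf-nightly-gpu", "tf-nightly", "tensorflow-gpu", "tensorflow")


def installed_tensorflow_builds(candidates=CANDIDATES):
    """Return {distribution name: version} for every candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the active environment.
            continue
    return found


if __name__ == "__main__":
    builds = installed_tensorflow_builds()
    if not builds:
        print("No TensorFlow distribution found in this environment.")
    for name, version in builds.items():
        print(f"{name}=={version}")
    if len(builds) > 1:
        print("More than one TensorFlow distribution is installed; "
              "consider uninstalling all but one.")
```

Run it inside the same conda environment used for training. If tf-nightly-gpu still shows up after the uninstall, the nightly build is probably still the one being imported.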