I think I can't compile the Adam optimizer on the TensorFlow nightly GPU build

Time: 2018-03-23 03:05:46

Tags: python tensorflow keras

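Does anyone know what is going on in this output? I'm having trouble discerning the problem from it, but it looks like it might be the Adam optimizer?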

I'm using tf-nightly-gpu with CUDA 9.0 and cuDNN 7.0 on a Windows desktop with a GTX 1080.

The code runs (very slowly) on my Linux laptop, using tf-gpu 1.7.0rc0 and the same CUDA and cuDNN versions. The GPU there is a Quadro K1000M.

Here is the error output (warning: long):

(tensorflow) C:\Users\Grant\Documents\GitHub\MLAPM>python bdgru_anomaly.py
Using TensorFlow backend.
Loading data...
wave1 1000
wave2 1000
wave3 50
Length of Data 1000
Creating train data...
Mean of train data :  -0.00142787279724
Train data shape  :  (600, 100)
X shape: (600, 99)
y shape: (600,)
Creating test data...
Mean of test data :  0.0165107915353
Test data shape  :  (400, 100)
Shape X_train (3049, 99)
Shape X_test (400, 99)

Data Loaded. Compiling...

Compilation Time :  0.018500089645385742
Training...
Train on 2896 samples, validate on 153 samples
Epoch 1/1
2018-03-22 21:54:52.332961: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2018-03-22 21:54:52.341683: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-22 21:55:26.674106: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-22 21:55:26.680619: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:917]      0
2018-03-22 21:55:26.685158: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0:   N
2018-03-22 21:55:26.691199: I C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6372 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-03-22 21:55:33.481279: E C:\tf_jenkins\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\executor.cc:644] Executor failed to create kernel. Not found: No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
        .  Registered:  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_BOOL]

         [[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]
Traceback (most recent call last):
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_call
    return fn(*args)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1313, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1421, in _call_tf_sessionrun
    status, run_metadata)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
        .  Registered:  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_BOOL]

         [[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bdgru_anomaly.py", line 216, in <module>
    run_network()
  File "bdgru_anomaly.py", line 172, in run_network
    batch_size=batch_size, epochs=epochs, validation_split=0.05)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\models.py", line 963, in fit
    validation_steps=validation_steps)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1705, in fit
    validation_steps=validation_steps)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1235, in _fit_loop
    outs = f(ins_batch)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 2478, in __call__
    **self.session_kwargs)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 906, in run
    run_metadata_ptr)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1141, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_run
    run_metadata)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
        .  Registered:  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_BOOL]

         [[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]

Caused by op 'training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv', defined at:
  File "bdgru_anomaly.py", line 216, in <module>
    run_network()
  File "bdgru_anomaly.py", line 172, in run_network
    batch_size=batch_size, epochs=epochs, validation_split=0.05)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\models.py", line 963, in fit
    validation_steps=validation_steps)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1682, in fit
    self._make_train_function()
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 990, in _make_train_function
    loss=self.total_loss)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\optimizers.py", line 445, in get_updates
    grads = self.get_gradients(loss, params)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\optimizers.py", line 78, in get_gradients
    grads = K.gradients(loss, params)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 2515, in gradients
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 488, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 625, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 379, in _MaybeCompile
    return grad_fn()  # Exit early
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 625, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_grad.py", line 132, in _MeanGrad
    math_ops.reduce_prod(input_shape), math_ops.reduce_prod(output_shape))
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_grad.py", line 35, in _safe_shape_div
    return x // math_ops.maximum(y, 1)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 974, in binary_op_wrapper
    return func(x, y, name=name)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1186, in floordiv
    return gen_math_ops.floor_div(x, y, name=name)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 3003, in floor_div
    "FloorDiv", x=x, y=y, name=name)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3305, in create_op
    op_def=op_def)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1669, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op 'loss/dense_5_loss/Mean_3', defined at:
  File "bdgru_anomaly.py", line 216, in <module>
    run_network()
  File "bdgru_anomaly.py", line 166, in run_network
    model = build_model()
  File "bdgru_anomaly.py", line 148, in build_model
    model.compile(loss="mse", optimizer="adam")
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\models.py", line 824, in compile
    **kwargs)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 830, in compile
    sample_weight, mask)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 447, in weighted
    return K.mean(score_array)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 1367, in mean
    return tf.reduce_mean(x, axis, keepdims)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1557, in reduce_mean
    name=name))
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4717, in mean
    name=name)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3305, in create_op
    op_def=op_def)
  File "C:\Users\Grant\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1669, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
        .  Registered:  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_BOOL]

         [[Node: training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)]]

1 Answer:

Answer 0 (score: 0)

I tracked this problem down. It looks like the nightly GPU builds currently have an issue for Windows users.
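For anyone trying to read the long output in the question: the informative part is the NotFoundError line, which names the op type that has no kernel ('Snapshot'), the device type it was placed on (GPU), and the kernels that are registered (CPU only). Below is a minimal sketch that pulls those fields out of the message. The helper name `summarize_missing_kernel` is hypothetical, and the sample text is copied from the log above with the kernel list shortened to three lines.

```python
import re

# Sample copied from the error output in the question (kernel list shortened).
ERROR_TEXT = """\
NotFoundError: No registered 'Snapshot' OpKernel for GPU devices compatible with node training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/floordiv = Snapshot[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/dense_5_loss/Mean_3_grad/Prod/_343)
        .  Registered:  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]
"""


def summarize_missing_kernel(text):
    """Hypothetical helper: extract the op type, the device type it was
    requested on, the node name, its dtype, and the registered
    (device, dtype) pairs from a "No registered ... OpKernel" message."""
    head = re.search(r"No registered '(\w+)' OpKernel for (\w+) devices", text)
    if head is None:
        return None  # not a missing-kernel error
    node = re.search(r"compatible with node (\S+)", text)
    dtype = re.search(r"\[T=(\w+)", text)
    # Each registered kernel appears as: device='CPU'; T in [DT_INT32]
    registered = re.findall(r"device='(\w+)'; T in \[(\w+)\]", text)
    return {
        "op": head.group(1),
        "wanted_device": head.group(2),
        "node": node.group(1) if node else None,
        "dtype": dtype.group(1) if dtype else None,
        "registered": registered,
    }


summary = summarize_missing_kernel(ERROR_TEXT)
print(summary["op"], summary["wanted_device"], summary["dtype"])
print(sorted({device for device, _ in summary["registered"]}))
```

For the sample, the summary shows a 'Snapshot' op with dtype DT_INT32 requested on GPU, while every registered kernel is for CPU. That mismatch is what the runtime is complaining about; the Adam prefix in the node name only reflects where the gradient op lives in the graph.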

The problem is discussed in this GitHub issue (and will hopefully be fixed by the time of release): https://github.com/tensorflow/tensorflow/issues/17752

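A temporary fix is to run pip uninstall tf-nightly-gpu, followed by pip install tensorflow==1.6.0. Note that at version 1.6.0 the tensorflow package is the CPU-only build; the GPU build of the same release is published as tensorflow-gpu==1.6.0.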
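After reinstalling, it can help to confirm that the environment no longer picks up a nightly build. The following is a rough sketch, not an official check. It assumes that nightly builds carry a "dev" marker followed by a date in their version string (the sample "1.8.0-dev20180322" is a made-up illustration), and the function names `is_dev_build` and `describe_install` are hypothetical.

```python
import re


def is_dev_build(version):
    """Return True if the version string looks like a nightly/dev build.

    Assumption: nightly builds contain '-dev' or '.dev' followed by a date,
    while releases ('1.6.0') and release candidates ('1.7.0rc0') do not.
    """
    return re.search(r"[.-]dev\d*", version) is not None


def describe_install(version):
    """Build a one-line, human-readable description of the installed version."""
    kind = "nightly/dev build" if is_dev_build(version) else "non-nightly build"
    return "TensorFlow {} looks like a {}".format(version, kind)


# In a real session, pass tf.__version__ after `import tensorflow as tf`.
print(describe_install("1.6.0"))
print(describe_install("1.8.0-dev20180322"))  # made-up nightly version string
```

If the real version string is still reported as a dev build after the reinstall, the nightly package is probably still installed in the active conda environment.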