I'm using CudnnRNNRelu like this:
with tf.variable_scope('cudnn_rnn_stack', reuse=reuse) as scope:
    rnn = tf.contrib.cudnn_rnn.CudnnRNNRelu(5, n_hidden, "linear_input", "bidirectional")
    output, _ = rnn(tf.transpose(layer_1, [1, 0, 2]), training=True)
    output_rnn_stack = tf.concat(output, 2)
I'm running it on a distributed multi-GPU setup on a GCP instance. The first few epochs run fine, but after 2-3 epochs I hit this error:
InternalError (see above for traceback): Failed to call ThenRnnForward
[[Node: tower_0/cudnn_rnn_stack/cudnn_rnn_relu/CudnnRNN =
CudnnRNN[T=DT_FLOAT, direction="bidirectional", dropout=0,
input_mode="linear_input", is_training=true, rnn_mode="rnn_relu",
seed=0, seed2=0,
_device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_0/cudnn_rnn_stack/transpose,
tower_0/cudnn_rnn_stack/cudnn_rnn_relu/zeros,
tower_0/cudnn_rnn_stack/cudnn_rnn_relu/Const,
cudnn_rnn_stack/cudnn_rnn_relu/opaque_kernel/read)]]
[[Node: Adam/update_cudnn_rnn_stack/cudnn_rnn_relu/opaque_kernel/ApplyAdam/_870 =
_Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0",
send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1,
tensor_name="edge_2200_Adam/update_cudnn_rnn_stack/cudnn_rnn_relu/opaque_kernel/ApplyAdam",
tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]