我在AWS tf.contrib.cudnn_rnn.CudnnLSTM(1, 128, direction='bidirectional')
实例上使用批量大小为32的3 p2.xlarge
个图层。完全相同的配置适用于非渴望(标准)张量流。以下是错误日志:
2018-04-27 18:15:59.139739: E tensorflow/stream_executor/cuda/cuda_dnn.cc:1520] Failed to allocate RNN workspace of 74252288 bytes.
2018-04-27 18:15:59.139758: E tensorflow/stream_executor/cuda/cuda_dnn.cc:1697] Unable to create rnn workspace
Traceback (most recent call last):
File "tf_run_eager.py", line 424, in <module>
run_experiments()
File "tf_run_eager.py", line 417, in run_experiments
train_losses.append(model.optimize(bX, bY).numpy())
File "tf_run_eager.py", line 397, in optimize
loss, grads_and_vars = self.loss(phoneme_features, utterances)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 233, in grad_fn
sources)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/imperative_grad.py", line 65, in imperative_grad
tape._tape, vspace, target, sources, output_gradients, status) # pylint: disable=protected-access
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 141, in grad_fn
op_inputs, op_outputs, orig_outputs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 109, in _magic_gradient_function
return grad_fn(mock_op, *out_grads)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1609, in _cudnn_rnn_backward
direction=op.get_attr("direction"))
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/ops/gen_cudnn_rnn_ops.py", line 320, in cudnn_rnn_backprop
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnBackward [Op:CudnnRNNBackprop]