This problem has been bothering me for a long time. Thanks in advance for your help!
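With TensorFlow 2.x the code works, but without @tf.function training is very slow. When I decorate the training step with @tf.function, however, the code breaks and produces the traceback below.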
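The code is too long to paste, so the relevant statements are below. Inside the @tf.function-decorated function, I wrap a Python function with tf.py_function as part of the AutoGraph-converted code: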
loss = tf.py_function(func=calc_rl_loss, inp=[sample_sents, greedy_sents, targ_sents, ce_loss], Tout=tf.float32)
tf.print(loss) shows that loss holds the correct scalar value.
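A minimal, self-contained sketch of the forward pass described above, for anyone trying to reproduce it. calc_rl_loss is not shown in the question, so toy_rl_loss and forward_only below are hypothetical stand-ins: the toy reward (fraction of sampled sentences that exactly match the target) is invented for illustration, and greedy_sents is omitted. Like the original, it passes string tensors plus a float scalar into tf.py_function and gets a scalar back without any gradient computation.

```python
import tensorflow as tf


# Hypothetical stand-in for calc_rl_loss (its body is not shown in the question).
def toy_rl_loss(sample_sents, targ_sents, ce_loss):
    # Inside tf.py_function the inputs are eager tensors, so .numpy() works;
    # string tensors come back as arrays of Python bytes objects.
    samples = sample_sents.numpy()
    targets = targ_sents.numpy()
    # Toy reward: fraction of sampled sentences that exactly match the target.
    reward = sum(s == t for s, t in zip(samples, targets)) / len(targets)
    return tf.constant(reward, dtype=tf.float32) * ce_loss


@tf.function
def forward_only(sample_sents, targ_sents, ce_loss):
    loss = tf.py_function(func=toy_rl_loss,
                          inp=[sample_sents, targ_sents, ce_loss],
                          Tout=tf.float32)
    tf.print(loss)  # prints the scalar, as in the question
    return loss


loss = forward_only(tf.constant(["a b", "c d"]),
                    tf.constant(["a b", "x y"]),
                    tf.constant(2.0))
```

Here one of the two sampled sentences matches its target, so the reward is 0.5 and the loss is 0.5 * 2.0 = 1.0. The forward pass alone works; the trouble only starts once tape.gradient has to go back through the py_function.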
The GradientTape-related code is here:
variables = encoder.trainable_variables + decoder.trainable_variables
#print("calling tape.gradient: ", variables)
gradients = tape.gradient(loss, variables)
optimizer.apply_gradients(zip(gradients, variables))
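A sketch of one common way to restructure this, assuming calc_rl_loss can be split into a non-differentiable reward computed from the sentences and a differentiable ce_loss term. The question does not show the body of calc_rl_loss, so toy_reward, the variable w and the loss form reward * ce_loss are all hypothetical. Only the string tensors go into tf.py_function, tf.stop_gradient marks the reward as a constant, and ce_loss is combined with it using ordinary TensorFlow ops, so tape.gradient never has to differentiate through the py_function or its tf.string inputs. A plain assign_sub update stands in for optimizer.apply_gradients to keep the sketch dependency-free.

```python
import tensorflow as tf


# Hypothetical reward function: reads only the sentences, needs no gradient.
def toy_reward(sample_sents, greedy_sents, targ_sents):
    samples, greedy, targets = (sample_sents.numpy(), greedy_sents.numpy(),
                                targ_sents.numpy())
    r_sample = sum(s == t for s, t in zip(samples, targets)) / len(targets)
    r_greedy = sum(g == t for g, t in zip(greedy, targets)) / len(targets)
    # Self-critical style advantage: sampled reward minus greedy baseline.
    return tf.constant(r_sample - r_greedy, dtype=tf.float32)


w = tf.Variable([1.0, -2.0])  # stands in for the encoder/decoder variables
learning_rate = 0.1


@tf.function
def train_step(sample_sents, greedy_sents, targ_sents):
    with tf.GradientTape() as tape:
        # Differentiable part, kept in ordinary TF ops outside the py_function.
        ce_loss = tf.reduce_sum(tf.square(w))
        reward = tf.py_function(func=toy_reward,
                                inp=[sample_sents, greedy_sents, targ_sents],
                                Tout=tf.float32)
        reward = tf.stop_gradient(reward)  # treat the reward as a constant
        loss = reward * ce_loss
    variables = [w]
    gradients = tape.gradient(loss, variables)
    # Plain SGD step in place of optimizer.apply_gradients(zip(...)).
    for g, v in zip(gradients, variables):
        v.assign_sub(learning_rate * g)
    return loss, gradients[0]


loss, grad_w = train_step(tf.constant(["a", "b"]),
                          tf.constant(["a", "x"]),
                          tf.constant(["a", "b"]))
```

With these inputs the sampled reward is 1.0 and the greedy baseline 0.5, so the loss is 0.5 * (1^2 + 2^2) = 2.5 and the gradient with respect to w is 0.5 * 2w = [1.0, -2.0]. The design choice is to keep everything that needs a gradient on the TensorFlow side and let the Python function return only a constant.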
The watched variables are:
train_step_rl: after calling py_function calc_rl_loss
train_step_rl: watched variables: (<tf.Variable 'encoder/embedding/embeddings:0' shape=(24794, 128) dtype=float32>, <tf.Variable 'encoder/gru/gru_cell/kernel:0' shape=(128, 384) dtype=float32>, <tf.Variable 'encoder/gru/gru_cell/recurrent_kernel:0' shape=(128, 384) dtype=float32>, <tf.Variable 'encoder/gru/gru_cell/bias:0' shape=(2, 384) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_1/kernel:0' shape=(128, 128) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_1/bias:0' shape=(128,) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_2/kernel:0' shape=(128, 128) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_2/bias:0' shape=(128,) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_3/kernel:0' shape=(128, 1) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_3/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'decoder/embedding_1/embeddings:0' shape=(12934, 128) dtype=float32>, <tf.Variable 'decoder/gru_1/gru_cell_1/kernel:0' shape=(256, 384) dtype=float32>, <tf.Variable 'decoder/gru_1/gru_cell_1/recurrent_kernel:0' shape=(128, 384) dtype=float32>, <tf.Variable 'decoder/gru_1/gru_cell_1/bias:0' shape=(2, 384) dtype=float32>, <tf.Variable 'decoder/dense/kernel:0' shape=(128, 12934) dtype=float32>, <tf.Variable 'decoder/dense/bias:0' shape=(12934,) dtype=float32>)
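A few questions:
Why do I get the warning "The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.string", and how can I debug it? As shown above, none of the watched variables has dtype tf.string!
Why do I get "TypeError: Cannot convert 0.0 to EagerTensor of dtype string"? Why is TensorFlow converting the float 0.0 to tf.string?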
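On debugging the first warning: GradientTape only differentiates floating-point (and complex) tensors. The variables listed above are all float32, but when the backward pass reaches a tf.py_function, its gradient appears to be taken with respect to the py_function's own inputs rather than the model variables. A plausible reading, not confirmed by the question, is that the three warnings correspond to the three sentence inputs sample_sents, greedy_sents and targ_sents, assuming they are tf.string tensors as their names suggest. The hypothetical helper non_float_inputs below lists which inputs of a call fall outside the floating and complex dtypes; it approximates the tape's check rather than reproducing it exactly.

```python
import tensorflow as tf


def non_float_inputs(named_inputs):
    """Return (name, dtype) for each input GradientTape cannot differentiate.

    named_inputs: dict mapping a label to a tensor, e.g. the tensors passed
    as inp= to tf.py_function. Floating and complex dtypes are treated as
    differentiable; everything else (strings, ints, bools) is reported.
    """
    return [(name, t.dtype.name)
            for name, t in named_inputs.items()
            if not (t.dtype.is_floating or t.dtype.is_complex)]


report = non_float_inputs({
    "sample_sents": tf.constant(["a b"]),
    "greedy_sents": tf.constant(["a c"]),
    "targ_sents": tf.constant(["a b"]),
    "ce_loss": tf.constant(1.5),
})
print(report)
```

Calling it on the real inp= list just before tf.py_function would show whether the string inputs line up with the three warnings.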
The full traceback is below:
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.string
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.string
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.string
2020-08-18 14:43:43.171101: W tensorflow/core/framework/op_kernel.cc:1755] Invalid argument: TypeError: Cannot convert 0.0 to EagerTensor of dtype string
Traceback (most recent call last):
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 242, in __call__
return func(device, token, args)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 140, in __call__
outputs = [
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 141, in <listcomp>
_maybe_copy_to_context_device(self._convert(x, dtype=dtype),
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 119, in _convert
return constant_op.constant(0.0, dtype=dtype)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 263, in constant
return _constant_impl(value, dtype, shape, name, verify_shape=False,
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 275, in _constant_impl
return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_eager_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
TypeError: Cannot convert 0.0 to EagerTensor of dtype string
Traceback (most recent call last):
File "main.py", line 31, in <module>
run_rl(args)
File "/home/user/textgen/nmt_spanish_english/run_rl.py", line 392, in run_rl
train_rl(input_tensor_train, target_tensor_train, input_tensor_valid, target_tensor_valid, targ_lang, encoder, decoder, args.rl_log_dir)
File "/home/user/textgen/nmt_spanish_english/run_rl.py", line 270, in train_rl
batch_loss = train_step_rl(inp, targ, targ_sents, targ_lang, table, enc_hidden, optimizer, encoder, decoder, train_loss)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
return self._call_flat(
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
outputs = execute.execute(
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: Cannot convert 0.0 to EagerTensor of dtype string
Traceback (most recent call last):
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 242, in __call__
return func(device, token, args)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 140, in __call__
outputs = [
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 141, in <listcomp>
_maybe_copy_to_context_device(self._convert(x, dtype=dtype),
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 119, in _convert
return constant_op.constant(0.0, dtype=dtype)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 263, in constant
return _constant_impl(value, dtype, shape, name, verify_shape=False,
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 275, in _constant_impl
return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_eager_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
TypeError: Cannot convert 0.0 to EagerTensor of dtype string
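On the second question: the traceback bottoms out in _convert in script_ops.py, which calls constant_op.constant(0.0, dtype=dtype). A plausible explanation, consistent with the three warnings, is that the gradient of the py_function comes back as None for each tf.string input (strings have no gradient), and this code path substitutes a 0.0 constant of that input's dtype, here tf.string, which cannot be built. The snippet below reproduces only that final failing call in isolation; it does not reproduce the full training step.

```python
import tensorflow as tf

# Same call as script_ops.py line 119 in the traceback, with dtype=tf.string.
try:
    tf.constant(0.0, dtype=tf.string)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```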