Question

This problem has been bothering me for a long time. Thanks in advance for your help!

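I'm using TensorFlow 2.x. The code works, but without @tf.function training is very slow. However, when I decorate the training step with @tf.function, the code breaks and produces the traceback below.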

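The code is too long to paste, so below are the relevant statements. Inside the @tf.function-decorated function, I wrap a Python function with tf.py_function so that it runs as part of the AutoGraph-traced graph: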

loss = tf.py_function(func=calc_rl_loss, inp=[sample_sents, greedy_sents, targ_sents, ce_loss], Tout=tf.float32)

tf.print(loss) shows that loss has the correct scalar value.
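The call above mixes three string tensors (sample_sents, greedy_sents, targ_sents) with one float tensor (ce_loss). A quick first check is to list which py_function inputs cannot carry a gradient. The helper below is a hypothetical debugging aid (the name `non_differentiable_inputs` and the stand-in tensors are made up), and its floating/complex test is a simplification of what GradientTape accepts as a differentiable dtype.

```python
import tensorflow as tf


def non_differentiable_inputs(inputs):
    """Hypothetical helper: return (index, dtype) for every input that is
    neither floating-point nor complex, i.e. cannot receive a gradient."""
    return [
        (i, t.dtype)
        for i, t in enumerate(inputs)
        if not (t.dtype.is_floating or t.dtype.is_complex)
    ]


# Stand-ins with the same dtypes as the real py_function call (values made up):
# three string tensors followed by one float32 loss.
inputs = [
    tf.constant(["a b"]),
    tf.constant(["a"]),
    tf.constant(["a b"]),
    tf.constant(0.5),
]
for index, dtype in non_differentiable_inputs(inputs):
    print(index, dtype.name)
```

If this reports three string positions, the count matches the three identical "got tf.string" warnings at the top of the traceback, which is consistent with (though not proof of) the string inputs being the offending gradient sources.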

The GradientTape-related code, on the other hand, is here:

variables = encoder.trainable_variables + decoder.trainable_variables
#print("calling tape.gradient: ", variables)
gradients = tape.gradient(loss, variables)
optimizer.apply_gradients(zip(gradients, variables))
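A minimal sketch of one way to avoid requesting gradients through string tensors. It assumes (calc_rl_loss is not shown, so this is a guess) that the RL loss is a sentence-level reward multiplied by the cross-entropy loss, in the self-critical style of "sampled reward minus greedy reward". Only the reward is computed inside tf.py_function, from the string tensors alone; it is then combined with ce_loss using ordinary TensorFlow ops inside the tape. `sentence_reward`, `w` and `train_step` are hypothetical stand-ins for the real reward function, model weights and training step, and the exact-match score is a placeholder for a real metric such as BLEU.

```python
import tensorflow as tf


def sentence_reward(sample_sents, greedy_sents, targ_sents):
    """Hypothetical stand-in for the string part of calc_rl_loss.

    Runs eagerly inside tf.py_function, so .numpy() is available.
    Placeholder metric: number of exact matches with the targets.
    """
    samples = [s.decode("utf-8") for s in sample_sents.numpy()]
    greedy = [s.decode("utf-8") for s in greedy_sents.numpy()]
    targets = [s.decode("utf-8") for s in targ_sents.numpy()]
    sample_score = sum(s == t for s, t in zip(samples, targets))
    greedy_score = sum(g == t for g, t in zip(greedy, targets))
    # Self-critical style advantage: sampled reward minus greedy baseline.
    return float(sample_score - greedy_score)


w = tf.Variable(2.0)  # hypothetical trainable weight


@tf.function
def train_step(sample_sents, greedy_sents, targ_sents, x):
    with tf.GradientTape() as tape:
        ce_loss = w * x  # differentiable part stays in TensorFlow ops
        # Only string tensors enter py_function, and none of them depends
        # on a watched variable, so no gradient is asked of them.
        reward = tf.py_function(
            func=sentence_reward,
            inp=[sample_sents, greedy_sents, targ_sents],
            Tout=tf.float32,
        )
        reward = tf.stop_gradient(reward)  # treat the reward as a constant
        reward.set_shape([])  # py_function output has no static shape
        loss = reward * ce_loss
    grads = tape.gradient(loss, [w])
    return loss, grads[0]


loss, grad = train_step(
    tf.constant(["a b", "c d"]),  # sampled sentences
    tf.constant(["a b", "x"]),    # greedy baseline sentences
    tf.constant(["a b", "c d"]),  # target sentences
    tf.constant(3.0),
)
print(float(loss), float(grad))  # 6.0 3.0
```

The design choice is to keep every tensor the optimizer must differentiate (here ce_loss) outside py_function, so the tape never has to differentiate through the Python function at all; tf.stop_gradient only makes that intent explicit.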

The watched variables are here:

train_step_rl: after calling py_function calc_rl_loss
train_step_rl: watched variables:  (<tf.Variable 'encoder/embedding/embeddings:0' shape=(24794, 128) dtype=float32>, <tf.Variable 'encoder/gru/gru_cell/kernel:0' shape=(128, 384) dtype=float32>, <tf.Variable 'encoder/gru/gru_cell/recurrent_kernel:0' shape=(128, 384) dtype=float32>, <tf.Variable 'encoder/gru/gru_cell/bias:0' shape=(2, 384) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_1/kernel:0' shape=(128, 128) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_1/bias:0' shape=(128,) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_2/kernel:0' shape=(128, 128) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_2/bias:0' shape=(128,) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_3/kernel:0' shape=(128, 1) dtype=float32>, <tf.Variable 'decoder/bahdanau_attention/dense_3/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'decoder/embedding_1/embeddings:0' shape=(12934, 128) dtype=float32>, <tf.Variable 'decoder/gru_1/gru_cell_1/kernel:0' shape=(256, 384) dtype=float32>, <tf.Variable 'decoder/gru_1/gru_cell_1/recurrent_kernel:0' shape=(128, 384) dtype=float32>, <tf.Variable 'decoder/gru_1/gru_cell_1/bias:0' shape=(2, 384) dtype=float32>, <tf.Variable 'decoder/dense/kernel:0' shape=(128, 12934) dtype=float32>, <tf.Variable 'decoder/dense/bias:0' shape=(12934,) dtype=float32>)

A few questions:

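Why do I get the warning "... when calling GradientTape.gradient, got tf.string", and how can I debug it? As shown above, none of these watched variables has dtype tf.string!
Why do I get "TypeError: Cannot convert 0.0 to EagerTensor of dtype string"? Why is TensorFlow converting the float 0.0 to tf.string?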
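Both messages can be reproduced in isolation, eagerly and without any model. The snippet below only illustrates the two symptoms: a string tensor passed as a source to GradientTape.gradient (which should log the "must be floating" warning and yield None for that source), and a float zero forced into a string dtype. That the py_function gradient path in the traceback performs exactly these two steps is an assumption, although the traceback does show script_ops.py calling `constant_op.constant(0.0, dtype=dtype)` just before the TypeError. The tensor values are made up.

```python
import tensorflow as tf

# Symptom 1: a string tensor as a gradient source is accepted, but it
# should log the "must be floating" warning and its gradient is None.
s = tf.constant(["a hypothetical sentence"])
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = 2.0 * x
grad_s, grad_x = tape.gradient(y, [s, x])
print(grad_s, float(grad_x))

# Symptom 2: substituting a zero of the source's dtype for that None fails,
# because the float 0.0 cannot be represented as a tf.string tensor.
try:
    tf.constant(0.0, dtype=tf.string)
    zero_failed = False
except TypeError as err:
    zero_failed = True
    print(err)
```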

The full traceback is attached here:

WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.string
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.string
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.string
2020-08-18 14:43:43.171101: W tensorflow/core/framework/op_kernel.cc:1755] Invalid argument: TypeError: Cannot convert 0.0 to EagerTensor of dtype string
Traceback (most recent call last):

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 242, in __call__
    return func(device, token, args)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 140, in __call__
    outputs = [

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 141, in <listcomp>
    _maybe_copy_to_context_device(self._convert(x, dtype=dtype),

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 119, in _convert
    return constant_op.constant(0.0, dtype=dtype)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 263, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 275, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)

TypeError: Cannot convert 0.0 to EagerTensor of dtype string


Traceback (most recent call last):
  File "main.py", line 31, in <module>
    run_rl(args)
  File "/home/user/textgen/nmt_spanish_english/run_rl.py", line 392, in run_rl
    train_rl(input_tensor_train, target_tensor_train, input_tensor_valid, target_tensor_valid, targ_lang, encoder, decoder, args.rl_log_dir)
  File "/home/user/textgen/nmt_spanish_english/run_rl.py", line 270, in train_rl
    batch_loss = train_step_rl(inp, targ, targ_sents, targ_lang, table, enc_hidden, optimizer, encoder, decoder, train_loss)
  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
    outputs = execute.execute(
  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  TypeError: Cannot convert 0.0 to EagerTensor of dtype string
Traceback (most recent call last):

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 242, in __call__
    return func(device, token, args)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 140, in __call__
    outputs = [

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 141, in <listcomp>
    _maybe_copy_to_context_device(self._convert(x, dtype=dtype),

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/script_ops.py", line 119, in _convert
    return constant_op.constant(0.0, dtype=dtype)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 263, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 275, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)

  File "/home/user/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 98, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)

TypeError: Cannot convert 0.0 to EagerTensor of dtype string
