Question

I am using the following training step, decorated with tf.function:

import tensorflow as tf
tfk = tf.keras  # assumed alias; the original post does not show its imports

@tf.function
def train_step(inputs, labels):
    # The tape is persistent because tape.gradient is called once per loss.
    with tf.GradientTape(persistent=True) as tape:
        predictions = model([X, F], training=True)
        losses = [l_f(tf.expand_dims(labels[:,i], axis=-1), predictions[i]) for i, l_f in enumerate(loss_functions)]
    gradients = [tape.gradient(l, model.trainable_variables) for l in losses]
    for g in gradients:
        # Replace None gradients (variables a given loss does not reach) with zeros.
        grads = [gg if gg is not None else tf.zeros_like(model.trainable_variables[i], dtype=tf.float32) for i, gg in enumerate(g)]
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    del tape
    return losses


def weighted_loss(weights):
    @tf.function
    def loss_func(labels, predictions):
        min_class_filter = tfk.backend.greater(labels, 0.5)

        y_min = tf.boolean_mask(labels, min_class_filter)
        y_max = tf.boolean_mask(labels, tf.math.logical_not(min_class_filter))
        y_pred_min = tf.boolean_mask(predictions, min_class_filter)
        y_pred_max = tf.boolean_mask(predictions, tf.math.logical_not(min_class_filter))

        loss_min_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_min, y_pred_min))
        loss_max_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_max, y_pred_max))
        loss_all = tfk.backend.mean(tfk.backend.binary_crossentropy(labels, predictions))
        return weights[0]*loss_min_class + weights[1]*loss_max_class + weights[2]*loss_all
    return loss_func

loss_functions = [weighted_loss(w) for w in target_weights]

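It's a bit quirky, but basically my network has multiple outputs, which means that in some cases a gradient of None is the correct result for certain weights, so I replace those gradients with zeros. I compute the loss for each of these outputs separately and then propagate each of them at every step.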
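For readers unfamiliar with the None gradients described above, here is a minimal sketch of where they come from. The variables `used` and `unused` are made up for illustration and are not part of the original model. It also shows the `unconnected_gradients` argument of `tape.gradient`, which asks the tape to return zeros instead of None; whether that would have avoided the slowdown in the question is not something the thread confirms.

```python
import tensorflow as tf

# Hypothetical variables: only `used` contributes to the loss.
used = tf.Variable(2.0)
unused = tf.Variable(5.0)

# persistent=True lets us call tape.gradient more than once.
with tf.GradientTape(persistent=True) as tape:
    loss = used * used

# Default behaviour: variables the loss does not depend on get None.
default_grads = tape.gradient(loss, [used, unused])

# Ask the tape to fill unconnected gradients with zeros instead.
zero_grads = tape.gradient(
    loss,
    [used, unused],
    unconnected_gradients=tf.UnconnectedGradients.ZERO,
)
del tape

print(default_grads)
print(zero_grads)
```

With this option the manual `tf.zeros_like` replacement loop is not needed, since every entry in the returned list is already a tensor.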

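When I run the training step above as written, a single training step takes an extremely long time (over 10 minutes), and I see the following message in the logs: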

E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] function_operator failed: Invalid argument: Input 0 of node model/LSTM_forward_0/zeros_like was passed int32 from model/LSTM_forward_0/StatefulPartitionedCall:9 incompatible with expected variant.

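When I remove the @tf.function decorator, it runs in about 10% of the time, and I don't see this log warning. Is this message a red herring, or does it legitimately point to a problem created by adding @tf.function?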
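One way to narrow this kind of comparison down is to time the first call of a tf.function separately from later calls. The first call includes tracing and graph optimization (the log message comes from Grappler's meta optimizer), while later calls with the same input signature reuse the traced graph. The sketch below uses a small stand-in computation, `step`, and a helper, `time_call`, both invented here; it is not the model from the question.

```python
import time
import tensorflow as tf

# Stand-in weights and computation, not the model from the question.
w = tf.Variable(tf.ones([64, 64]))

def step(x):
    return tf.reduce_sum(tf.matmul(x, w))

graph_step = tf.function(step)

def time_call(fn, x):
    """Return (elapsed seconds, result) for one call of fn(x)."""
    start = time.perf_counter()
    result = fn(x)
    result.numpy()  # wait for the value so the timing covers the computation
    return time.perf_counter() - start, result

x = tf.ones([8, 64])
first_s, first_out = time_call(graph_step, x)  # includes tracing
later_s, later_out = time_call(graph_step, x)  # reuses the traced graph
eager_s, eager_out = time_call(step, x)        # plain eager execution

print("tf.function, first call:", first_s)
print("tf.function, later call:", later_s)
print("eager call:", eager_s)
```

If only the first call is slow, the cost is in tracing and optimization; if later calls are slow as well, the traced graph itself is the problem.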

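Other details:

TF 2.0
GPU is enabled and available
CUDA 10.1
GPU utilization is 0% in both cases, but this is not because data feeding is saturating the CPU: generating the training data ahead of time, outside the training loop, performs no differently from reading TFRecords on the fly, with ample prefetching and limited augmentation
Inputs, labels, gradients, and all model.trainable_variables are of type tf.float32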

Answer 1

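From what I have read, a tf.function should not include any assignment to graph variables if it is to run smoothly.

Your training step changes the model's weights, which violates this.

I'm not sure this is the cause, but you could try keeping tf.function only on the loss function and not on the training step.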
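As a quick check on the claim about variable assignment, the sketch below assigns to an existing variable from inside a tf.function. The names `counter` and `increment` are invented for illustration. The variable is created once, outside the function; creating a new variable on every call is the case tf.function rejects.

```python
import tensorflow as tf

# Created once, outside the traced function.
counter = tf.Variable(0.0)

@tf.function
def increment(step):
    # Assigning to an existing variable is allowed inside tf.function.
    counter.assign_add(step)
    return counter.read_value()

for _ in range(3):
    latest = increment(tf.constant(1.0))

print(latest)
```

This suggests that updating weights inside a decorated training step is not, on its own, an explanation for the slowdown.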

Answer 2

I have found a fix. The problem was with overwriting the None gradients, not with the persistent gradient tape.

@tf.function
def train_step(inputs, labels):
    with tf.GradientTape(persistent=True) as tape:
        predictions = model([X, F], training=True)
        # Each loss function now receives all labels and all predictions, plus its output index.
        losses = [l_f(labels, predictions, i) for i, l_f in enumerate(loss_functions)]
    gradients = [tape.gradient(l, model.trainable_variables) for l in losses]
    for g in gradients:
        # Gradients are applied as returned, with no None-to-zero replacement.
        optimizer.apply_gradients(zip(g, model.trainable_variables))
    del tape
    return losses


def weighted_loss(weights):
    @tf.function
    def loss_func(labs, preds, i):
        labels = tf.expand_dims(labs[:,i], axis=-1)
        predictions = preds[i]
        min_class_filter = tfk.backend.greater(labels, 0.5)

        y_min = tf.boolean_mask(labels, min_class_filter)
        y_max = tf.boolean_mask(labels, tf.math.logical_not(min_class_filter))
        y_pred_min = tf.boolean_mask(predictions, min_class_filter)
        y_pred_max = tf.boolean_mask(predictions, tf.math.logical_not(min_class_filter))

        loss_min_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_min, y_pred_min))
        loss_max_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_max, y_pred_max))
        loss_all = tfk.backend.mean(tfk.backend.binary_crossentropy(labels, predictions))
        return weights[0]*loss_min_class + weights[1]*loss_max_class + weights[2]*loss_all
    return loss_func

loss_functions = [weighted_loss(w) for w in target_weights]

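By passing all outputs and all labels into the loss function (even though I ignore a bunch of them), the tape returns appropriate gradients (0) for all branches, rather than only for the branch a particular loss focuses on.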
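Whether the tape returns zeros or None for the ignored branches may depend on the TensorFlow version, so it is worth checking on your own setup. The sketch below is one way to do that, using two made-up variables `a` and `b` in place of the model's branches and a toy loss that, like the answer's loss, receives every output and selects one by index.

```python
import tensorflow as tf

# Hypothetical stand-ins for two model branches.
a = tf.Variable(2.0)  # feeds output 0
b = tf.Variable(3.0)  # feeds output 1

def select_loss(outputs, i):
    # Receives every output but uses only the one at index i.
    return tf.square(outputs[i])

# Same loss wrapped in tf.function, as in the answer.
select_loss_fn = tf.function(select_loss)

def grads_for(loss_fn, i):
    """Gradients of the selected loss with respect to both variables."""
    with tf.GradientTape() as tape:
        outputs = [a * 1.0, b * 1.0]
        loss = loss_fn(outputs, i)
    return tape.gradient(loss, [a, b])

print("eager loss:      ", grads_for(select_loss, 0))
print("tf.function loss:", grads_for(select_loss_fn, 0))
```

If the second line still shows None for `b`, passing `unconnected_gradients=tf.UnconnectedGradients.ZERO` to `tape.gradient` is a more explicit way to get zeros.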

@tf.function is slowing down the training step
