I'm using the following training step, decorated with tf.function:
```python
import tensorflow as tf

tfk = tf.keras  # assumption: `tfk` is an alias for tf.keras


@tf.function
def train_step(inputs, labels):
    with tf.GradientTape(persistent=True) as tape:
        # assumes `inputs` is the [X, F] pair of model inputs
        predictions = model(inputs, training=True)
        # one loss per model output; labels[:, i] is the target for output i
        losses = [l_f(tf.expand_dims(labels[:, i], axis=-1), predictions[i])
                  for i, l_f in enumerate(loss_functions)]
    # persistent=True allows one gradient() call per loss on the same tape
    gradients = [tape.gradient(l, model.trainable_variables) for l in losses]
    for g in gradients:
        # replace None (variable not connected to this loss) with zeros
        grads = [gg if gg is not None
                 else tf.zeros_like(model.trainable_variables[i], dtype=tf.float32)
                 for i, gg in enumerate(g)]
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    del tape
    return losses


def weighted_loss(weights):
    @tf.function
    def loss_func(labels, predictions):
        # split the batch into the minority (label > 0.5) and majority classes
        min_class_filter = tfk.backend.greater(labels, 0.5)
        y_min = tf.boolean_mask(labels, min_class_filter)
        y_max = tf.boolean_mask(labels, tf.math.logical_not(min_class_filter))
        y_pred_min = tf.boolean_mask(predictions, min_class_filter)
        y_pred_max = tf.boolean_mask(predictions, tf.math.logical_not(min_class_filter))
        loss_min_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_min, y_pred_min))
        loss_max_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_max, y_pred_max))
        loss_all = tfk.backend.mean(tfk.backend.binary_crossentropy(labels, predictions))
        # weighted sum of per-class and overall binary cross-entropy
        return weights[0]*loss_min_class + weights[1]*loss_max_class + weights[2]*loss_all
    return loss_func


loss_functions = [weighted_loss(w) for w in target_weights]
```
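To make the arithmetic of `weighted_loss` concrete, here is a dependency-free sketch in plain Python. It is only an illustration under stated assumptions: lists of floats stand in for tensors, and the clipping constant `EPS = 1e-7` is assumed to approximate the epsilon Keras applies inside binary cross-entropy. The helpers `bce` and `mean` are hypothetical names introduced for this sketch.

```python
import math

EPS = 1e-7  # assumption: approximates the clipping epsilon Keras uses for BCE


def bce(t, p):
    """Binary cross-entropy for one (label, prediction) pair, with clipping."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def mean(xs):
    # a mean over zero elements is undefined; return nan, as 0/0 would
    return sum(xs) / len(xs) if xs else float("nan")


def weighted_loss(weights):
    w_min, w_max, w_all = weights

    def loss_func(labels, predictions):
        pairs = list(zip(labels, predictions))
        min_pairs = [(t, p) for t, p in pairs if t > 0.5]      # minority class
        max_pairs = [(t, p) for t, p in pairs if not t > 0.5]  # majority class
        loss_min = mean([bce(t, p) for t, p in min_pairs])
        loss_max = mean([bce(t, p) for t, p in max_pairs])
        loss_all = mean([bce(t, p) for t, p in pairs])
        return w_min * loss_min + w_max * loss_max + w_all * loss_all

    return loss_func


if __name__ == "__main__":
    loss = weighted_loss((1.0, 2.0, 3.0))
    print(loss([1.0, 0.0, 1.0], [0.9, 0.2, 0.6]))  # ≈ 1.594
```

Because `0 * nan` is `nan`, a batch with no label above 0.5 makes this sketch return `nan` regardless of the weights; it may be worth checking whether the TensorFlow version behaves the same way on such batches.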
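It's a bit hacky, but basically my network has multiple outputs, which means that in some cases it is correct for the gradients of some weights to be None. I therefore replace those gradients with zeros, compute a separate loss for each of the outputs, and then backpropagate each of them at every step.
When I run this code as written, a single training step takes an extremely long time (more than 10 minutes), and I see the following message in the logs: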
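The per-loss update loop described above can be sketched in plain Python to show what the None-to-zero replacement does. Assumptions: scalar weights stand in for tensors and plain SGD stands in for the actual optimizer; `fill_none` and `sgd_per_loss` are hypothetical names. With plain SGD a zero gradient leaves a weight unchanged, and because all gradients are computed before any update, applying one step per loss equals a single step on the summed gradients. With momentum-based optimizers such as Adam, a zero gradient is not necessarily a no-op, since accumulated moments can still move the weight. Separately, TensorFlow's `GradientTape.gradient` accepts `unconnected_gradients=tf.UnconnectedGradients.ZERO`, which returns zeros instead of None directly.

```python
from typing import List, Optional


def fill_none(grads: List[Optional[float]]) -> List[float]:
    # None means "this weight does not affect this loss"; treat it as 0.0
    return [0.0 if g is None else g for g in grads]


def sgd_per_loss(weights: List[float],
                 per_loss_grads: List[List[Optional[float]]],
                 lr: float) -> List[float]:
    # one plain-SGD update per loss, mirroring the `for g in gradients` loop
    w = list(weights)
    for grads in per_loss_grads:
        w = [wi - lr * gi for wi, gi in zip(w, fill_none(grads))]
    return w


if __name__ == "__main__":
    # loss 0 depends only on weight 0, loss 1 only on weight 1
    print(sgd_per_loss([1.0, 2.0], [[0.5, None], [None, 1.0]], 0.1))  # roughly [0.95, 1.9]
```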
```
E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] function_operator failed: Invalid argument: Input 0 of node model/LSTM_forward_0/zeros_like was passed int32 from model/LSTM_forward_0/StatefulPartitionedCall:9 incompatible with expected variant.
```
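When I remove the @tf.function decorator, a step runs in about 10% of the time and I don't see this log warning. Is this warning a red herring, or does it legitimately point to a problem created by adding @tf.function?
Additional details:
Answer 0 (score: 0)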
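From what I've read, a tf.function should not contain any assignments to graph variables if it is to run smoothly. In your training step you are changing the model's weights, which violates this rule.
I'm not sure this is the cause, but you could try keeping tf.function only on the loss function and not on the training step.
Answer 1 (score: 0)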
I've found a fix. The problem was overwriting the None gradients, not the persistent gradient tape.
```python
import tensorflow as tf

tfk = tf.keras  # assumption: `tfk` is an alias for tf.keras


@tf.function
def train_step(inputs, labels):
    with tf.GradientTape(persistent=True) as tape:
        # assumes `inputs` is the [X, F] pair of model inputs
        predictions = model(inputs, training=True)
        # pass ALL labels and ALL predictions; each loss selects output i itself
        losses = [l_f(labels, predictions, i) for i, l_f in enumerate(loss_functions)]
    gradients = [tape.gradient(l, model.trainable_variables) for l in losses]
    for g in gradients:
        # gradients are passed through unchanged (no None -> zeros replacement)
        optimizer.apply_gradients(zip(g, model.trainable_variables))
    del tape
    return losses


def weighted_loss(weights):
    @tf.function
    def loss_func(labs, preds, i):
        # select the target column and the prediction for output i
        labels = tf.expand_dims(labs[:, i], axis=-1)
        predictions = preds[i]
        min_class_filter = tfk.backend.greater(labels, 0.5)
        y_min = tf.boolean_mask(labels, min_class_filter)
        y_max = tf.boolean_mask(labels, tf.math.logical_not(min_class_filter))
        y_pred_min = tf.boolean_mask(predictions, min_class_filter)
        y_pred_max = tf.boolean_mask(predictions, tf.math.logical_not(min_class_filter))
        loss_min_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_min, y_pred_min))
        loss_max_class = tfk.backend.mean(tfk.backend.binary_crossentropy(y_max, y_pred_max))
        loss_all = tfk.backend.mean(tfk.backend.binary_crossentropy(labels, predictions))
        return weights[0]*loss_min_class + weights[1]*loss_max_class + weights[2]*loss_all
    return loss_func


loss_functions = [weighted_loss(w) for w in target_weights]
```
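By passing all of the outputs and all of the labels to the loss function (even though I ignore most of them), the tape returns the appropriate gradients (0) for all branches, not just for the branch the specific loss focuses on.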