I am trying to implement the Sharpness-Aware Minimization (SAM) method in a custom TensorFlow training loop. The algorithm follows these steps:

1. Compute the gradient of the loss with respect to the model's weights.
2. Compute the perturbation epsilon from that gradient, using Equation 2 below.
3. Compute the gradient of the loss at the perturbed weights w + epsilon.
4. Apply that gradient to the original weights w.

My training loop is:
    # assumes model, train_ds, epochs, rho and q are defined elsewhere
    loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
    optimizer = tf.keras.optimizers.Adam()
    train_acc_metric = tf.keras.metrics.CategoricalAccuracy()
    val_acc_metric = tf.keras.metrics.CategoricalAccuracy()

    for epoch in range(epochs):
        # Iterate over the batches of the train dataset
        for batch, (inputs, targets) in enumerate(train_ds):
            with tf.GradientTape(persistent=True) as tape:
                # Forward pass
                predictions = model(inputs)
                # Compute loss value
                loss = loss_fn(targets, predictions)
            # Update accuracy
            train_acc_metric.update_state(targets, predictions)
            # Gradient wrt model's weights
            gradient = tape.gradient(loss, model.trainable_weights)
            # USING EQ 2
            numerator1 = list(map(lambda g: tf.math.pow(tf.math.abs(g), q - 1), gradient))
            numerator2 = list(map(lambda g: rho * tf.math.sign(g), gradient))
            numerator = list(map(lambda n1, n2: n1 * n2, numerator1, numerator2))
            denominator = list(map(lambda g: tf.math.pow(tf.norm(g, ord=q), q), gradient))
            epsilon = list(map(lambda n, d: n / d, numerator, denominator))
            # Compute gradient at weights + epsilon
            modified_weights = list(map(lambda e, w: w + e, epsilon, model.trainable_weights))
            gradient = tape.gradient(loss, modified_weights)
            # Update weights (ValueError: No gradients provided for any variable)
            optimizer.apply_gradients(zip(gradient, model.trainable_weights))
After inspecting the gradients computed by tape.gradient(loss, modified_weights), I found that the gradient is None for every layer. I cannot figure out how to avoid this disconnection in the computation graph.
A similar question was already asked here, but it has no answers.
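The failure can be reproduced in isolation: tf.GradientTape can only differentiate the loss with respect to tensors that actually appear in the computation it recorded. A minimal sketch with a hypothetical scalar variable w:

```python
import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    loss = w * w  # recorded: loss depends only on w

g = tape.gradient(loss, w)  # dloss/dw = 2w = 6.0
w_shifted = w + 0.1 * g     # brand-new tensor, built after recording
# w_shifted never appeared in the computation that produced `loss`,
# so the tape has nothing connecting the two:
print(tape.gradient(loss, w_shifted))  # None
```

The modified_weights in the loop above are in exactly this situation: they are created from the first gradient, outside the recorded forward pass.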
Equation 2 (the perturbation the code above computes per weight tensor):

$$\hat{\epsilon}(w) = \rho \,\mathrm{sign}\!\left(\nabla_w L(w)\right) \frac{\left|\nabla_w L(w)\right|^{q-1}}{\left\|\nabla_w L(w)\right\|_q^{q}}$$
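For what it's worth, the per-tensor epsilon arithmetic itself behaves as expected; with toy values (assuming rho = 0.05 and q = 2, so the expression reduces to rho * g / ||g||^2) it can be checked on a single tensor:

```python
import tensorflow as tf

rho, q = 0.05, 2.0            # assumed hyperparameter values
g = tf.constant([0.3, -0.4])  # toy gradient tensor

# Same arithmetic as in the training loop, for one tensor:
numerator = tf.math.pow(tf.math.abs(g), q - 1) * rho * tf.math.sign(g)
denominator = tf.math.pow(tf.norm(g, ord=q), q)  # ||g||_2^2 = 0.25
epsilon = numerator / denominator
print(epsilon.numpy())  # ~[0.06, -0.08]
```

So the problem is not the epsilon computation but the second tape.gradient call against tensors the tape never saw.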