How do I use early stopping to monitor the loss in TensorFlow?

Asked: 2019-06-27 11:53:31

Tags: python tensorflow keras deep-learning

As a newcomer to DL and TensorFlow, I am trying to implement a sequence-to-sequence model by following https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism/?unapproved=67&moderation-hash=ea8e5dcb97c8236f68291788fbd746a7#comment-67. However, since the code I am following does not use the conventional model.fit(), I am quite confused about how to implement early stopping to avoid overfitting.

if MODE == 'train':
    for e in range(NUM_EPOCHS):
        en_initial_states = encoder.init_states(BATCH_SIZE)
        # Save the current weights at the start of every epoch
        encoder.save_weights(
            'checkpoints_luong/encoder/encoder_{}.h5'.format(e + 1))
        decoder.save_weights(
            'checkpoints_luong/decoder/decoder_{}.h5'.format(e + 1))
        for batch, (source_seq, target_seq_in, target_seq_out) in enumerate(dataset.take(-1)):
            loss = train_step(source_seq, target_seq_in,
                              target_seq_out, en_initial_states)

            if batch % 100 == 0:
                print('Epoch {} Batch {} Loss {:.4f}'.format(
                    e + 1, batch, loss.numpy()))

        # Run a sample translation after every epoch; if it fails, keep training
        try:
            predict()

        except Exception:
            continue

Here is the train_step() function:

@tf.function
def train_step(source_seq, target_seq_in, target_seq_out, en_initial_states):
    loss = 0
    with tf.GradientTape() as tape:
        en_outputs = encoder(source_seq, en_initial_states)
        en_states = en_outputs[1:]
        de_state_h, de_state_c = en_states

        # We need to create a loop to iterate through the target sequences
        for i in range(target_seq_out.shape[1]):
            # Input to the decoder must have shape of (batch_size, length)
            # so we need to expand one dimension
            decoder_in = tf.expand_dims(target_seq_in[:, i], 1)
            logit, de_state_h, de_state_c, _ = decoder(
                decoder_in, (de_state_h, de_state_c), en_outputs[0])

            # The loss is now accumulated through the whole batch
            loss += loss_func(target_seq_out[:, i], logit)

    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))

    return loss / target_seq_out.shape[1]
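
In case it is relevant, my rough idea for getting a number to monitor was to run the same forward pass on a held-out set without any gradient updates, something like the sketch below. val_step is just a name I made up; it is not part of the tutorial, and it assumes the same encoder, decoder and loss_func as above:

def val_step(source_seq, target_seq_in, target_seq_out, en_initial_states):
    # Same forward pass as train_step, but without GradientTape or the
    # optimizer, so the weights are left untouched
    loss = 0
    en_outputs = encoder(source_seq, en_initial_states)
    en_states = en_outputs[1:]
    de_state_h, de_state_c = en_states

    for i in range(target_seq_out.shape[1]):
        decoder_in = tf.expand_dims(target_seq_in[:, i], 1)
        logit, de_state_h, de_state_c, _ = decoder(
            decoder_in, (de_state_h, de_state_c), en_outputs[0])
        loss += loss_func(target_seq_out[:, i], logit)

    return loss / target_seq_out.shape[1]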

To implement early stopping in this situation, is the procedure the same as with callbacks, or is it different? An example for this case would be highly appreciated.
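
The only thing I could come up with myself is a manual check around the epoch loop: compute an average loss per epoch on a held-out set and stop once it has not improved for a few epochs. A rough sketch of what I mean is below; patience, best_loss, val_dataset and the val_step above are my own placeholders and choices, not something from the tutorial, and val_dataset is assumed to be batched with the same BATCH_SIZE:

best_loss = float('inf')
patience = 3  # number of epochs to wait for an improvement; my own choice
epochs_without_improvement = 0

for e in range(NUM_EPOCHS):
    en_initial_states = encoder.init_states(BATCH_SIZE)
    for source_seq, target_seq_in, target_seq_out in dataset.take(-1):
        train_step(source_seq, target_seq_in, target_seq_out, en_initial_states)

    # Average loss over the held-out set for this epoch
    val_losses = [
        val_step(source_seq, target_seq_in, target_seq_out, en_initial_states).numpy()
        for source_seq, target_seq_in, target_seq_out in val_dataset
    ]
    epoch_val_loss = sum(val_losses) / len(val_losses)

    if epoch_val_loss < best_loss:
        best_loss = epoch_val_loss
        epochs_without_improvement = 0
        # Keep the best weights so far, mirroring the save_weights calls above
        encoder.save_weights('checkpoints_luong/encoder/encoder_best.h5')
        decoder.save_weights('checkpoints_luong/decoder/decoder_best.h5')
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print('Stopping early after epoch {}'.format(e + 1))
            break

The idea is simply to imitate by hand what the Keras EarlyStopping callback does (a monitored metric, a patience counter, and keeping the best weights), but I am not sure whether this is the usual approach when model.fit() is not used.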

0 Answers:

No answers yet.