Question

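I want to save a model and load it together with its optimizer state so that I can resume training. I am able to save the model weights as an .h5 file, but I have had no luck with the optimizer state. Please help.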

Answer 1

If you save the model with model.save() in 'h5' format, it stores the optimizer state along with everything else needed to resume training.
Code:

import numpy as np
import tensorflow as tf

# Random toy data: 1000 samples, 32 features, binary labels
x = np.random.uniform(0, 1, (1000, 32))
y = np.random.randint(0, 2, (1000,))

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax'))

model.compile(loss="sparse_categorical_crossentropy",
              optimizer='adam',
              metrics=['accuracy'])

# Keep the lr unchanged for the first epoch, then multiply it by exp(-0.1) every epoch
def scheduler(epoch, lr):
    if epoch < 1:
        return lr
    else:
        return lr * tf.math.exp(-0.1)

callback = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)
_ = model.fit(x=x, y=y, epochs=25, validation_split=0.2, verbose=1, callbacks=[callback])

Output:

Epoch 1/25

Epoch 00001: LearningRateScheduler reducing learning rate to 0.0010000000474974513.
25/25 [==============================] - 1s 24ms/step - loss: 0.6894 - accuracy: 0.5690 - val_loss: 0.6883 - val_accuracy: 0.5300
Epoch 2/25

Epoch 00002: LearningRateScheduler reducing learning rate to tf.Tensor(0.00090483745, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6853 - accuracy: 0.5378 - val_loss: 0.6843 - val_accuracy: 0.5650
Epoch 3/25

Epoch 00003: LearningRateScheduler reducing learning rate to tf.Tensor(0.0008187308, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6798 - accuracy: 0.5327 - val_loss: 0.6917 - val_accuracy: 0.5350
Epoch 4/25

Epoch 00004: LearningRateScheduler reducing learning rate to tf.Tensor(0.0007408183, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6733 - accuracy: 0.5744 - val_loss: 0.6848 - val_accuracy: 0.5550
Epoch 5/25

Epoch 00005: LearningRateScheduler reducing learning rate to tf.Tensor(0.0006703201, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6679 - accuracy: 0.6259 - val_loss: 0.6847 - val_accuracy: 0.5450
Epoch 6/25

Epoch 00006: LearningRateScheduler reducing learning rate to tf.Tensor(0.00060653075, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6613 - accuracy: 0.6176 - val_loss: 0.6890 - val_accuracy: 0.5450
Epoch 7/25

Epoch 00007: LearningRateScheduler reducing learning rate to tf.Tensor(0.00054881175, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6660 - accuracy: 0.6037 - val_loss: 0.6862 - val_accuracy: 0.5600
Epoch 8/25

Epoch 00008: LearningRateScheduler reducing learning rate to tf.Tensor(0.0004965854, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6635 - accuracy: 0.6162 - val_loss: 0.6886 - val_accuracy: 0.5600
Epoch 9/25

Epoch 00009: LearningRateScheduler reducing learning rate to tf.Tensor(0.00044932903, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6637 - accuracy: 0.5869 - val_loss: 0.6865 - val_accuracy: 0.5550
Epoch 10/25

Epoch 00010: LearningRateScheduler reducing learning rate to tf.Tensor(0.0004065697, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6583 - accuracy: 0.6218 - val_loss: 0.6883 - val_accuracy: 0.5700
Epoch 11/25

Epoch 00011: LearningRateScheduler reducing learning rate to tf.Tensor(0.0003678795, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6573 - accuracy: 0.5991 - val_loss: 0.6871 - val_accuracy: 0.5600
Epoch 12/25

Epoch 00012: LearningRateScheduler reducing learning rate to tf.Tensor(0.00033287113, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6497 - accuracy: 0.6228 - val_loss: 0.6876 - val_accuracy: 0.5650
Epoch 13/25

Epoch 00013: LearningRateScheduler reducing learning rate to tf.Tensor(0.00030119426, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6425 - accuracy: 0.6586 - val_loss: 0.6877 - val_accuracy: 0.5500
Epoch 14/25

Epoch 00014: LearningRateScheduler reducing learning rate to tf.Tensor(0.00027253185, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6424 - accuracy: 0.6579 - val_loss: 0.6878 - val_accuracy: 0.5650
Epoch 15/25

Epoch 00015: LearningRateScheduler reducing learning rate to tf.Tensor(0.00024659702, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6517 - accuracy: 0.6442 - val_loss: 0.6875 - val_accuracy: 0.5750
Epoch 16/25

Epoch 00016: LearningRateScheduler reducing learning rate to tf.Tensor(0.0002231302, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6401 - accuracy: 0.6753 - val_loss: 0.6886 - val_accuracy: 0.5650
Epoch 17/25

Epoch 00017: LearningRateScheduler reducing learning rate to tf.Tensor(0.00020189656, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6388 - accuracy: 0.6553 - val_loss: 0.6879 - val_accuracy: 0.5650
Epoch 18/25

Epoch 00018: LearningRateScheduler reducing learning rate to tf.Tensor(0.00018268357, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6441 - accuracy: 0.6505 - val_loss: 0.6889 - val_accuracy: 0.5650
Epoch 19/25

Epoch 00019: LearningRateScheduler reducing learning rate to tf.Tensor(0.00016529893, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6427 - accuracy: 0.6533 - val_loss: 0.6880 - val_accuracy: 0.5650
Epoch 20/25

Epoch 00020: LearningRateScheduler reducing learning rate to tf.Tensor(0.00014956866, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6434 - accuracy: 0.6330 - val_loss: 0.6886 - val_accuracy: 0.5650
Epoch 21/25

Epoch 00021: LearningRateScheduler reducing learning rate to tf.Tensor(0.00013533531, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6279 - accuracy: 0.7061 - val_loss: 0.6880 - val_accuracy: 0.5600
Epoch 22/25

Epoch 00022: LearningRateScheduler reducing learning rate to tf.Tensor(0.00012245646, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6368 - accuracy: 0.6492 - val_loss: 0.6883 - val_accuracy: 0.5700
Epoch 23/25

Epoch 00023: LearningRateScheduler reducing learning rate to tf.Tensor(0.000110803194, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6385 - accuracy: 0.6558 - val_loss: 0.6886 - val_accuracy: 0.5650
Epoch 24/25

Epoch 00024: LearningRateScheduler reducing learning rate to tf.Tensor(0.000100258876, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6323 - accuracy: 0.6689 - val_loss: 0.6884 - val_accuracy: 0.5600
Epoch 25/25

Epoch 00025: LearningRateScheduler reducing learning rate to tf.Tensor(9.0717986e-05, shape=(), dtype=float32).
25/25 [==============================] - 0s 3ms/step - loss: 0.6387 - accuracy: 0.6513 - val_loss: 0.6880 - val_accuracy: 0.5700

Save and load the model, then check the lr of the loaded model:

model.save('mymodel.h5')
model1 = tf.keras.models.load_model('mymodel.h5')
model1.optimizer.learning_rate

Output:

<tf.Variable 'learning_rate:0' shape=() dtype=float32, numpy=9.0717986e-05>

As shown above, the final lr value in the training log matches the lr of the loaded model.
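The final value can also be checked by hand. The sketch below replays the scheduler's rule in plain Python, assuming the starting lr is 0.001 (Adam's default, which is what the Epoch 1 log line shows). The names initial_lr and closed_form are only illustrative.

```python
import math

initial_lr = 0.001  # assumed starting lr (Adam's default, as in the Epoch 1 log line)
epochs = 25

# Replay the scheduler: no change at epoch index 0, then multiply by exp(-0.1) each epoch
lr = initial_lr
for epoch in range(epochs):
    if epoch >= 1:
        lr *= math.exp(-0.1)

# The same value in closed form: 24 decay steps over 25 epochs
closed_form = initial_lr * math.exp(-0.1 * (epochs - 1))

print(f"{lr:.4e}")  # 9.0718e-05
```

This agrees with the logged 9.0717986e-05 up to float32 rounding, which is why the loaded optimizer reports that value.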
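The only thing to remember when resuming training is to pass a value for the initial_epoch argument of model.fit(), so that everything that depends on the epoch number (such as the lr scheduler above) is computed correctly.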
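A minimal sketch of such a resumed run might look like this. It uses a smaller model and fewer epochs than the example above so that it runs quickly, and the names build_model, first_epochs and resumed are hypothetical. The scheduler here returns a plain Python float (via math.exp) instead of a tensor, which is an assumption made for portability across Keras versions.

```python
import math
import os
import tempfile

import numpy as np
import tensorflow as tf


def scheduler(epoch, lr):
    # Same decay rule as above, but returning a plain Python float
    if epoch < 1:
        return float(lr)
    return float(lr) * math.exp(-0.1)


def build_model():
    # Hypothetical helper: a small stand-in for the model above
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model


x = np.random.uniform(0, 1, (200, 32)).astype("float32")
y = np.random.randint(0, 2, (200,))
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)

# First run: train for a few epochs, then save model and optimizer state
first_epochs = 3
model = build_model()
model.fit(x, y, epochs=first_epochs, verbose=0, callbacks=[callback])

path = os.path.join(tempfile.mkdtemp(), "mymodel.h5")
model.save(path)

# Second run: reload and continue from where the first run stopped
resumed = tf.keras.models.load_model(path)
history = resumed.fit(x, y,
                      initial_epoch=first_epochs,  # epochs already completed
                      epochs=first_epochs + 2,     # index of the final epoch, not a count
                      verbose=0,
                      callbacks=[callback])

resumed_epochs = history.epoch  # zero-based epoch indices seen in this run
print(resumed_epochs)
```

Note that epochs is the index at which training stops, not the number of additional epochs, so the call above trains for two more epochs and the scheduler sees epoch indices 3 and 4.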
