I am training a set of fairly complicated models and am looking for a way to save and load the model optimizer states. The "trainer model" consists of different combinations of several other "weight models", some of which have shared weights, some of which have weights frozen depending on the trainer, and so on. The example is a bit too complicated to share, but in short, I am not able to use model.save('model_file.h5') and keras.models.load_model('model_file.h5') when stopping and resuming training.
Using model.load_weights('weight_file.h5') works fine for testing my model once training has finished, but if I try to continue training the model with this approach, the loss does not come even close to returning to its last value. I have read that this is because the optimizer state is not saved with this method, which makes sense. However, I need a way to save and load the state of my trainer model's optimizer. It seems that Keras once had model.optimizer.get_state() and model.optimizer.set_state() to accomplish what I am after, but that no longer appears to be the case (at least for the Adam optimizer). Are there other solutions with the current Keras?
Answer 0 (score: 10)
You can extract the important lines from the load_model and save_model functions.
In save_model:
# Save optimizer weights.
symbolic_weights = getattr(model.optimizer, 'weights')
if symbolic_weights:
    optimizer_weights_group = f.create_group('optimizer_weights')
    weight_values = K.batch_get_value(symbolic_weights)
To load the optimizer state, in load_model:
# Set optimizer weights.
if 'optimizer_weights' in f:
    # Build train function (to get weight updates).
    if isinstance(model, Sequential):
        model.model._make_train_function()
    else:
        model._make_train_function()
    # ...
    try:
        model.optimizer.set_weights(optimizer_weight_values)
Putting the lines above together, here is an example:
import pickle
import numpy as np
from keras import backend as K
from keras.layers import Dense, Input
from keras.models import Model

X, y = np.random.rand(100, 50), np.random.randint(2, size=100)
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5)
Epoch 1/5
100/100 [==============================] - 0s 4ms/step - loss: 0.7716
Epoch 2/5
100/100 [==============================] - 0s 64us/step - loss: 0.7678
Epoch 3/5
100/100 [==============================] - 0s 82us/step - loss: 0.7665
Epoch 4/5
100/100 [==============================] - 0s 56us/step - loss: 0.7647
Epoch 5/5
100/100 [==============================] - 0s 76us/step - loss: 0.7638
model.save_weights('weights.h5')
symbolic_weights = getattr(model.optimizer, 'weights')
weight_values = K.batch_get_value(symbolic_weights)
with open('optimizer.pkl', 'wb') as f:
    pickle.dump(weight_values, f)
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.load_weights('weights.h5')
model._make_train_function()
with open('optimizer.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)
model.fit(X, y, epochs=5)
Epoch 1/5
100/100 [==============================] - 0s 674us/step - loss: 0.7629
Epoch 2/5
100/100 [==============================] - 0s 49us/step - loss: 0.7617
Epoch 3/5
100/100 [==============================] - 0s 49us/step - loss: 0.7611
Epoch 4/5
100/100 [==============================] - 0s 55us/step - loss: 0.7601
Epoch 5/5
100/100 [==============================] - 0s 49us/step - loss: 0.7594
Answer 1 (score: 6)
For those who are not using model.compile and instead perform automatic differentiation, applying gradients manually with optimizer.apply_gradients, I think I have a solution.
First, save the optimizer weights: np.save(path, optimizer.get_weights())
Then, when you are ready to reload the optimizer, show the newly instantiated optimizer the size of the weights it will update by calling optimizer.apply_gradients on a list of tensors with the sizes of the variables for which you compute gradients. It is extremely important to set the weights of the model after you set the weights of the optimizer, because momentum-based optimizers like Adam will update the weights of the model even if we give them gradients of zero.
import tensorflow as tf
import numpy as np
model = ...  # instantiate your model here (functional or subclass of tf.keras.Model)
# Get saved weights
opt_weights = np.load('/path/to/saved/opt/weights.npy', allow_pickle=True)
grad_vars = model.trainable_weights
# This need not be model.trainable_weights; it must be a correctly-ordered list of
# grad_vars corresponding to how you usually call the optimizer.
optimizer = tf.keras.optimizers.Adam(lrate)  # lrate is a placeholder for your learning rate
zero_grads = [tf.zeros_like(w) for w in grad_vars]
# Apply one step of all-zero gradients so Adam creates its slot variables
optimizer.apply_gradients(zip(zero_grads, grad_vars))
# Set the weights of the optimizer
optimizer.set_weights(opt_weights)
# NOW set the trainable weights of the model
model_weights = np.load('/path/to/saved/model/weights.npy', allow_pickle=True)
model.set_weights(model_weights)
Note that if we try to set the weights before the first call to apply_gradients, an error is thrown saying that the optimizer expects a weight list of length zero.
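For context, a minimal sketch of the kind of manual training step this answer targets; loss_fn, the batches, and the model are placeholders, not part of the original answer:
import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()

@tf.function
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        preds = model(x_batch, training=True)
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_weights)
    # Use the same ordered variable list here and when rebuilding the optimizer state.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss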
Answer 2 (score: 1)
Upgrading Keras to 2.2.4 and using pickle solved this issue for me. As of Keras release 2.2.3, Keras models can be safely pickled.
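A minimal sketch of that approach, assuming a Keras version where compiled models (including their optimizer state) can be pickled; the file name is a placeholder:
import pickle

# Serialize the whole compiled model, optimizer state included.
with open('trainer_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Later: restore it and continue training from where it stopped.
with open('trainer_model.pkl', 'rb') as f:
    model = pickle.load(f)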
Answer 3 (score: 1)
Anyone trying to use @Yu-Yang's solution in a distributed setting may run into the following error:
ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy object at 0x7fdf357726d8>), which is different from the scope used for the original variable (MirroredVariable:{
0: <tf.Variable 'conv2d_1/kernel:0' shape=(1, 1, 1, 1) dtype=float32, numpy=array([[[[-0.9592359]]]], dtype=float32)>
}). Make sure the slot variables are created under the same strategy scope. This may happen if you're restoring from a checkpoint outside the scope
or something similar.
To solve this, you simply need to run the optimizer weight setting on each replica, for example with:
import pickle

import tensorflow as tf

strat = tf.distribute.MirroredStrategy()

with strat.scope():
    model = tf.keras.models.Sequential([tf.keras.layers.Conv2D(1, 1, padding='same')])
    model.compile(optimizer='adam', loss='mse')
    model(tf.random.normal([1, 16, 16, 1]))
    model.load_weights('model_weights.hdf5')

def model_weight_setting():
    grad_vars = model.trainable_weights
    zero_grads = [tf.zeros_like(w) for w in grad_vars]
    model.optimizer.apply_gradients(zip(zero_grads, grad_vars))
    with open('optimizer.pkl', 'rb') as f:
        weight_values = pickle.load(f)
    model.optimizer.set_weights(weight_values)

strat.run(model_weight_setting)
For some reason, this is not necessary for setting the model weights, but make sure that you create (via the call here) and load the model's weights inside the strategy scope, or you may run into an error along the lines of ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.collective_all_reduce_strategy.CollectiveAllReduceStrategy object at 0x14ffdce82c50>), which is different from the scope used for the original variable.
If you want a full example, I created a colab showcasing this solution.
Answer 4 (score: 0)
Building on Alex Trevithick's answer, it is possible to avoid re-calling model.set_weights by simply saving the state of the variables before applying the gradients and then reloading them afterwards. This is useful when loading the model from an h5 file, and looks cleaner (imo).
The save/load functions are the following (thanks again, Alex):
import os

import numpy as np
import tensorflow as tf


def save_optimizer_state(optimizer, save_path, save_name):
    '''
    Save keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object.
    save_path --- Path to save location.
    save_name --- Name of the .npy file to be created.
    '''
    # Create the folder if it does not exist
    if not os.path.exists(save_path):
        os.makedirs(save_path)

    # Save the optimizer weights
    np.save(os.path.join(save_path, save_name), optimizer.get_weights())

    return


def load_optimizer_state(optimizer, load_path, load_name, model_train_vars):
    '''
    Loads keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object to be loaded.
    load_path --- Path to save location.
    load_name --- Name of the .npy file to be read.
    model_train_vars --- List of model variables (obtained using Model.trainable_variables)
    '''
    # Load the optimizer weights
    opt_weights = np.load(os.path.join(load_path, load_name) + '.npy', allow_pickle=True)

    # Dummy zero gradients
    zero_grads = [tf.zeros_like(w) for w in model_train_vars]
    # Save the current state of the variables
    saved_vars = [tf.identity(w) for w in model_train_vars]

    # Apply the zero gradients so the optimizer builds its slot variables
    # (this may modify the variables, hence the save/restore around it)
    optimizer.apply_gradients(zip(zero_grads, model_train_vars))

    # Restore the variables to their saved state
    [x.assign(y) for x, y in zip(model_train_vars, saved_vars)]

    # Set the weights of the optimizer
    optimizer.set_weights(opt_weights)

    return
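A short usage sketch of the two helpers above; the checkpoint folder, file name, and weights file are placeholders:
# After (or during) training: persist the optimizer state next to the model weights.
model.save_weights('weights.h5')
save_optimizer_state(model.optimizer, 'checkpoints', 'adam_state')

# When resuming: rebuild and compile the model, load its weights,
# then restore the optimizer state against the same variable list.
model.load_weights('weights.h5')
load_optimizer_state(model.optimizer, 'checkpoints', 'adam_state', model.trainable_variables)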
Answer 5 (score: 0)
The code below works for me (TensorFlow 2.5). I use the Universal Sentence Encoder as the model, together with an Adam optimizer.
Basically what I do is: I use a dummy input to set up the optimizer correctly, and afterwards I set the weights.
Save the optimizer's weights:
np.save(f'{path}/optimizer.npy', optimizer.get_weights())
Load the optimizer:
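A minimal sketch, assuming the optimizer has already been built against the model's variables (for example via the dummy-input step described above); path is the same placeholder used when saving:
opt_weights = np.load(f'{path}/optimizer.npy', allow_pickle=True)
optimizer.set_weights(opt_weights)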