I have a fit() function that uses the ModelCheckpoint() callback to save the model if it is better than any previous model, with save_weights_only=False so the entire model is saved. This should let me resume training later using load_model().
Unfortunately, somewhere in the save()/load_model() round trip the metric values are not preserved; for example, val_loss is reset to inf. That means that when training resumes, after the first epoch ModelCheckpoint() will always save the model, which will almost always be worse than the previous champion from the earlier session.
I have determined that I can set ModelCheckpoint()'s current best value before resuming training, like this:
myCheckpoint = ModelCheckpoint(...)
myCheckpoint.best = bestValueSoFar
Obviously, I could monitor the values I need, write them out to a file, and read them back in when I resume, but given that I am a Keras newbie, I wondered whether I was missing something obvious.
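For what it's worth, the write-then-restore idea from the question can be sketched with nothing but the standard library. The file name and metric key below are purely illustrative assumptions, not anything Keras prescribes:

```python
import json
import os

STATE_FILE = 'best_metrics.json'  # hypothetical file name

def load_best(metric='val_loss'):
    """Return the best value seen so far, or infinity on a fresh run."""
    if os.path.isfile(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f).get(metric, float('inf'))
    return float('inf')

def save_best(value, metric='val_loss'):
    """Persist the best value so a later session can restore it."""
    with open(STATE_FILE, 'w') as f:
        json.dump({metric: value}, f)

# Before resuming training, one would then do something like:
#   myCheckpoint.best = load_best()
```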
Answer 0 (score: 4)
I ended up quickly writing my own callback that tracks the best training values so that I can reload them later. It looks like this:
import json
import os

from keras import callbacks

# State monitor callback. Tracks how well we are doing and writes
# some state to a json file. This lets us resume training seamlessly.
#
# ModelState.state is:
#
# { "epoch_count": nnnn,
#   "best_values": { dictionary with keys for each log value },
#   "best_epoch": { dictionary with keys for each log value }
# }
class ModelState(callbacks.Callback):
    def __init__(self, state_path):
        self.state_path = state_path
        if os.path.isfile(state_path):
            print('Loading existing .json state')
            with open(state_path, 'r') as f:
                self.state = json.load(f)
        else:
            self.state = {'epoch_count': 0,
                          'best_values': {},
                          'best_epoch': {}}

    def on_train_begin(self, logs={}):
        print('Training commences...')

    def on_epoch_end(self, batch, logs={}):
        # Currently, for everything we track, lower is better
        for k in logs:
            if k not in self.state['best_values'] or logs[k] < self.state['best_values'][k]:
                self.state['best_values'][k] = float(logs[k])
                self.state['best_epoch'][k] = self.state['epoch_count']

        with open(self.state_path, 'w') as f:
            json.dump(self.state, f, indent=4)

        print('Completed epoch', self.state['epoch_count'])
        self.state['epoch_count'] += 1
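To sanity-check the bookkeeping without spinning up an actual training run, the callback's logic can be exercised standalone. In this sketch, keras' callbacks.Callback is replaced with a trivial stub (an assumption made purely so the demo runs without Keras installed), and two fake epochs are fed in by hand:

```python
import json
import os

class Callback:
    """Stand-in for keras.callbacks.Callback; demo only."""
    pass

class ModelState(Callback):
    def __init__(self, state_path):
        self.state_path = state_path
        if os.path.isfile(state_path):
            with open(state_path) as f:
                self.state = json.load(f)
        else:
            self.state = {'epoch_count': 0, 'best_values': {}, 'best_epoch': {}}

    def on_epoch_end(self, epoch, logs={}):
        # Lower is better for everything we track
        for k in logs:
            if k not in self.state['best_values'] or logs[k] < self.state['best_values'][k]:
                self.state['best_values'][k] = float(logs[k])
                self.state['best_epoch'][k] = self.state['epoch_count']
        with open(self.state_path, 'w') as f:
            json.dump(self.state, f, indent=4)
        self.state['epoch_count'] += 1

# Simulate two epochs: val_loss improves, then worsens
ms = ModelState('demo_state.json')
ms.on_epoch_end(0, {'val_loss': 0.9})
ms.on_epoch_end(1, {'val_loss': 1.2})
print(ms.state['best_values']['val_loss'])  # best value stays at 0.9
```

The second, worse epoch leaves best_values untouched while epoch_count still advances, which is exactly the behavior the real callback relies on when resuming.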
Then, in the fit() function, something like this:
# Set up the model state, reading in prior results info if available
model_state = ModelState(path_to_state_file)

# Checkpoint the model if we get a better result
model_checkpoint = callbacks.ModelCheckpoint(path_to_model_file,
                                             monitor='val_loss',
                                             save_best_only=True,
                                             verbose=1,
                                             mode='min',
                                             save_weights_only=False)

# If we have trained previously, set up the model checkpoint so it won't save
# until it finds something better. Otherwise, it would always save the results
# of the first epoch.
if 'val_loss' in model_state.state['best_values']:
    model_checkpoint.best = model_state.state['best_values']['val_loss']

callback_list = [model_checkpoint,
                 model_state]

# Offset epoch counts if we are resuming training. If you don't do
# this, only epochs - initial_epoch epochs will be done.
initial_epoch = model_state.state['epoch_count']
epochs += initial_epoch

# .fit() or .fit_generator(), etc. goes here, passing
# initial_epoch=initial_epoch and callbacks=callback_list.
Answer 1 (score: 2)
I don't think you have to store the metric values yourself. There is a feature-request about something very similar in the keras project, but it has been closed. Maybe you can try the solution proposed there. In keras' philosophy, storing metrics is not very useful, because you only save the model, which means: the architecture and the weights of each layer; not the history or anything else.
The simplest approach is to create a kind of metafile that contains the metric values of the model together with the model's own file name. You can then load the metafile, get the best metric values and the names of the models that produced them, load the model again, and resume training.
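A minimal sketch of such a metafile, with purely illustrative file names and keys (none of them come from Keras itself):

```python
import json
import os

META_FILE = 'training_meta.json'  # hypothetical metafile name

def record_result(model_file, val_loss):
    """Record one training result, keeping whichever entry is best so far."""
    meta = {'best_val_loss': float('inf'), 'best_model': None}
    if os.path.isfile(META_FILE):
        with open(META_FILE) as f:
            meta = json.load(f)
    if val_loss < meta['best_val_loss']:
        meta = {'best_val_loss': val_loss, 'best_model': model_file}
    with open(META_FILE, 'w') as f:
        json.dump(meta, f)
    return meta

record_result('model_a.h5', 0.8)
best = record_result('model_b.h5', 1.1)
print(best['best_model'])  # model_a.h5 is still the best
```

Before resuming, you would read the metafile, load the model named in 'best_model' with load_model(), and seed the checkpoint's best value from 'best_val_loss'.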