Can anyone help me find a way to resume training an LSTM model with fit_generator without the loss value being reset to inf?
Background: I am training an LSTM model on a very large time series (many sampled time steps) with only 2 features, so my time-series x data has shape N×2, where N is a very large number. I use a batch generator to randomly slice my data into mini-batches of length batch_N (where batch_N is much smaller than N):
def batch_generator(batch_size, sequence_length):
    ...
    for i in range(batch_size):
        ...
        x_batch[i] = batch_x_train_scaled[idx:idx+sequence_length]
        y_batch[i] = batch_y_train_scaled[idx:idx+sequence_length]
    yield (x_batch, y_batch)
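(For context, a runnable version of this generator would look roughly like the sketch below; the dummy arrays, their shapes, and the random window start idx are assumptions I am filling in for illustration, not my exact code.)

import numpy as np

# stand-ins for my scaled training arrays, assumed shapes (N, 2) and (N, 1)
x_train_scaled = np.random.rand(100000, 2)
y_train_scaled = np.random.rand(100000, 1)

def batch_generator(batch_size, sequence_length):
    while True:
        x_batch = np.zeros((batch_size, sequence_length, 2))
        y_batch = np.zeros((batch_size, sequence_length, 1))
        for i in range(batch_size):
            # pick a random window start so each mini-batch samples the long series
            idx = np.random.randint(len(x_train_scaled) - sequence_length)
            x_batch[i] = x_train_scaled[idx:idx + sequence_length]
            y_batch[i] = y_train_scaled[idx:idx + sequence_length]
        yield (x_batch, y_batch)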
I also use ModelCheckpoint to save the trained model:
callback_checkpoint = ModelCheckpoint(filepath=path_checkpoint,
                                      monitor='val_loss', verbose=1,
                                      save_weights_only=False, save_best_only=True)
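The callback is then passed to fit_generator; the call below is only a sketch, since I have omitted my exact arguments (the epoch and step counts and validation_data here are placeholders):

model.fit_generator(generator=batch_generator(batch_size, sequence_length),
                    epochs=20,
                    steps_per_epoch=100,
                    validation_data=(x_val, y_val),
                    callbacks=[callback_checkpoint])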
In addition, every time I want to resume training, I first load the last saved model:
if True:
    try:
        # model.load_weights(path_checkpoint)
        model = load_model(path_checkpoint)
    except Exception as error:
        print("Error trying to load checkpoint.")
        print(error)
Where is the problem? Every time I resume training, fit_generator is fed freshly generated batches and the model starts from the last saved weights, but the checkpoint's best loss value is reset to inf. So at the end of the first epoch of a resumed run, the model reports that val_loss improved from inf to some number, no matter how good or bad the result actually is, and overwrites the checkpoint with the new weights. The problem is that the new weights are sometimes worse than the previous best (since the model is now training on a different batch of data), so I lose some of my best weights.
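As far as I can tell from the Keras source, the reset happens because ModelCheckpoint keeps its running best value in an attribute (self.best) that is initialized to inf for a 'min' metric when the callback object is constructed, so a fresh run has no memory of the previous best. If that internal attribute keeps its name, overwriting it before resuming might be a workaround (untested sketch; 0.123 is a placeholder):

# seed the callback with the best val_loss from the previous run
callback_checkpoint.best = 0.123  # placeholder value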
What have I tried so far to solve this?
Approach one (unsuccessful): define a custom loss function:
def my_loss(y_true, y_pred):
    train_loss = binary_crossentropy(y_true, y_pred)
    validation_loss = 2*binary_crossentropy(y_true, y_pred)
    temp = tf.keras.backend.cast(validation_loss, 'float16')
    if temp > 1:  # update 1 to last best val_loss before resume training
        validation_loss = validation_loss + np.inf
        # validation_loss = np.inf
    return tf.keras.backend.in_train_phase(train_loss, validation_loss)

model.compile(loss=my_loss, optimizer=optimizer)
Result of approach one:
Error:
---> 12 if temp>1:
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
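(For what it's worth, the immediate TypeError could be avoided by expressing the condition as a graph op instead of a Python if; below is an untested sketch using tf.where, where the threshold 1.0 is still a placeholder for the last best val_loss. Whether pushing the loss to inf then interacts sensibly with the checkpoint comparison is a separate question.)

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import binary_crossentropy

def my_loss(y_true, y_pred):
    train_loss = binary_crossentropy(y_true, y_pred)
    validation_loss = 2 * binary_crossentropy(y_true, y_pred)
    # graph-side condition: no Python bool test on a symbolic tensor
    validation_loss = tf.where(validation_loss > 1.0,
                               np.inf * tf.ones_like(validation_loss),
                               validation_loss)
    return K.in_train_phase(train_loss, validation_loss)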
Approach two (unsuccessful): define a custom callback to save the model:
best_val_loss = 1  # update 1 to last best val_loss before resume training

def saveModel(epoch, logs):
    val_loss = logs['val_loss']
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        model.save('my_model.hdf5')

my_callback = LambdaCallback(on_epoch_end=saveModel)
Result of approach two:
UnboundLocalError: local variable 'best_val_loss' referenced before assignment
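The UnboundLocalError comes from assigning to best_val_loss inside the function, which makes Python treat the name as local; declaring it global should fix at least the error (sketch):

best_val_loss = 1  # update 1 to last best val_loss before resume training

def saveModel(epoch, logs):
    global best_val_loss  # without this, the assignment below makes the name local
    val_loss = logs['val_loss']
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        model.save('my_model.hdf5')

my_callback = LambdaCallback(on_epoch_end=saveModel)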
Approach three (failed): define a custom callback to save the model:
best_val_loss = 1  # update 1 to last best val_loss before resume training

def saveModel(epoch, logs, best_val_loss):
    val_loss = logs['val_loss']
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        model.save('my_model.hdf5')

my_callback = LambdaCallback(on_epoch_end=saveModel)
Result of approach three:
TypeError: saveModel() missing 1 required positional argument: 'best_val_loss'
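This fails because LambdaCallback calls on_epoch_end with exactly two arguments, (epoch, logs), so the extra best_val_loss parameter is never supplied. A small Callback subclass would avoid both this and the scoping problem from approach two; the class name and the 0.123 seed below are made up for illustration (untested sketch):

import numpy as np
from tensorflow.keras.callbacks import Callback

class SaveBestResumable(Callback):
    def __init__(self, filepath, best_val_loss=np.inf):
        super().__init__()
        self.filepath = filepath
        self.best_val_loss = best_val_loss  # can be seeded from the previous run

    def on_epoch_end(self, epoch, logs=None):
        val_loss = logs['val_loss']
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            self.model.save(self.filepath)  # self.model is set by Keras

my_callback = SaveBestResumable('my_model.hdf5', best_val_loss=0.123)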