Question

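Because my dataset is large, I am applying Keras model fitting iteratively (in a for loop). My goal is to split the dataset into 100 parts, read one part at a time, and call fit() on it.

My question: on each iteration, does fit() start from the initial learning rate (lr = 0.1) that I set when compiling the model? Or does it remember the last updated learning rate and apply it directly to the new fit() call?

My code sample is as follows: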

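from keras.optimizers import SGD
from keras.utils.io_utils import HDF5Matrix

# Define model (my_model() is the asker's own model-building function)
model = my_model()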

# Set the optimizer
sgd = SGD(lr=0.1, decay=1e-08, momentum=0.9, nesterov=False)

# Compile model
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# Fit model and train
for j in range(100):
    # st and ed are the start/end row indices of part j (defined elsewhere)
    print('Data extracting from big matrix ...')
    X_train = HDF5Matrix(path_train, 'X', start=st, end=ed)
    Y_train = HDF5Matrix(path_train, 'y', start=st, end=ed)

    print('Fitting model ...')
    model.fit(X_train, Y_train, batch_size=100, shuffle='batch', nb_epoch=1,
              validation_data=(X_test, Y_test))

Answer 1

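The updated learning rate is recorded in the optimizer object model.optimizer, which is simply the sgd variable in your example.

In callbacks such as LearningRateScheduler, the learning rate variable model.optimizer.lr is updated (some lines are removed for clarity):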

def on_epoch_begin(self, epoch, logs=None):
    lr = self.schedule(epoch)
    K.set_value(self.model.optimizer.lr, lr)
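To make the callback behaviour concrete, here is a minimal pure-Python sketch. It does not use Keras; ToyOptimizer, ToyScheduler, toy_fit and halve_every_epoch are hypothetical names invented for illustration. It only mimics the idea that the callback writes the scheduled value into the optimizer object, so the value is still there after the call returns.

```python
class ToyOptimizer:
    """Stand-in for model.optimizer: it just holds the current learning rate."""
    def __init__(self, lr):
        self.lr = lr


class ToyScheduler:
    """Stand-in for a LearningRateScheduler-style callback."""
    def __init__(self, schedule):
        self.schedule = schedule

    def on_epoch_begin(self, optimizer, epoch):
        # Mirrors K.set_value(self.model.optimizer.lr, lr) in the excerpt above.
        optimizer.lr = self.schedule(epoch)


def toy_fit(optimizer, scheduler, epochs, initial_epoch=0):
    """Run the callback once per epoch index, as a fit() loop would."""
    for epoch in range(initial_epoch, epochs):
        scheduler.on_epoch_begin(optimizer, epoch)
    return optimizer.lr


def halve_every_epoch(epoch):
    """Illustrative schedule: start at 0.1 and halve each epoch."""
    return 0.1 * (0.5 ** epoch)


opt = ToyOptimizer(lr=0.1)
sched = ToyScheduler(halve_every_epoch)
toy_fit(opt, sched, epochs=3)                    # epochs 0, 1, 2
toy_fit(opt, sched, epochs=4, initial_epoch=3)   # continues at epoch 3
```

Note that a schedule keyed on the epoch index only continues where it left off if the epoch numbering continues too. In Keras versions whose fit() accepts initial_epoch, passing it on each call is one way to do that; otherwise each call counts epochs from 0 again.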

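However, when decay is used (as in your example), the learning rate variable is not updated directly; instead, the variable model.optimizer.iterations is updated. This variable records how many batches have been used in model fitting, and the decayed learning rate is computed in SGD.get_updates() as follows: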

lr = self.lr
if self.initial_decay > 0:
    lr *= (1. / (1. + self.decay * K.cast(self.iterations,
                                          K.dtype(self.decay))))

So in either case, as long as the model is not recompiled, a new fit() call will use the updated learning rate.
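The decay case can be checked with a small pure-Python sketch. It is a toy model of the formula above, not Keras code: DecayingOptimizer and toy_fit_batches are hypothetical names, and the decay value is chosen much larger than the 1e-08 in the question so the effect is visible. The point is that the batch counter lives on the optimizer object, so a second fit-like call continues the decay instead of restarting it.

```python
def decayed_lr(base_lr, decay, iterations):
    """Same formula as in SGD.get_updates(): lr / (1 + decay * iterations)."""
    return base_lr * (1.0 / (1.0 + decay * iterations))


class DecayingOptimizer:
    """Toy optimizer that keeps its own batch counter, like optimizer.iterations."""
    def __init__(self, lr, decay):
        self.lr = lr              # base learning rate, never overwritten
        self.decay = decay
        self.iterations = 0       # persists across toy_fit_batches() calls

    def step(self):
        # Compute the effective rate for this batch, then count the batch.
        lr = decayed_lr(self.lr, self.decay, self.iterations)
        self.iterations += 1
        return lr


def toy_fit_batches(optimizer, num_batches):
    """Process num_batches batches and return the last effective learning rate."""
    last_lr = None
    for _ in range(num_batches):
        last_lr = optimizer.step()
    return last_lr


opt = DecayingOptimizer(lr=0.1, decay=1e-3)
lr_after_first_call = toy_fit_batches(opt, 100)
lr_after_second_call = toy_fit_batches(opt, 100)
print(opt.iterations, lr_after_first_call, lr_after_second_call)
```

Creating a new DecayingOptimizer would reset the counter to 0, which corresponds to recompiling the model with a fresh optimizer.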
