I am trying to run the epochs of fit_one_cycle separately: save the model, load it again, and start a new epoch:
from fastai.text import *  # fastai v1; provides language_model_learner and AWD_LSTM

# `data` (the DataBunch) and `bs` (the batch size) are defined earlier
learn = language_model_learner(data, AWD_LSTM, drop_mult=0.5, pretrained=False).to_fp16()
learn.load('/content/gdrive/My Drive/Language Model/language_model')
learn.load_encoder('/content/gdrive/My Drive/Language Model/model_encoder')
lr = 1e-3
lr *= bs/48  # scale learning rate by batch size
learn.unfreeze()
learn.fit_one_cycle(1, lr, moms=(0.8, 0.7))
learn.save('/content/gdrive/My Drive/Language Model/language_model')
learn.save_encoder('/content/gdrive/My Drive/Language Model/model_encoder')
Question: how should I change the learning rate after each epoch?
Answer 0 (score: 0)
You can look into Discriminative Layer Training, which uses a different learning rate for different layers of the model.
# creates 3 layer groups with start, middle and end groups
learn.split(lambda m: (m[0][6], m[1]))
# only randomly initialized head now trainable
learn.freeze()
Note: there is no need to split the layers manually; fit_one_cycle splits them automatically.
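To check how many layer groups the learner ends up with (and therefore how many positions slice(...) spreads the learning rate across), you can inspect learn.layer_groups; a quick check against the fastai v1 API:

# Count the layer groups; slice(lo, hi) distributes learning rates
# across these groups from the earliest layers to the head.
print(len(learn.layer_groups))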
# all layers now trainable
learn.unfreeze()
# optionally, separate LR and WD for each group for 5 epochs
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3), wd=(1e-4, 1e-4, 1e-1))
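If the goal is specifically to lower the learning rate after each epoch across separate runs, one simple pattern is to decay lr in a loop around fit_one_cycle. A minimal sketch reusing the calls from the question; the decay factor 0.7 is an arbitrary illustration, not part of the original code:

lr = 1e-3 * bs / 48  # same batch-size scaling as in the question
for epoch in range(5):
    learn.fit_one_cycle(1, max_lr=lr, moms=(0.8, 0.7))
    learn.save('/content/gdrive/My Drive/Language Model/language_model')
    learn.save_encoder('/content/gdrive/My Drive/Language Model/model_encoder')
    lr *= 0.7  # shrink the learning rate before the next epoch (arbitrary factor)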