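I noticed some strange behaviour when using trainer.set_synchronization_file(). As far as I understand (from the implementation in trainer_abstract.h and trainer.h), it saves the current state of the trainer (including the net), so that we can stop training at some point and restart from exactly the same step.

However, when using the example dnn_metric_learning_on_images_ex.cpp (from master b4a54490783), if I stop training (e.g. ctrl + c) and then restart, the loss drops significantly, as if some momentum had been reset and SGD were finding a better path to improve.

Does anyone have an idea? Here is a code sample. Only the stopping condition was changed after the sync.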
dnn_trainer<net_type> trainer(net, sgd(0.0001, 0.9), {1,0});
trainer.set_learning_rate(0.1);
trainer.be_verbose();
trainer.set_synchronization_file("face_metric_sync", std::chrono::minutes(5));
trainer.set_iterations_without_progress_threshold(10000);
// data loaders (...)
while(trainer.get_learning_rate() >= 1e-5)
{
    qimages.dequeue(images);
    qlabels.dequeue(labels);
    trainer.train_one_step(images, labels);
}
Here is an example of the state before stopping:
Saved state to face_metric_sync
step#: 198726 learning rate: 0.001 average loss: 0.00928623 steps without apparent progress: 5923
step#: 198844 learning rate: 0.001 average loss: 0.00950317 steps without apparent progress: 6183
step#: 198963 learning rate: 0.001 average loss: 0.00971744 steps without apparent progress: 6525
step#: 199082 learning rate: 0.001 average loss: 0.00917967 steps without apparent progress: 6681
step#: 199200 learning rate: 0.001 average loss: 0.00942927 steps without apparent progress: 6834
step#: 199319 learning rate: 0.001 average loss: 0.00938926 steps without apparent progress: 6941
step#: 199438 learning rate: 0.001 average loss: 0.00917057 steps without apparent progress: 6915
Saved state to face_metric_sync
step#: 199552 learning rate: 0.001 average loss: 0.00964872 steps without apparent progress: 7487
^C
After restarting:
objs.size(): 75656
step#: 199507 learning rate: 0.001 average loss: 0.00974885 steps without apparent progress: 7146
step#: 199654 learning rate: 0.001 average loss: 0.00720691 steps without apparent progress: 64
step#: 199812 learning rate: 0.001 average loss: 0.00687095 steps without apparent progress: 209
step#: 199970 learning rate: 0.001 average loss: 0.00705782 steps without apparent progress: 439
step#: 200128 learning rate: 0.001 average loss: 0.00690515 steps without apparent progress: 584