Question

I am writing a solver.prototxt that follows the rules of the paper https://arxiv.org/pdf/1604.02677.pdf

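In the training phase, the learning rate is initially set to 0.001 and decreased by a factor of 10 when the loss stopped decreasing till 10^-7. The discount weight is set as 1 initially and decreased by a factor of 10 every ten thousand iterations until a marginal value 10^-3.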

Note that the discount weight is loss_weight in Caffe. Based on the information above, I wrote my solver as:

train_net: "train.prototxt"
lr_policy: "step"
gamma: 0.1
stepsize: 10000
base_lr: 0.001 #0.002

In train.prototxt, I also set

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "deconv"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}

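However, I still do not know how to set up the solver to satisfy the rules "decreased by a factor of 10 when the loss stopped decreasing till 10^-7" and "decreased by a factor of 10 every ten thousand iterations until a marginal value 10^-3". I did not find any Caffe learning rate policy that matches them; these are the ones listed for reference: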

// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmod decay
//      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.

If anyone knows, please give me some guidance on writing a solver.prototxt that satisfies the conditions above.

Answer 1

Learning rate reduction

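Part of the problem is that the phrase "decreased by a factor of 10 when the loss stopped decreasing till 10^-7" doesn't quite make sense. I think the authors are perhaps trying to say that they reduced the learning rate by a factor of 10 each time the loss stopped decreasing, until the learning rate reached 10^-7.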

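If so, then this is a manual process, not something you can select with a Caffe parameter. Above all, "when the loss stops decreasing" is a non-trivial judgment, although a long-baseline moving average of the loss would give you a good indication. I expect the authors did this by hand, stopping training and restarting it from a checkpoint.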
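The "long-baseline moving average" idea can be sketched outside Caffe, for example on loss values parsed from the training log. This is a minimal sketch and not the authors' procedure: the function name `has_plateaued`, the `window` length, and the `min_improvement` threshold are hypothetical choices made for illustration.

```python
from statistics import fmean


def has_plateaued(losses, window=1000, min_improvement=1e-4):
    """Return True when the mean loss over the most recent `window` values
    improved by less than `min_improvement` compared with the mean over the
    `window` values before that.

    `window` and `min_improvement` are illustrative defaults (assumptions),
    not values taken from the paper or from Caffe.
    """
    if len(losses) < 2 * window:
        # Not enough history to compare two full windows.
        return False
    recent = fmean(losses[-window:])
    previous = fmean(losses[-2 * window:-window])
    return (previous - recent) < min_improvement
```

When this returns True, you would stop training at a snapshot, lower base_lr by a factor of 10 in solver.prototxt, and resume from that snapshot.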

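You could get a similar effect with the "step" learning rate decay policy: set gamma to 0.1 and set stepsize high enough to be certain that training has plateaued before each reduction in the rate. This will waste some computer time, but will probably save you hassle overall.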
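To see what the "step" policy would produce, the formula quoted in the question, base_lr * gamma ^ (floor(iter / step)), can be evaluated directly. This is only a sketch under stated assumptions: the `min_lr` floor is added here to mirror the paper's 10^-7 limit (the policy formula quoted in the question has no such floor), and the `stepsize` default is a placeholder that you would need to choose large enough for training to plateau.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=10000, min_lr=1e-7):
    """Learning rate under a "step" schedule:
    base_lr * gamma ** floor(iteration / stepsize).

    `min_lr` is an assumption added for illustration; the quoted Caffe
    policy keeps shrinking the rate and applies no lower bound.
    """
    lr = base_lr * gamma ** (iteration // stepsize)
    return max(lr, min_lr)


# Print the rate at a few iteration counts to inspect the schedule.
for it in (0, 10000, 20000, 30000, 40000, 50000):
    print(it, step_lr(it))
```

With base_lr of 0.001 and gamma of 0.1, four reductions bring the rate down to roughly 10^-7, so max_iter in the solver could be chosen to end training around that point.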

Discount weight

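In Caffe, the loss weight is merely the relative weighting among the various losses in the model: the linear factors used to combine them into the final loss statistic. Caffe does not provide a way to alter these weights at run time. Perhaps this is something else the authors adjusted by hand.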
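If the weight is adjusted by hand, it helps to know which value loss_weight should have at each stage. The sketch below computes the schedule described in the paper's sentence (start at 1, divide by 10 every ten thousand iterations, stop at 10^-3). The function name `discount_weight` and its parameter names are hypothetical, and this only computes the numbers; it does not change anything inside Caffe.

```python
def discount_weight(iteration, initial=1.0, factor=0.1, every=10000, floor=1e-3):
    """Discount weight at a given iteration: start at `initial`, multiply by
    `factor` once per `every` iterations, and never go below `floor`.

    All parameter names are illustrative; the default values follow the
    sentence quoted from the paper.
    """
    return max(initial * factor ** (iteration // every), floor)


# Print the value to write into loss_weight at each stage boundary.
for it in (0, 10000, 20000, 30000, 40000):
    print(it, discount_weight(it))
```

One way to apply it is to stop at a snapshot on each boundary, edit loss_weight in train.prototxt, and resume with `caffe train --solver=solver.prototxt --snapshot=<file>.solverstate`.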

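I tried reading through the two areas of the paper that mention "discount weight", but found them hard to follow. I'll wait until someone proofreads and edits that paper for grammar and clarity. In the meantime, I hope this answer is of help to you.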

