I am trying to train fcn32. I am training the voc-fcn32s model on my own data, which has an imbalanced number of classes. Here is the learning curve for 18,000 iterations:
As you can see, the training loss decreases for a while and then starts fluctuating. I read some online suggestions recommending either lowering the learning rate or changing the bias values in the fillers of the convolution layers. So what I did was change train_val.prototxt for these two layers as follows:
....
layer {
  name: "score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 5  # the number of classes
    pad: 0
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.5  #+
    }
  }
}
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 5  # the number of classes
    bias_term: true  #false
    kernel_size: 64
    stride: 32
    group: 5  #2
    weight_filler: {
      type: "bilinear"
      value: 0.5  #+
    }
  }
}
....
It does not seem that much has changed in the model's behavior.
1) Did I add these values to the weight_filler correctly?
2) Should I change the learning policy (lr_policy) in the solver from fixed to step, reducing the learning rate by a factor of 10 each time? Would that help with this problem?
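What I have in mind is something along these lines in solver.prototxt (the stepsize value below is just a placeholder):

base_lr: 1e-11        # the learning rate I am currently using
lr_policy: "step"     # changed from "fixed"
gamma: 0.1            # multiply the learning rate by 0.1 ...
stepsize: 5000        # ... every 5000 iterations (placeholder value)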
I am worried that I am doing something wrong and that my model will not converge. Does anyone have any suggestions? What important things should I consider while training a model? What kinds of changes could I make to the solver and the train_val model?
I would really appreciate your help.
More details after adding BatchNorm layers
Thank you @Shai and @Jonathan for suggesting to add batchNorm layers. I added Batch Normalization layers before the reLU layers; here is one sample layer:
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "bn1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "bn1_1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "bn1_1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "scale1_1"
  type: "Scale"
  bottom: "bn1_1"
  top: "bn1_1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "bn1_1"
  top: "bn1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "bn1_1"
  top: "conv1_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
As far as I understood, I can add only one param to the batch normalization layer instead of three, since I have single-channel images. Is that understanding correct? That is:
param {
  lr_mult: 0
}
Should I add more parameters to the Scale layer, as mentioned in the documentation? What is the meaning of those parameters in the Scale layer? Something like:
layer {
  bottom: 'layerx-bn'
  top: 'layerx-bn'
  name: 'layerx-bn-scale'
  type: 'Scale'
  scale_param {
    bias_term: true
    axis: 1       # scale separately for each channel
    num_axes: 1   # ... but not spatially (default)
    filler { type: 'constant' value: 1 }           # initialize scaling to 1
    bias_filler { type: 'constant' value: 0.001 }  # initialize bias
  }
}
Here is the net. I am not sure how much of it I got right or wrong. Did I add these layers correctly?
Another question is about debug_info. What is the meaning of these log-file lines after activating debug_info? What do diff and data mean? Why are the values 0? Is my network working correctly?
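(For context, debug_info is switched on in solver.prototxt:)

debug_info: true   # print data/diff statistics for every blob during the forward/backward passes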
I0123 23:17:49.498327 15230 solver.cpp:228] Iteration 50, loss = 105465
I0123 23:17:49.498337 15230 solver.cpp:244] Train net output #0: accuracy = 0.643982
I0123 23:17:49.498349 15230 solver.cpp:244] Train net output #1: loss = 105446 (* 1 = 105446 loss)
I0123 23:17:49.498359 15230 sgd_solver.cpp:106] Iteration 50, lr = 1e-11
I0123 23:19:12.680325 15230 net.cpp:608] [Forward] Layer data, top blob data data: 34.8386
I0123 23:19:12.680615 15230 net.cpp:608] [Forward] Layer data_data_0_split, top blob data_data_0_split_0 data: 34.8386
I0123 23:19:12.680670 15230 net.cpp:608] [Forward] Layer data_data_0_split, top blob data_data_0_split_1 data: 34.8386
I0123 23:19:12.680778 15230 net.cpp:608] [Forward] Layer label, top blob label data: 0
I0123 23:19:12.680829 15230 net.cpp:608] [Forward] Layer label_label_0_split, top blob label_label_0_split_0 data: 0
I0123 23:19:12.680896 15230 net.cpp:608] [Forward] Layer label_label_0_split, top blob label_label_0_split_1 data: 0
I0123 23:19:12.688591 15230 net.cpp:608] [Forward] Layer conv1_1, top blob conv1_1 data: 0
I0123 23:19:12.688695 15230 net.cpp:620] [Forward] Layer conv1_1, param blob 0 data: 0
I0123 23:19:12.688742 15230 net.cpp:620] [Forward] Layer conv1_1, param blob 1 data: 0
I0123 23:19:12.721791 15230 net.cpp:608] [Forward] Layer bn1_1, top blob bn1_1 data: 0
I0123 23:19:12.721853 15230 net.cpp:620] [Forward] Layer bn1_1, param blob 0 data: 0
I0123 23:19:12.721890 15230 net.cpp:620] [Forward] Layer bn1_1, param blob 1 data: 0
I0123 23:19:12.721901 15230 net.cpp:620] [Forward] Layer bn1_1, param blob 2 data: 96.1127
I0123 23:19:12.996196 15230 net.cpp:620] [Forward] Layer scale4_1, param blob 0 data: 1
I0123 23:19:12.996237 15230 net.cpp:620] [Forward] Layer scale4_1, param blob 1 data: 0
I0123 23:19:12.996939 15230 net.cpp:608] [Forward] Layer relu4_1, top blob bn4_1 data: 0
I0123 23:19:13.012020 15230 net.cpp:608] [Forward] Layer conv4_2, top blob conv4_2 data: 0
I0123 23:19:13.012403 15230 net.cpp:620] [Forward] Layer conv4_2, param blob 0 data: 0
I0123 23:19:13.012446 15230 net.cpp:620] [Forward] Layer conv4_2, param blob 1 data: 0
I0123 23:19:13.015959 15230 net.cpp:608] [Forward] Layer bn4_2, top blob bn4_2 data: 0
I0123 23:19:13.016005 15230 net.cpp:620] [Forward] Layer bn4_2, param blob 0 data: 0
I0123 23:19:13.016046 15230 net.cpp:620] [Forward] Layer bn4_2, param blob 1 data: 0
I0123 23:19:13.016054 15230 net.cpp:620] [Forward] Layer bn4_2, param blob 2 data: 96.1127
I0123 23:19:13.017211 15230 net.cpp:608] [Forward] Layer scale4_2, top blob bn4_2 data: 0
I0123 23:19:13.017251 15230 net.cpp:620] [Forward] Layer scale4_2, param blob 0 data: 1
I0123 23:19:13.017292 15230 net.cpp:620] [Forward] Layer scale4_2, param blob 1 data: 0
I0123 23:19:13.017980 15230 net.cpp:608] [Forward] Layer relu4_2, top blob bn4_2 data: 0
I0123 23:19:13.032080 15230 net.cpp:608] [Forward] Layer conv4_3, top blob conv4_3 data: 0
I0123 23:19:13.032452 15230 net.cpp:620] [Forward] Layer conv4_3, param blob 0 data: 0
I0123 23:19:13.032493 15230 net.cpp:620] [Forward] Layer conv4_3, param blob 1 data: 0
I0123 23:19:13.036018 15230 net.cpp:608] [Forward] Layer bn4_3, top blob bn4_3 data: 0
I0123 23:19:13.036064 15230 net.cpp:620] [Forward] Layer bn4_3, param blob 0 data: 0
I0123 23:19:13.036105 15230 net.cpp:620] [Forward] Layer bn4_3, param blob 1 data: 0
I0123 23:19:13.036114 15230 net.cpp:620] [Forward] Layer bn4_3, param blob 2 data: 96.1127
I0123 23:19:13.038148 15230 net.cpp:608] [Forward] Layer scale4_3, top blob bn4_3 data: 0
I0123 23:19:13.038189 15230 net.cpp:620] [Forward] Layer scale4_3, param blob 0 data: 1
I0123 23:19:13.038230 15230 net.cpp:620] [Forward] Layer scale4_3, param blob 1 data: 0
I0123 23:19:13.038969 15230 net.cpp:608] [Forward] Layer relu4_3, top blob bn4_3 data: 0
I0123 23:19:13.039417 15230 net.cpp:608] [Forward] Layer pool4, top blob pool4 data: 0
I0123 23:19:13.043354 15230 net.cpp:608] [Forward] Layer conv5_1, top blob conv5_1 data: 0
I0123 23:19:13.128515 15230 net.cpp:608] [Forward] Layer score_fr, top blob score_fr data: 0.000975524
I0123 23:19:13.128569 15230 net.cpp:620] [Forward] Layer score_fr, param blob 0 data: 0.0135222
I0123 23:19:13.128607 15230 net.cpp:620] [Forward] Layer score_fr, param blob 1 data: 0.000975524
I0123 23:19:13.129696 15230 net.cpp:608] [Forward] Layer upscore, top blob upscore data: 0.000790174
I0123 23:19:13.129734 15230 net.cpp:620] [Forward] Layer upscore, param blob 0 data: 0.25
I0123 23:19:13.130656 15230 net.cpp:608] [Forward] Layer score, top blob score data: 0.000955503
I0123 23:19:13.130709 15230 net.cpp:608] [Forward] Layer score_score_0_split, top blob score_score_0_split_0 data: 0.000955503
I0123 23:19:13.130754 15230 net.cpp:608] [Forward] Layer score_score_0_split, top blob score_score_0_split_1 data: 0.000955503
I0123 23:19:13.146767 15230 net.cpp:608] [Forward] Layer accuracy, top blob accuracy data: 1
I0123 23:19:13.148967 15230 net.cpp:608] [Forward] Layer loss, top blob loss data: 105320
I0123 23:19:13.149173 15230 net.cpp:636] [Backward] Layer loss, bottom blob score_score_0_split_1 diff: 0.319809
I0123 23:19:13.149323 15230 net.cpp:636] [Backward] Layer score_score_0_split, bottom blob score diff: 0.319809
I0123 23:19:13.150310 15230 net.cpp:636] [Backward] Layer score, bottom blob upscore diff: 0.204677
I0123 23:19:13.152452 15230 net.cpp:636] [Backward] Layer upscore, bottom blob score_fr diff: 253.442
I0123 23:19:13.153218 15230 net.cpp:636] [Backward] Layer score_fr, bottom blob bn7 diff: 9.20469
I0123 23:19:13.153254 15230 net.cpp:647] [Backward] Layer score_fr, param blob 0 diff: 0
I0123 23:19:13.153291 15230 net.cpp:647] [Backward] Layer score_fr, param blob 1 diff: 20528.8
I0123 23:19:13.153420 15230 net.cpp:636] [Backward] Layer drop7, bottom blob bn7 diff: 9.21666
I0123 23:19:13.153554 15230 net.cpp:636] [Backward] Layer relu7, bottom blob bn7 diff: 0
I0123 23:19:13.153856 15230 net.cpp:636] [Backward] Layer scale7, bottom blob bn7 diff: 0
E0123 23:19:14.382714 15230 net.cpp:736] [Backward] All net params (data, diff): L1 norm = (19254.6, 102644); L2 norm = (391.485, 57379.6)
If anyone knows, I would really appreciate it if you could share ideas/links/resources here. Thanks again.
Answer 0 (score: 2)
I would not expect changing the bias values to help training. The first thing I would try is lowering the learning rate. You can do this manually by retraining from the weights that have plateaued, using a solver with a lower base_lr. Alternatively, you can change solver.prototxt to use a different update policy: either set the learning policy to step, or use an update method such as Adam (see http://caffe.berkeleyvision.org/tutorial/solver.html).
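For instance, a minimal sketch of an Adam solver.prototxt might look like the following (all numeric values are placeholders to tune for your data):

net: "train_val.prototxt"
type: "Adam"
base_lr: 0.001        # placeholder; tune for your data
momentum: 0.9
momentum2: 0.999
delta: 1e-8
lr_policy: "fixed"    # Adam adapts the effective per-parameter step sizes itself
display: 50
max_iter: 100000
snapshot: 5000
snapshot_prefix: "fcn32s_adam"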
图层会有所帮助。批量标准化类似于"白化" /标准化输入数据,但应用于中间层。关于批量标准化的论文在@Shai recommends上。
You should also keep out some of your data for validation. Looking only at the training loss can be misleading.
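In Caffe this usually means pointing the TEST phase at a held-out split and letting the solver evaluate it periodically, for example (placeholder values):

test_iter: 100        # number of validation batches per evaluation
test_interval: 1000   # run validation every 1000 training iterations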
Answer 1 (score: 0)
Regarding the "BatchNorm" parameters:
The layer has three internal parameters: (0) the mean, (1) the variance, and (2) the moving-average factor, regardless of the number of channels or the shape of the blob. Therefore, if you wish to set lr_mult explicitly, you need to define it for all three.
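Concretely, a sketch based on the bn1_1 layer above would look like this (lr_mult: 0 for all three blobs is the usual choice, since the BatchNorm layer updates these statistics itself rather than through the solver):

layer {
  name: "bn1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "bn1_1"
  batch_norm_param {
    use_global_stats: false
  }
  # one param block per internal blob
  param { lr_mult: 0 }  # (0) mean
  param { lr_mult: 0 }  # (1) variance
  param { lr_mult: 0 }  # (2) moving average factor
  include {
    phase: TRAIN
  }
}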
Regarding the zeros in the log:
Please read this post to understand how to read caffe's debug log.
It looks like you are training the model from scratch (not fine-tuning), and all the weights are initialized to zero. This is a very poor initialization strategy.
Please consider defining a filler and bias_filler to initialize the weights.
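For example, a sketch of the conv1_1 layer from the question with explicit fillers could look like this (xavier for the weights is one reasonable choice, not the only one):

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    stride: 1
    weight_filler { type: "xavier" }           # random initialization instead of all-zero weights
    bias_filler { type: "constant" value: 0 }  # biases can start at zero
  }
}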