My Caffe model has two loss layers. When I set loss_weight to zero, the whole model stops training. What is going on?

Asked: 2017-07-29 00:57:38

Tags: caffe

The Caffe documentation on loss layers says that I can set the loss_weight parameter to 0 on any layer, which negates that layer's contribution to the loss function. However, when I include two different loss layers and set the loss_weight of one of them to zero, the network stops training entirely, even though the other loss layer is still available.

I'll illustrate with LeNet on the MNIST dataset. I made two changes to the LeNet .prototxt:

1) Added a loss_weight parameter to the SoftmaxWithLoss layer, setting the loss weight to 0.5.

2) Duplicated the modified SoftmaxWithLoss layer, creating two identical loss layers, both with a loss weight of 0.5.
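For reference, the two changes amount to a loss section roughly like this (a sketch; "ip2" is the standard LeNet output blob name, and "ip_s2" is the blob feeding my second loss):

```
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
  loss_weight: 0.5
}
layer {
  name: "loss2"
  type: "SoftmaxWithLoss"
  bottom: "ip_s2"
  bottom: "label"
  top: "loss2"
  loss_weight: 0.5
}
```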

Edit: Here is the Caffe output for the two layers with loss_weights of 0.5:

I0729 01:06:58.008580  1336 solver.cpp:397]     Test net output #0: accuracy = 0.0024
I0729 01:06:58.008597  1336 solver.cpp:397]     Test net output #1: loss = 4.68486 (* 0.5 = 2.34243 loss)
I0729 01:06:58.008600  1336 solver.cpp:397]     Test net output #2: loss2 = 4.68486 (* 0.5 = 2.34243 loss)
I0729 01:06:58.011636  1336 solver.cpp:218] Iteration 0 (0 iter/s, 0.102775s/100 iters), loss = 4.72888
I0729 01:06:58.011657  1336 solver.cpp:237]     Train net output #0: loss = 4.72888 (* 0.5 = 2.36444 loss)
I0729 01:06:58.011662  1336 solver.cpp:237]     Train net output #1: loss2 = 4.72888 (* 0.5 = 2.36444 loss)
I0729 01:06:58.011669  1336 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I0729 01:06:58.035707  1336 solver.cpp:330] Iteration 10, Testing net (#0)
I0729 01:06:58.035722  1336 net.cpp:676] Ignoring source layer data
I0729 01:06:58.035724  1336 net.cpp:676] Ignoring source layer label_data_1_split
I0729 01:06:58.128404  1343 data_layer.cpp:73] Restarting data prefetching from start.
I0729 01:06:58.131825  1336 solver.cpp:397]     Test net output #0: accuracy = 0.4744
I0729 01:06:58.131842  1336 solver.cpp:397]     Test net output #1: loss = 1.70187 (* 0.5 = 0.850937 loss)
I0729 01:06:58.131846  1336 solver.cpp:397]     Test net output #2: loss2 = 1.70187 (* 0.5 = 0.850937 loss)
I0729 01:06:58.155300  1336 solver.cpp:330] Iteration 20, Testing net (#0)
I0729 01:06:58.155344  1336 net.cpp:676] Ignoring source layer data
I0729 01:06:58.155352  1336 net.cpp:676] Ignoring source layer label_data_1_split
I0729 01:06:58.249866  1343 data_layer.cpp:73] Restarting data prefetching from start.
I0729 01:06:58.253347  1336 solver.cpp:397]     Test net output #0: accuracy = 0.5036
I0729 01:06:58.253365  1336 solver.cpp:397]     Test net output #1: loss = 1.47916 (* 0.5 = 0.739582 loss)
I0729 01:06:58.253368  1336 solver.cpp:397]     Test net output #2: loss2 = 1.47916 (* 0.5 = 0.739582 loss)

Training proceeds completely normally, with performance identical to the unmodified .prototxt: accuracy rises quickly to 0.90 within the first hundred or so iterations, then plateaus around 0.98.

Then, when one of the loss_weights is set to 0 and the other to 1, the model stops training entirely, with accuracy stuck at 0 and the loss constant at 87.3365.
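Incidentally, the constant 87.3365 appears to be the value SoftmaxWithLoss produces when the predicted probability for the true class underflows to zero: the probability gets clamped near FLT_MIN before the log is taken, and -log(FLT_MIN) for 32-bit floats is exactly 87.3365. A quick check in plain Python, just to show where the number comes from:

```python
import math

# Smallest positive normal float32 (FLT_MIN); the softmax probability
# is clamped to roughly this value before taking the log.
FLT_MIN = 1.17549435e-38

# -log(FLT_MIN) reproduces the stuck loss value from the log above.
print(round(-math.log(FLT_MIN), 4))  # -> 87.3365
```

So a loss pinned at 87.3365 suggests the network's outputs have diverged (e.g. NaN/inf activations), not that the loss is merely large.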

Here is what the added layer looks like in the Caffe .prototxt; everything else is identical to the base lenet .prototxt:

layer {
  name: "loss2"
  type: "SoftmaxWithLoss"
  bottom: "ip_s2"
  bottom: "label"
  top: "loss2"
  loss_weight: 0
}

Here is the relevant Caffe output:

I0729 00:45:06.989990  1318 solver.cpp:218] Iteration 0 (-0.00385461 iter/s, 0.0972042s/100 iters), loss = 4.75995
I0729 00:45:06.990005  1318 solver.cpp:237]     Train net output #0: loss = 4.75995 (* 1 = 4.75995 loss)
I0729 00:45:06.990010  1318 solver.cpp:237]     Train net output #1: loss2 = 4.75995
I0729 00:45:06.990020  1318 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I0729 00:45:07.013304  1318 solver.cpp:330] Iteration 10, Testing net (#0)
I0729 00:45:07.013320  1318 net.cpp:676] Ignoring source layer data
I0729 00:45:07.013322  1318 net.cpp:676] Ignoring source layer label_data_1_split
I0729 00:45:07.071669  1318 blocking_queue.cpp:49] Waiting for data
I0729 00:45:07.107491  1325 data_layer.cpp:73] Restarting data prefetching from start.
I0729 00:45:07.112396  1318 solver.cpp:397]     Test net output #0: accuracy = 0
I0729 00:45:07.112416  1318 solver.cpp:397]     Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0729 00:45:07.112419  1318 solver.cpp:397]     Test net output #2: loss2 = 87.3365
I0729 00:45:07.134907  1318 solver.cpp:330] Iteration 20, Testing net (#0)
I0729 00:45:07.134958  1318 net.cpp:676] Ignoring source layer data
I0729 00:45:07.134965  1318 net.cpp:676] Ignoring source layer label_data_1_split
I0729 00:45:07.223928  1325 data_layer.cpp:73] Restarting data prefetching from start.
I0729 00:45:07.226883  1318 solver.cpp:397]     Test net output #0: accuracy = 0
I0729 00:45:07.226912  1318 solver.cpp:397]     Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0729 00:45:07.226920  1318 solver.cpp:397]     Test net output #2: loss2 = 87.3365

Compare with the Caffe output from the unedited lenet prototxt:

I0729 00:51:35.134851  1327 solver.cpp:397]     Test net output #0: accuracy = 0.0287
I0729 00:51:35.134869  1327 solver.cpp:397]     Test net output #1: loss = 4.71456 (* 1 = 4.71456 loss)
I0729 00:51:35.137794  1327 solver.cpp:218] Iteration 0 (1.68966e+31 iter/s, 0.0944882s/100 iters), loss = 4.70328
I0729 00:51:35.137809  1327 solver.cpp:237]     Train net output #0: loss = 4.70328 (* 1 = 4.70328 loss)
I0729 00:51:35.137821  1327 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I0729 00:51:35.159623  1327 solver.cpp:330] Iteration 10, Testing net (#0)
I0729 00:51:35.159639  1327 net.cpp:676] Ignoring source layer data
I0729 00:51:35.249256  1334 data_layer.cpp:73] Restarting data prefetching from start.
I0729 00:51:35.252557  1327 solver.cpp:397]     Test net output #0: accuracy = 0.3437
I0729 00:51:35.252575  1327 solver.cpp:397]     Test net output #1: loss = 1.76758 (* 1 = 1.76758 loss)
I0729 00:51:35.275521  1327 solver.cpp:330] Iteration 20, Testing net (#0)
I0729 00:51:35.275537  1327 net.cpp:676] Ignoring source layer data
I0729 00:51:35.361196  1334 data_layer.cpp:73] Restarting data prefetching from start.
I0729 00:51:35.364228  1327 solver.cpp:397]     Test net output #0: accuracy = 0.7281
I0729 00:51:35.364244  1327 solver.cpp:397]     Test net output #1: loss = 0.926002 (* 1 = 0.926002 loss)
I0729 00:51:35.386831  1327 solver.cpp:330] Iteration 30, Testing net (#0)
I0729 00:51:35.386847  1327 net.cpp:676] Ignoring source layer data
I0729 00:51:35.468611  1334 data_layer.cpp:73] Restarting data prefetching from start.
I0729 00:51:35.471771  1327 solver.cpp:397]     Test net output #0: accuracy = 0.7942
I0729 00:51:35.471804  1327 solver.cpp:397]     Test net output #1: loss = 0.654145 (* 1 = 0.654145 loss)

Is anyone able to reproduce the error, or explain what is happening and how to fix it?

Thanks!

0 Answers:

There are no answers yet.