Caffe loss does not decrease

Asked: 2017-08-31 21:14:30

Tags: machine-learning computer-vision deep-learning caffe

I am new to Caffe. I have essentially taken the FCN model and made a few modifications so that I can train it on my own data. I noticed that after 680 iterations the loss has not changed at all. I thought it might be because I was applying a 1/255 scale to the pixels, but I removed it and nothing changed.

My data is in LMDBs (one LMDB for the training images, one for the training labels, one for the validation images, and one for the validation labels), with the labels 0 and 1 stored as uint8.
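For reference, a label LMDB like that is typically written with pycaffe along the following lines. This is only a hypothetical sketch, not my actual script: the path, key format, and mask shape are placeholders.

import lmdb
import numpy as np
from caffe.proto import caffe_pb2

# One binary (0/1) mask per training image, stored as raw uint8 bytes.
masks = [np.zeros((500, 500), dtype=np.uint8)]  # placeholder data

env = lmdb.open('train_lab_lmdb', map_size=1 << 30)
with env.begin(write=True) as txn:
    for i, mask in enumerate(masks):
        datum = caffe_pb2.Datum()
        datum.channels = 1
        datum.height, datum.width = mask.shape
        datum.data = mask.tobytes()  # uint8 values 0 or 1, no scaling
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
env.close()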

Does anyone have any suggestions?

Here is the training log:

I0830 23:05:45.645638 2989601728 solver.cpp:218] Iteration 0 (0 iter/s, 74.062s/20 iters), loss = 190732
I0830 23:05:45.647449 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:05:45.647469 2989601728 sgd_solver.cpp:105] Iteration 0, lr = 1e-14
I0830 23:28:42.183948 2989601728 solver.cpp:218] Iteration 20 (0.0145293 iter/s, 1376.53s/20 iters), loss = 190732
I0830 23:28:42.185940 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:28:42.185962 2989601728 sgd_solver.cpp:105] Iteration 20, lr = 1e-14
I0830 23:51:43.803419 2989601728 solver.cpp:218] Iteration 40 (0.0144758 iter/s, 1381.62s/20 iters), loss = 190732
I0830 23:51:43.817291 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:51:43.817371 2989601728 sgd_solver.cpp:105] Iteration 40, lr = 1e-14
I0831 00:17:23.955076 2989601728 solver.cpp:218] Iteration 60 (0.0129858 iter/s, 1540.14s/20 iters), loss = 190732
I0831 00:17:23.957161 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 00:17:23.957203 2989601728 sgd_solver.cpp:105] Iteration 60, lr = 1e-14
I0831 00:40:41.079898 2989601728 solver.cpp:218] Iteration 80 (0.0143152 iter/s, 1397.12s/20 iters), loss = 190732
I0831 00:40:41.082603 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 00:40:41.082649 2989601728 sgd_solver.cpp:105] Iteration 80, lr = 1e-14
I0831 01:03:53.159317 2989601728 solver.cpp:218] Iteration 100 (0.014367 iter/s, 1392.08s/20 iters), loss = 190732
I0831 01:03:53.161844 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:03:53.161903 2989601728 sgd_solver.cpp:105] Iteration 100, lr = 1e-14
I0831 01:27:03.867575 2989601728 solver.cpp:218] Iteration 120 (0.0143812 iter/s, 1390.71s/20 iters), loss = 190732
I0831 01:27:03.869439 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:27:03.869469 2989601728 sgd_solver.cpp:105] Iteration 120, lr = 1e-14
I0831 01:50:10.512094 2989601728 solver.cpp:218] Iteration 140 (0.0144233 iter/s, 1386.64s/20 iters), loss = 190732
I0831 01:50:10.514268 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:50:10.514302 2989601728 sgd_solver.cpp:105] Iteration 140, lr = 1e-14
I0831 02:09:50.607455 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 02:09:50.672649 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 02:13:16.209158 2989601728 solver.cpp:218] Iteration 160 (0.0144332 iter/s, 1385.69s/20 iters), loss = 190732
I0831 02:13:16.211565 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:13:16.211609 2989601728 sgd_solver.cpp:105] Iteration 160, lr = 1e-14
I0831 02:36:30.536650 2989601728 solver.cpp:218] Iteration 180 (0.0143439 iter/s, 1394.32s/20 iters), loss = 190732
I0831 02:36:30.538833 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:36:30.539871 2989601728 sgd_solver.cpp:105] Iteration 180, lr = 1e-14
I0831 02:59:38.813151 2989601728 solver.cpp:218] Iteration 200 (0.0144064 iter/s, 1388.27s/20 iters), loss = 190732
I0831 02:59:38.814018 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:59:38.814097 2989601728 sgd_solver.cpp:105] Iteration 200, lr = 1e-14
I0831 03:22:46.534659 2989601728 solver.cpp:218] Iteration 220 (0.0144121 iter/s, 1387.72s/20 iters), loss = 190732
I0831 03:22:46.536751 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 03:22:46.536808 2989601728 sgd_solver.cpp:105] Iteration 220, lr = 1e-14
I0831 03:46:38.997651 2989601728 solver.cpp:218] Iteration 240 (0.013962 iter/s, 1432.46s/20 iters), loss = 190732
I0831 03:46:39.001502 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 03:46:39.001591 2989601728 sgd_solver.cpp:105] Iteration 240, lr = 1e-14
I0831 04:09:49.981889 2989601728 solver.cpp:218] Iteration 260 (0.0143784 iter/s, 1390.98s/20 iters), loss = 190732
I0831 04:09:49.983256 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:09:49.983301 2989601728 sgd_solver.cpp:105] Iteration 260, lr = 1e-14
I0831 04:32:59.845221 2989601728 solver.cpp:218] Iteration 280 (0.0143899 iter/s, 1389.86s/20 iters), loss = 190732
I0831 04:32:59.847712 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:32:59.847936 2989601728 sgd_solver.cpp:105] Iteration 280, lr = 1e-14
I0831 04:56:07.752025 2989601728 solver.cpp:218] Iteration 300 (0.0144102 iter/s, 1387.9s/20 iters), loss = 190732
I0831 04:56:07.754050 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:56:07.754091 2989601728 sgd_solver.cpp:105] Iteration 300, lr = 1e-14
I0831 05:16:57.383947 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 05:16:57.468634 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 05:19:16.101671 2989601728 solver.cpp:218] Iteration 320 (0.0144056 iter/s, 1388.35s/20 iters), loss = 190732
I0831 05:19:16.102998 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 05:19:16.103953 2989601728 sgd_solver.cpp:105] Iteration 320, lr = 1e-14
I0831 05:42:22.554265 2989601728 solver.cpp:218] Iteration 340 (0.0144253 iter/s, 1386.45s/20 iters), loss = 190732
I0831 05:42:22.557201 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 05:42:22.558081 2989601728 sgd_solver.cpp:105] Iteration 340, lr = 1e-14
I0831 06:05:33.816596 2989601728 solver.cpp:218] Iteration 360 (0.0143755 iter/s, 1391.26s/20 iters), loss = 190732
I0831 06:05:33.819310 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:05:33.819358 2989601728 sgd_solver.cpp:105] Iteration 360, lr = 1e-14
I0831 06:28:38.358750 2989601728 solver.cpp:218] Iteration 380 (0.0144452 iter/s, 1384.54s/20 iters), loss = 190732
I0831 06:28:38.362834 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:28:38.363451 2989601728 sgd_solver.cpp:105] Iteration 380, lr = 1e-14
I0831 06:51:48.489392 2989601728 solver.cpp:218] Iteration 400 (0.0143872 iter/s, 1390.13s/20 iters), loss = 190732
I0831 06:51:48.490061 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:51:48.491013 2989601728 sgd_solver.cpp:105] Iteration 400, lr = 1e-14
I0831 07:15:00.156152 2989601728 solver.cpp:218] Iteration 420 (0.0143713 iter/s, 1391.67s/20 iters), loss = 190732
I0831 07:15:00.159214 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 07:15:00.159261 2989601728 sgd_solver.cpp:105] Iteration 420, lr = 1e-14
I0831 07:38:09.862089 2989601728 solver.cpp:218] Iteration 440 (0.0143916 iter/s, 1389.7s/20 iters), loss = 190732
I0831 07:38:09.865105 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 07:38:09.865152 2989601728 sgd_solver.cpp:105] Iteration 440, lr = 1e-14
I0831 08:01:15.438222 2989601728 solver.cpp:218] Iteration 460 (0.0144345 iter/s, 1385.57s/20 iters), loss = 190732
I0831 08:01:15.439589 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:01:15.440675 2989601728 sgd_solver.cpp:105] Iteration 460, lr = 1e-14
I0831 08:24:24.188830 2989601728 solver.cpp:218] Iteration 480 (0.0144015 iter/s, 1388.75s/20 iters), loss = 190732
I0831 08:24:24.191907 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:24:24.191951 2989601728 sgd_solver.cpp:105] Iteration 480, lr = 1e-14
I0831 08:24:24.514991 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 08:24:24.524113 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 08:47:29.558264 2989601728 solver.cpp:218] Iteration 500 (0.0144366 iter/s, 1385.37s/20 iters), loss = 190732
I0831 08:47:29.562070 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:47:29.562104 2989601728 sgd_solver.cpp:105] Iteration 500, lr = 1e-14
I0831 09:10:43.430681 2989601728 solver.cpp:218] Iteration 520 (0.0143486 iter/s, 1393.87s/20 iters), loss = 190732
I0831 09:10:43.432601 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:10:43.433498 2989601728 sgd_solver.cpp:105] Iteration 520, lr = 1e-14
I0831 09:33:53.022397 2989601728 solver.cpp:218] Iteration 540 (0.0143927 iter/s, 1389.59s/20 iters), loss = 190732
I0831 09:33:53.024354 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:33:53.024405 2989601728 sgd_solver.cpp:105] Iteration 540, lr = 1e-14
I0831 09:56:59.140298 2989601728 solver.cpp:218] Iteration 560 (0.0144288 iter/s, 1386.11s/20 iters), loss = 190732
I0831 09:56:59.142597 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:56:59.142642 2989601728 sgd_solver.cpp:105] Iteration 560, lr = 1e-14
I0831 10:20:10.334044 2989601728 solver.cpp:218] Iteration 580 (0.0143762 iter/s, 1391.19s/20 iters), loss = 190732
I0831 10:20:10.336256 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 10:20:10.336287 2989601728 sgd_solver.cpp:105] Iteration 580, lr = 1e-14
I0831 10:43:15.363580 2989601728 solver.cpp:218] Iteration 600 (0.0144402 iter/s, 1385.03s/20 iters), loss = 190732
I0831 10:43:15.365350 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 10:43:15.365380 2989601728 sgd_solver.cpp:105] Iteration 600, lr = 1e-14
I0831 11:06:26.864280 2989601728 solver.cpp:218] Iteration 620 (0.014373 iter/s, 1391.5s/20 iters), loss = 190732
I0831 11:06:26.867431 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:06:26.867480 2989601728 sgd_solver.cpp:105] Iteration 620, lr = 1e-14
I0831 11:29:37.275745 2989601728 solver.cpp:218] Iteration 640 (0.0143843 iter/s, 1390.41s/20 iters), loss = 190732
I0831 11:29:37.277166 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:29:37.277206 2989601728 sgd_solver.cpp:105] Iteration 640, lr = 1e-14
I0831 11:30:47.900959 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 11:30:47.934394 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 11:53:00.394335 2989601728 solver.cpp:218] Iteration 660 (0.014254 iter/s, 1403.11s/20 iters), loss = 190732
I0831 11:53:00.399102 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:53:00.399185 2989601728 sgd_solver.cpp:105] Iteration 660, lr = 1e-14
I0831 12:16:24.352802 2989601728 solver.cpp:218] Iteration 680 (0.0142455 iter/s, 1403.95s/20 iters), loss = 190732
I0831 12:16:24.355890 2989601728 solver.cpp:237]     Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 12:16:24.356781 2989601728 sgd_solver.cpp:105] Iteration 680, lr = 1e-14

Here is my network definition for the training phase (train_val.prototxt):

name: "face-detect"
state {
  phase: TRAIN
  level: 0
  stage: ""
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 104.006989
    mean_value: 116.66877
    mean_value: 122.678917
  }
  data_param {
    source: "data/fddb-face-database/train_img_lmdb"
    scale: 0.00390625
    batch_size: 16
    backend: LMDB
  }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "data/fddb-face-database/train_lab_lmdb"
    batch_size: 16
    backend: LMDB
  }
}
layer {
  name: "mod1_conv1"
  type: "Convolution"
  bottom: "data"
  top: "mod1_conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod1_relu1"
  type: "ReLU"
  bottom: "mod1_conv1"
  top: "mod1_conv1"
}
layer {
  name: "mod1_conv2"
  type: "Convolution"
  bottom: "mod1_conv1"
  top: "mod1_conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod1_relu2"
  type: "ReLU"
  bottom: "mod1_conv2"
  top: "mod1_conv2"
}
layer {
  name: "mod1_pool1"
  type: "Pooling"
  bottom: "mod1_conv2"
  top: "mod1_pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "mod2_conv1"
  type: "Convolution"
  bottom: "mod1_pool1"
  top: "mod2_conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod2_relu1"
  type: "ReLU"
  bottom: "mod2_conv1"
  top: "mod2_conv1"
}
layer {
  name: "mod2_conv2"
  type: "Convolution"
  bottom: "mod2_conv1"
  top: "mod2_conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod2_relu2"
  type: "ReLU"
  bottom: "mod2_conv2"
  top: "mod2_conv2"
}
layer {
  name: "mod2_pool1"
  type: "Pooling"
  bottom: "mod2_conv2"
  top: "mod2_pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "mod3_conv1"
  type: "Convolution"
  bottom: "mod2_pool1"
  top: "mod3_conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod3_relu1"
  type: "ReLU"
  bottom: "mod3_conv1"
  top: "mod3_conv1"
}
layer {
  name: "mod3_conv2"
  type: "Convolution"
  bottom: "mod3_conv1"
  top: "mod3_conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod3_relu2"
  type: "ReLU"
  bottom: "mod3_conv2"
  top: "mod3_conv2"
}
layer {
  name: "mod3_pool1"
  type: "Pooling"
  bottom: "mod3_conv2"
  top: "mod3_pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "mod4_conv1"
  type: "Convolution"
  bottom: "mod3_pool1"
  top: "mod4_conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod4_relu1"
  type: "ReLU"
  bottom: "mod4_conv1"
  top: "mod4_conv1"
}
layer {
  name: "mod4_conv2"
  type: "Convolution"
  bottom: "mod4_conv1"
  top: "mod4_conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod4_relu2"
  type: "ReLU"
  bottom: "mod4_conv2"
  top: "mod4_conv2"
}
layer {
  name: "mod4_pool1"
  type: "Pooling"
  bottom: "mod4_conv2"
  top: "mod4_pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "mod5_conv1"
  type: "Convolution"
  bottom: "mod4_pool1"
  top: "mod5_conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod5_relu1"
  type: "ReLU"
  bottom: "mod5_conv1"
  top: "mod5_conv1"
}
layer {
  name: "mod5_conv2"
  type: "Convolution"
  bottom: "mod5_conv1"
  top: "mod5_conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "mod5_relu2"
  type: "ReLU"
  bottom: "mod5_conv2"
  top: "mod5_conv2"
}
layer {
  name: "mod5_pool1"
  type: "Pooling"
  bottom: "mod5_conv2"
  top: "mod5_pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "mod6_fc1"
  type: "Convolution"
  bottom: "mod5_pool1"
  top: "mod6_fc1"
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 1
    stride: 1
  }
}
layer {
  name: "mod6_relu1"
  type: "ReLU"
  bottom: "mod6_fc1"
  top: "mod6_fc1"
}
layer {
  name: "mod6_drop1"
  type: "Dropout"
  bottom: "mod6_fc1"
  top: "mod6_fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "mod6_score1"
  type: "Convolution"
  bottom: "mod6_fc1"
  top: "mod6_score1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 2
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "mod6_upscore1"
  type: "Deconvolution"
  bottom: "mod6_score1"
  top: "mod6_upscore1"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 2
    bias_term: false
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "mod6_score2"
  type: "Convolution"
  bottom: "mod4_pool1"
  top: "mod6_score2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 2
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "crop"
  type: "Crop"
  bottom: "mod6_score2"
  bottom: "mod6_upscore1"
  top: "mod6_score2c"
}
layer {
  name: "mod6_fuse1"
  type: "Eltwise"
  bottom: "mod6_upscore1"
  bottom: "mod6_score2c"
  top: "mod6_fuse1"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "mod6_upfuse1"
  type: "Deconvolution"
  bottom: "mod6_fuse1"
  top: "mod6_upfuse1"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 2
    bias_term: false
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "mod6_score3"
  type: "Convolution"
  bottom: "mod3_pool1"
  top: "mod6_score3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 2
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "crop"
  type: "Crop"
  bottom: "mod6_score3"
  bottom: "mod6_upfuse1"
  top: "mod6_score3c"
}
layer {
  name: "mod6_fuse2"
  type: "Eltwise"
  bottom: "mod6_upfuse1"
  bottom: "mod6_score3c"
  top: "mod6_fuse2"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "mod6_upfuse2"
  type: "Deconvolution"
  bottom: "mod6_fuse2"
  top: "mod6_upfuse2"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 2
    bias_term: false
    kernel_size: 8
    stride: 8
  }
}
layer {
  name: "crop"
  type: "Crop"
  bottom: "mod6_upfuse2"
  bottom: "label"
  top: "score"
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    normalize: false
  }
}

And here is my solver.prototxt:

net: "models/face-detect/train_val.prototxt"
test_iter: 736
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-14
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "models/face-detect/snapshot/train"
test_initialization: false
# Uncomment the following to default to CPU mode solving
solver_mode: CPU

2 Answers:

Answer 0 (score: 1)

Your base_lr seems far too small, so your weights are not being updated fast enough. You should start with a larger base_lr. The learning rate is multiplied by the gradient of the loss and used to update the weights: if the learning rate is too small, the updates will be tiny and convergence will be far too slow; if it is too high, you will get unstable results. There is no magic number, so you have to find the right hyperparameters for your data and network empirically.
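To see why 1e-14 effectively freezes the weights, here is a small NumPy illustration (not Caffe code) of the plain SGD step w ← w − lr·grad. The weight, gradient, and the larger rate 1e-8 are made-up values for illustration only:

import numpy as np

w = np.float32(0.1)     # a typical weight (made up)
grad = np.float32(5.0)  # a typical gradient magnitude (made up)

for lr in (1e-14, 1e-8):
    step = np.float32(lr) * grad
    # At lr = 1e-14 the step falls below float32 precision around 0.1,
    # so the subtraction rounds back to the original weight.
    print(lr, 'weight unchanged:', w - step == w)  # True, then False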

Answer 1 (score: -1)

You should also try learning rate decay. My favourite is the continuous learning rate decay used in GoogLeNet, where the learning rate is reduced by 4% every 8 epochs. A decaying learning rate helps convergence because it preserves more of what has been learned by shrinking the updates over time, which means your network does not forget what it has already learned.
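A rough sketch of that schedule in Python, with its approximate Caffe solver equivalent in the comments (the base_lr value and the epoch-to-iteration mapping are placeholders):

def decayed_lr(base_lr, epoch, decay_rate=0.96, decay_every=8):
    # Cut the learning rate by 4% (multiply by 0.96) every 8 epochs.
    return base_lr * decay_rate ** (epoch // decay_every)

for epoch in (0, 8, 16, 64):
    print(epoch, decayed_lr(0.01, epoch))

# In a Caffe solver this roughly corresponds to:
#   lr_policy: "step"
#   gamma: 0.96
#   stepsize: <number of iterations in 8 epochs>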

On top of that, always use a momentum-based optimizer such as Adam or RMSprop. They greatly reduce the jitter during learning and ensure a smooth descent into the minimum.
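As a sketch of what the momentum part buys you (Caffe's plain SGD solver already applies this update when the momentum field is set; Adam and RMSprop add adaptive per-parameter rates on top), with made-up values:

import numpy as np

def momentum_step(w, grad, v, lr=0.01, mu=0.9):
    # The velocity v accumulates a running average of recent gradients,
    # so noisy, alternating gradients largely cancel out.
    v = mu * v - lr * grad
    return w + v, v

w, v = np.float32(0.1), np.float32(0.0)
for g in (1.0, -1.0, 1.0, -1.0):  # noisy mini-batch gradients (made up)
    w, v = momentum_step(w, g, v)
print(w, v)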