Why does the loss stay constant while training FCN-8?

Time: 2017-01-12 10:46:16

Tags: deep-learning caffe pycaffe deeplearning4j matconvnet

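I am trying to run FCN-8. I did the following steps:

1. Downloaded this repository.
2. Converted my data to LMDB and changed the paths in train_val.prototxt.
3. Downloaded the fcn8s-heavy-pascal caffemodel.
4. Changed num_output in train_val.prototxt and deploy.prototxt from 60 to 5 (the number of classes in my data).

The last few layers: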

layer {
  name: "score59"
  type: "Convolution"
  bottom: "fc7"
  top: "score59"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 5 #60
    kernel_size: 1
    engine: CAFFE
  }
}
layer {
  name: "upscore2"
  type: "Deconvolution"
  bottom: "score59"
  top: "upscore2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  convolution_param {
    num_output: 5 #60
    bias_term: false
    kernel_size: 4
    stride: 2
  }
}
layer {
  name: "score-pool4"
  type: "Convolution"
  bottom: "pool4"
  top: "score-pool4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 5 #60
    kernel_size: 1
    engine: CAFFE
  }
}
layer { type: 'Crop' name: 'crop' bottom: 'score-pool4' bottom: 'upscore2'
  top: 'score-pool4c' }
layer {
  name: "fuse"
  type: "Eltwise"
  bottom: "upscore2"
  bottom: "score-pool4c"
  top: "score-fused"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "upsample-fused-16"
  type: "Deconvolution"
  bottom: "score-fused"
  top: "score4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  convolution_param {
    num_output: 5 #60
    bias_term: false
    kernel_size: 4
    stride: 2
  }
}
layer {
  name: "score-pool3"
  type: "Convolution"
  bottom: "pool3"
  top: "score-pool3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 5 #60
    kernel_size: 1
    engine: CAFFE
  }
}
layer { type: 'Crop' name: 'crop' bottom: 'score-pool3' bottom: 'score4'
  top: 'score-pool3c' }
layer {
  name: "fuse"
  type: "Eltwise"
  bottom: "score4"
  bottom: "score-pool3c"
  top: "score-final"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "score-final"
  top: "bigscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 5 #60
    bias_term: false
    kernel_size: 16
    stride: 8
  }
}
layer { type: 'Crop' name: 'crop' bottom: 'bigscore' bottom: 'data' top: 'score' }
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    normalize: false
  }
}

I started training from the weights of the model pre-trained on the PASCAL dataset, but the loss stays constant over time (loss = 105476).

I0112 18:25:07.198588  5878 sgd_solver.cpp:106] Iteration 150, lr = 1e-14
I0112 18:26:07.614239  5878 solver.cpp:228] Iteration 200, loss = 105476
I0112 18:26:07.614459  5878 solver.cpp:244]     Train net output #0: loss = 105476 (* 1 = 105476 loss)
I0112 18:26:07.614490  5878 sgd_solver.cpp:106] Iteration 200, lr = 1e-14
I0112 18:27:06.198556  5878 solver.cpp:228] Iteration 250, loss = 105476
I0112 18:27:06.198801  5878 solver.cpp:244]     Train net output #0: loss = 105476 (* 1 = 105476 loss)
I0112 18:27:06.198834  5878 sgd_solver.cpp:106] Iteration 250, lr = 1e-14
I0112 18:28:05.056469  5878 solver.cpp:228] Iteration 300, loss = 105476
I0112 18:28:05.056715  5878 solver.cpp:244]     Train net output #0: loss = 105476 (* 1 = 105476 loss)
I0112 18:28:05.056751  5878 sgd_solver.cpp:106] Iteration 300, lr = 1e-14
I0112 18:29:04.537042  5878 solver.cpp:228] Iteration 350, loss = 105476
I0112 18:29:04.537261  5878 solver.cpp:244]     Train net output #0: loss = 105476 (* 1 = 105476 loss)
I0112 18:29:04.537293  5878 sgd_solver.cpp:106] Iteration 350, lr = 1e-14
I0112 18:30:05.320504  5878 solver.cpp:228] Iteration 400, loss = 105476
I0112 18:30:05.320751  5878 solver.cpp:244]     Train net output #0: loss = 105476 (* 1 = 105476 loss)
I0112 18:30:05.320796  5878 sgd_solver.cpp:106] Iteration 400, lr = 1e-14
I0112 18:31:06.690937  5878 solver.cpp:228] Iteration 450, loss = 105476
I0112 18:31:06.691177  5878 solver.cpp:244]     Train net output #0: loss = 105476 (* 1 = 105476 loss)
I0112 18:31:06.691207  5878 sgd_solver.cpp:106] Iteration 450, lr = 1e-14
I0112 18:32:06.593940  5878 solver.cpp:228] Iteration 500, loss = 105476
I0112 18:32:06.596643  5878 solver.cpp:244]     Train net output #0: loss = 105476 (* 1 = 105476 loss)
I0112 18:32:06.596701  5878 sgd_solver.cpp:106] Iteration 500, lr = 1e-14

I don't know which part I did wrong. I would really appreciate your help in solving this problem.

1 Answer:

Answer 0: (score: 1)

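  • Did you use the surgery.transplant() function in solve.py to transplant the original network's caffemodel into your current network?

  • Did you add a weight_filler and bias_filler, with initial values, to the deconvolution layers in net.py?

  • After those two steps, did you run net.py to generate the updated layers?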
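
To illustrate the second point: when a layer has no weight_filler, Caffe's default filler is a constant 0, so a Deconvolution layer that receives no pretrained weights outputs only zeros. A common initialization for upsampling layers is a bilinear interpolation kernel. Below is a minimal pure-Python sketch of such a kernel; the function name bilinear_kernel is hypothetical and is not taken from the repository in the question.

```python
def bilinear_kernel(size):
    """Return a size x size bilinear interpolation kernel as nested lists.

    Hypothetical helper for illustration only; it is not part of Caffe.
    """
    factor = (size + 1) // 2
    # The kernel center sits on a pixel for odd sizes, between pixels for even sizes.
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    # 1-D triangular profile; the 2-D kernel is its outer product with itself.
    profile = [1 - abs(i - center) / float(factor) for i in range(size)]
    return [[r * c for c in profile] for r in profile]


# kernel_size: 4 matches the stride-2 Deconvolution layers in the question.
kernel = bilinear_kernel(4)
for row in kernel:
    print(row)
```

With pycaffe, a kernel like this would typically be copied into the weight blob of each Deconvolution layer (net.params[layer_name][0].data), for matching input and output channels, before training starts.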

Check these steps and see what happens.
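
One quick sanity check on the logged value, as a sketch under stated assumptions: if the network outputs identical scores for all 5 classes at every pixel (which is what all-zero upsampling weights would produce), SoftmaxWithLoss with normalize: false should report roughly ln(5) per pixel, summed over the label map. The 256 x 256 label size and the batch of one image are assumptions, since the question states neither.

```python
import math

num_classes = 5          # num_output in the question's prototxt
observed_loss = 105476   # constant value from the training log

# Assumption: one 256 x 256 label map per iteration (not stated in the question).
num_pixels = 256 * 256

# Uniform class scores give a softmax probability of 1/num_classes per pixel,
# so each pixel contributes -log(1/num_classes) = log(num_classes) to the loss.
uniform_loss = num_pixels * math.log(num_classes)

print("loss for uniform predictions:", uniform_loss)
print("difference from logged loss:", abs(uniform_loss - observed_loss))
```

If the two numbers agree for your actual image size, the network is predicting the same score for every class at every pixel, which points at the initialization of the upsampling layers rather than at the data.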