Caffe: Unknown bottom blob

Asked: 2016-08-02 21:42:52

Tags: machine-learning neural-network deep-learning caffe

I am using the Caffe framework and want to train the following network:

train2.prototxt

When I run the following command:

caffe train --solver solver.prototxt

it fails with this error:

F0802 14:31:54.506695 28038 insert_splits.cpp:29] Unknown bottom blob 'image' (layer 'conv1', bottom index 0)
*** Check failure stack trace: ***
@     0x7ff2941c3f9d  google::LogMessage::Fail()
@     0x7ff2941c5e03  google::LogMessage::SendToLog()
@     0x7ff2941c3b2b  google::LogMessage::Flush()
@     0x7ff2941c67ee  google::LogMessageFatal::~LogMessageFatal()
@     0x7ff2947cedbe  caffe::InsertSplits()
@     0x7ff2948306de  caffe::Net<>::Init()
@     0x7ff294833a81  caffe::Net<>::Net()
@     0x7ff29480ce6a  caffe::Solver<>::InitTestNets()
@     0x7ff29480ee85  caffe::Solver<>::Init()
@     0x7ff29480f19a  caffe::Solver<>::Solver()
@     0x7ff2947f4343  caffe::Creator_SGDSolver<>()
@           0x40b1a0  (unknown)
@           0x407373  (unknown)
@     0x7ff292e40741  __libc_start_main
@           0x407b79  (unknown)
Abortado (`core' generado)

The network definition (train2.prototxt):

name: "xxxxxx"
layer {
  name: "image"
  type: "HDF5Data"
  top: "image"
  top: "label"
  hdf5_data_param {
    source: "h5a.train.h5.txt"
    batch_size: 64
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "image"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool1"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv3"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "improd3"
  type: "InnerProduct"
  bottom: "pool2"
  top: "improd3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "improd3"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "improd3"
  bottom: "label"
  top: "loss"
}

solver.prototxt:

net: "train2.prototxt"
test_iter: 100
test_interval: 1000
# lr for fine-tuning should be lower than when starting from scratch
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
# stepsize should also be lower, as we're closer to being done
stepsize: 20000
display: 20
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "caffe"
solver_mode: CPU

I am stuck: because of this error I cannot start training the network.

1 answer:

Answer 0 (score: 2)

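This happens because, even though you are running training, Caffe also builds a TEST-phase net for validation. The only data layer in train2.prototxt is restricted to phase: TRAIN, so in the TEST-phase net nothing produces the blob image, and conv1 cannot find its bottom. The stack trace shows this: the failure comes from Solver::InitTestNets(). The TEST phase is triggered because the solver defines test_* parameters (test_iter and test_interval), and train2.prototxt also contains a layer marked phase: TEST (accuracy). Removing the test_* parameters from the solver and the TEST-only layers from the network lets training start.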
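
As a minimal sketch of the fix described above, the solver.prototxt from the question could look like this once the test_* parameters are dropped. All other values are copied unchanged from the question, and the comment line is added for illustration:

```
net: "train2.prototxt"
# test_iter and test_interval removed, so no TEST-phase net is built
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 20000
display: 20
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "caffe"
solver_mode: CPU
```

If validation is actually wanted, a common alternative is to keep the test_* parameters and the accuracy layer, and instead add a second data layer to train2.prototxt that is restricted to phase: TEST and produces the same image and label blobs. The file name h5a.test.h5.txt below is hypothetical (it does not appear in the question) and stands for a text file listing the validation HDF5 files:

```
layer {
  name: "image"
  type: "HDF5Data"
  top: "image"
  top: "label"
  hdf5_data_param {
    # hypothetical list file for the validation set; replace with your own
    source: "h5a.test.h5.txt"
    batch_size: 64
  }
  include {
    phase: TEST
  }
}
```

With this layer in place, test_iter should be chosen so that test_iter times batch_size roughly covers the validation set.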