I am using the Caffe framework and want to train the network shown below.
When I run the following command:
caffe train --solver solver.prototxt
it fails with this error:
F0802 14:31:54.506695 28038 insert_splits.cpp:29] Unknown bottom blob 'image' (layer 'conv1', bottom index 0)
*** Check failure stack trace: ***
@ 0x7ff2941c3f9d google::LogMessage::Fail()
@ 0x7ff2941c5e03 google::LogMessage::SendToLog()
@ 0x7ff2941c3b2b google::LogMessage::Flush()
@ 0x7ff2941c67ee google::LogMessageFatal::~LogMessageFatal()
@ 0x7ff2947cedbe caffe::InsertSplits()
@ 0x7ff2948306de caffe::Net<>::Init()
@ 0x7ff294833a81 caffe::Net<>::Net()
@ 0x7ff29480ce6a caffe::Solver<>::InitTestNets()
@ 0x7ff29480ee85 caffe::Solver<>::Init()
@ 0x7ff29480f19a caffe::Solver<>::Solver()
@ 0x7ff2947f4343 caffe::Creator_SGDSolver<>()
@ 0x40b1a0 (unknown)
@ 0x407373 (unknown)
@ 0x7ff292e40741 __libc_start_main
@ 0x407b79 (unknown)
Abortado (`core' generado)
The network definition (train2.prototxt) is:
name: "xxxxxx"
layer {
  name: "image"
  type: "HDF5Data"
  top: "image"
  top: "label"
  hdf5_data_param {
    source: "h5a.train.h5.txt"
    batch_size: 64
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "image"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool1"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv3"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "improd3"
  type: "InnerProduct"
  bottom: "pool2"
  top: "improd3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "improd3"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "improd3"
  bottom: "label"
  top: "loss"
}
solver.prototxt:
net: "train2.prototxt"
test_iter: 100
test_interval: 1000
# lr for fine-tuning should be lower than when starting from scratch
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
# stepsize should also be lower, as we're closer to being done
stepsize: 20000
display: 20
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "caffe"
solver_mode: CPU
I am stuck: because of this error I cannot start training the network.
Answer 0: (score: 2)
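This happens because, even though you are only trying to run the TRAIN phase, the TEST phase is also set up for validation. The stack trace shows this: the check fails inside caffe::Solver<>::InitTestNets(), that is, while the test net is being built. Your only data layer is restricted to phase: TRAIN, so the TEST net has no input data layer and conv1 cannot find its input blob image. The TEST phase is set up because you defined the test_* parameters (test_iter and test_interval) in the solver and marked some layers in train2.prototxt with phase: TEST. Removing those parameters from the solver, along with the layers that belong to the TEST phase, should let training start.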
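As a minimal sketch of that change, here is the solver from the question with only test_iter and test_interval taken out; every other value is copied unchanged. With no test_iter entry the solver should not build a test net, so the TRAIN-only data layer is sufficient. As far as I can tell, the Accuracy layer restricted to phase: TEST is simply left out of the training net, so deleting it is optional.

```prototxt
net: "train2.prototxt"
# test_iter and test_interval removed, so no TEST net is created
# lr for fine-tuning should be lower than when starting from scratch
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
# stepsize should also be lower, as we're closer to being done
stepsize: 20000
display: 20
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "caffe"
solver_mode: CPU
```

If you would rather keep validation, an alternative (not what is described above) is to leave the solver as it is and add a second data layer for the TEST phase to train2.prototxt, so that the blobs image and label exist in both nets. The source file name below is a hypothetical placeholder for a text file listing held-out HDF5 files, and the batch size is simply copied from the training layer as an assumption.

```prototxt
layer {
  name: "image"
  type: "HDF5Data"
  top: "image"
  top: "label"
  hdf5_data_param {
    # hypothetical list of validation HDF5 files; replace with your own
    source: "h5a.test.h5.txt"
    batch_size: 64
  }
  include {
    phase: TEST
  }
}
```

With test_iter: 100 and this batch size, each validation pass would run 100 x 64 = 6400 samples through the TEST net.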