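Backward pass in Caffe Python layer is not called/working?

Time: 2016-11-11 02:08:27

Tags: neural-network deep-learning caffe pycaffe

I am trying, without success, to implement a simple loss layer in Python using Caffe. As a reference, I found several layers implemented in Python, including here, here and here.

Starting with the EuclideanLossLayer provided by the Caffe documentation/examples, I was not able to get it working and started debugging. Even with this simple TestLayer: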

import caffe


class TestLayer(caffe.Layer):
    """Identity layer that only prints which method gets called."""

    def setup(self, bottom, top):
        """
        Checks the correct number of bottom inputs.

        :param bottom: bottom inputs
        :type bottom: [numpy.ndarray]
        :param top: top outputs
        :type top: [numpy.ndarray]
        """

        print('setup')

    def reshape(self, bottom, top):
        """
        Make sure all involved blobs have the right dimension.

        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        """

        print('reshape')
        # The output has the same shape as the input.
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        """
        Forward propagation.

        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        """

        print('forward')
        top[0].data[...] = bottom[0].data

    def backward(self, top, propagate_down, bottom):
        """
        Backward pass.

        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        :param propagate_down:
        :type propagate_down:
        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        """

        print('backward')
        # Identity layer: pass the gradient through unchanged.
        bottom[0].diff[...] = top[0].diff[...]

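I cannot get the Python layer to work. The learning task is rather simple: I am merely trying to predict whether a real-valued number is positive or negative. The corresponding data is generated as follows and written to LMDBs:

import numpy

N = 10000
N_train = int(0.8*N)

images = []
labels = []

for n in range(N):
    # One random value in [-1, 1) per sample; the label is its sign.
    image = (numpy.random.rand(1, 1, 1)*2 - 1).astype(numpy.float64)
    label = int(numpy.sign(image.item()))

    images.append(image)
    labels.append(label)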

Writing the data to LMDB should be correct, as tests with the MNIST dataset provided by Caffe show no problems. The network is defined as follows:

net.data, net.labels = caffe.layers.Data(batch_size = batch_size, backend = caffe.params.Data.LMDB,
                                         source = lmdb_path, ntop = 2)
net.fc1 = caffe.layers.Python(net.data, python_param = dict(module = 'tools.layers', layer = 'TestLayer'))
net.score = caffe.layers.TanH(net.fc1)
net.loss = caffe.layers.EuclideanLoss(net.score, net.labels)
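
For reference, the gradient that this network would propagate back to fc1, if the backward pass ran, can be worked out by hand. The following is a minimal sketch in plain Python, not Caffe code. It assumes that EuclideanLoss computes the sum of squared differences divided by 2N, and the helper names loss, grad_input and numeric_grad are made up for illustration.

```python
import math

def loss(xs, ys):
    """Euclidean loss of tanh(x) against labels: sum((tanh(x) - y)^2) / (2N)."""
    n = len(xs)
    return sum((math.tanh(x) - y) ** 2 for x, y in zip(xs, ys)) / (2.0 * n)

def grad_input(xs, ys):
    """Analytic dL/dx: (tanh(x) - y) / N from the loss, times 1 - tanh(x)^2 from TanH.

    The identity TestLayer would pass this gradient through unchanged.
    """
    n = len(xs)
    return [(math.tanh(x) - y) / n * (1.0 - math.tanh(x) ** 2)
            for x, y in zip(xs, ys)]

def numeric_grad(xs, ys, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += eps
        down = list(xs)
        down[i] -= eps
        grads.append((loss(up, ys) - loss(down, ys)) / (2.0 * eps))
    return grads

xs = [0.5, -0.25, 0.9]   # inputs in [-1, 1), as in the generated data
ys = [1, -1, 1]          # labels are the signs of the inputs
analytic = grad_input(xs, ys)
numeric = numeric_grad(xs, ys)
```

If backward were called on TestLayer, top[0].diff would hold values of this form, one per sample in the batch.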

Solving is done manually using:

for iteration in range(iterations):
    solver.step(step)

The corresponding prototxt files are as follows:

solver.prototxt

weight_decay: 0.0005
test_net: "tests/test.prototxt"
snapshot_prefix: "tests/snapshot_"
max_iter: 1000
stepsize: 1000
base_lr: 0.01
snapshot: 0
gamma: 0.01
solver_mode: CPU
train_net: "tests/train.prototxt"
test_iter: 0
test_initialization: false
lr_policy: "step"
momentum: 0.9
display: 100
test_interval: 100000

train.prototxt

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "labels"
  data_param {
    source: "tests/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "Python"
  bottom: "data"
  top: "fc1"
  python_param {
    module: "tools.layers"
    layer: "TestLayer"
  }
}
layer {
  name: "score"
  type: "TanH"
  bottom: "fc1"
  top: "score"
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "score"
  bottom: "labels"
  top: "loss"
}

test.prototxt

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "labels"
  data_param {
    source: "tests/test_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "Python"
  bottom: "data"
  top: "fc1"
  python_param {
    module: "tools.layers"
    layer: "TestLayer"
  }
}
layer {
  name: "score"
  type: "TanH"
  bottom: "fc1"
  top: "score"
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "score"
  bottom: "labels"
  top: "loss"
}

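I tried tracking it down by adding debug messages to the backward and forward methods of TestLayer; only the forward method gets called during solving (note that no testing is performed, so the calls can only be related to solving). Similarly, I added debug messages in python_layer.hpp:

virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  LOG(INFO) << "cpp forward";
  self_.attr("forward")(bottom, top);
}
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  LOG(INFO) << "cpp backward";
  self_.attr("backward")(top, propagate_down, bottom);
}

Again, only the forward pass is executed. When I remove the backward method from TestLayer, solving still works. When I remove the forward method, an error is thrown because forward is not implemented. I would expect the same for backward, so it seems that the backward pass does not get executed at all. After switching back to regular layers and adding debug messages, everything works as expected.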

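I have the feeling that I am missing something simple or fundamental, but I have not been able to resolve the problem for several days now. Any help or hints are appreciated.

Thanks!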

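3 answers:

Answer 0 (score: 5)

This is the intended behavior, because there are no layers "below" your Python layer that actually need the gradients to compute weight updates. Caffe notices this and skips the backward computation for such layers, since it would be a waste of time.

In the log at network initialization time, Caffe prints for every layer whether backward computation is needed. In your case, you should see something like: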

fc1 does not need backward computation.

If you put an "InnerProduct" or "Convolution" layer below your "Python" layer (e.g. Data->InnerProduct->Python->Loss), the backward computation becomes necessary and your backward method gets called.
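
The rule described in this answer can be illustrated with a small sketch in plain Python. This is a simplified model for illustration only, not Caffe's implementation: Caffe's actual network initialization takes more into account. The function backward_flags and the tuple layout are hypothetical.

```python
def backward_flags(layers, force_backward=False):
    """Return {layer name: bool} saying whether backward would run.

    `layers` is an ordered list of (name, bottoms, tops, has_params) tuples.
    Simplified rule: a layer needs backward if it has learnable parameters
    or if any of its bottom blobs needs a gradient.
    """
    blob_needs_grad = {}
    flags = {}
    for name, bottoms, tops, has_params in layers:
        need = has_params or any(blob_needs_grad.get(b, False) for b in bottoms)
        if force_backward and bottoms:
            need = True  # modelled after force_backward: true
        flags[name] = need
        for t in tops:
            blob_needs_grad[t] = need
    return flags

# The network from the question: nothing below the Python layer has parameters.
question_net = [
    ("data", [], ["data", "labels"], False),
    ("fc1", ["data"], ["fc1"], False),       # Python TestLayer
    ("score", ["fc1"], ["score"], False),    # TanH
    ("loss", ["score", "labels"], ["loss"], False),
]

# Same network with an InnerProduct layer between Data and Python.
with_inner_product = [
    ("data", [], ["data", "labels"], False),
    ("ip", ["data"], ["ip"], True),          # InnerProduct has weights
    ("fc1", ["ip"], ["fc1"], False),
    ("score", ["fc1"], ["score"], False),
    ("loss", ["score", "labels"], ["loss"], False),
]

flags_question = backward_flags(question_net)
flags_with_ip = backward_flags(with_inner_product)
flags_forced = backward_flags(question_net, force_backward=True)
```

Under this model, fc1 is skipped in the original network and included once a layer with parameters sits below it.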

Answer 1 (score: 2)

In addition to Erik B.'s answer, you can force Caffe to backpropagate by specifying

force_backward: true

in your net prototxt.
See the comments in caffe.proto for more information.

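Answer 2 (score: 1)

Even though I set force_backward: true as suggested by David Stutz, it did not work for me. I found out here and here that I had forgotten to set the diff of the last layer to 1 at the index of the target class.

As Mohit Jain describes in his caffe-users answer, if you are doing ImageNet classification with a tabby cat, then after the forward pass you have to do something like: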

net.blobs['prob'].diff[0][281] = 1   # 281 is tabby cat. diff shape: (1, 1000)

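Note that you have to change 'prob' to the name of your last layer, which is usually a softmax layer named 'prob'.

Here is an example based on mine:

deploy.prototxt (it is loosely based on VGG16 just to show the structure of the file, but I did not test it):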
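
Why the seed matters can be seen in a minimal plain-Python sketch of a softmax backward step. This is an illustration of the math, not pycaffe code; the function names and the four-class logits are made up for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_backward(probs, top_diff):
    """Gradient w.r.t. the logits: p_i * (top_diff_i - sum_j top_diff_j * p_j)."""
    dot = sum(d * p for d, p in zip(top_diff, probs))
    return [p * (d - dot) for p, d in zip(probs, top_diff)]

logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)
target = 0  # index of the class whose probability we differentiate

# With an all-zero diff on the last layer, every gradient below it is zero.
zero_seed = [0.0] * len(probs)
grad_zero = softmax_backward(probs, zero_seed)

# With the diff set to 1 at the target index, a real gradient flows back.
one_hot_seed = [1.0 if i == target else 0.0 for i in range(len(probs))]
grad_seeded = softmax_backward(probs, one_hot_seed)
```

With the zero seed the backward pass runs but produces only zeros, which is easy to mistake for backward not working at all.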

name: "smaller_vgg"
input: "data"
force_backward: true
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "drop1"
  type: "Dropout"
  bottom: "fc1"
  top: "fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc2"
  top: "prob"
}

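main.py:

import caffe
import cv2
import numpy as np

prototxt = 'deploy.prototxt'
model_file = 'smaller_vgg.caffemodel'
# pycaffe expects the network definition first, then the weights
net = caffe.Net(prototxt, model_file, caffe.TRAIN)  # not sure if TEST works as well

# Load as a 3-channel image and match the 1x3x224x224 input in deploy.prototxt
image = cv2.imread('tabbycat.jpg', cv2.IMREAD_COLOR)
image = cv2.resize(image, (224, 224)).transpose(2, 0, 1)  # HxWxC -> CxHxW

net.blobs['data'].data[...] = image[np.newaxis, ...]
net.forward()
# Seed the gradient at the target class after the forward pass
net.blobs['prob'].diff[0, 281] = 1  # 281 is tabby cat
backout = net.backward()

# access grad from backout['data'] or net.blobs['data'].diff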