Question

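I want to train a model (VGG + LSTM) to perform re-identification on video sequences. The main problem is that after training the model I want to run inference, but the LSTM does not seem to work. (I feed the model 9 consecutive images, but the output after the LSTM is the same vector every time, so I believe something is wrong.)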

Here is my train.prototxt. In short, my LSTM layer is:

layer {
    name: "lstm"
    type: "LSTM"
    bottom: "Appearance_reshape"
    bottom: "cont"
    top: "lstm_out"
    recurrent_param {
        num_output: 128
        weight_filler {
            type: "gaussian"
            std: 0.005
        }
        bias_filler {
            type: "constant"
            value: 0.1
        }
    }
}

The bottom "Appearance_reshape" has shape 9 * 1 * 500, and cont has shape 9 * 1. The values of cont are (0, 1, 1, ..., 1), which marks a single sequence of length 9.
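To make the layout of cont concrete, here is a minimal sketch of how such an array could be built for a time-major T * N input. The helper name `make_cont` and its parameters are hypothetical and not part of Caffe; the sketch only assumes that index t * N + n holds the marker for timestep t of stream n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Caffe): builds a time-major T x N
// sequence-continuation array, flattened so that index t * N + n holds
// the marker for timestep t of stream n.
std::vector<float> make_cont(std::size_t seq_len, std::size_t num_streams) {
    assert(seq_len > 0 && num_streams > 0);
    // Start with every entry set to 1 ("continue the sequence").
    std::vector<float> cont(seq_len * num_streams, 1.0f);
    // The first timestep of every stream is 0 ("a new sequence starts here").
    for (std::size_t n = 0; n < num_streams; ++n) {
        cont[n] = 0.0f;
    }
    return cont;
}
```

For the 9 * 1 case described above, `make_cont(9, 1)` gives one 0 followed by eight 1s.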

Here is my lstm_deploy.prototxt. In short, the important layers are as follows.

layer {
    bottom: "fc7"
    top: "Appearance_vector"
    name: "Appearance_vector"
    type: "InnerProduct"
    inner_product_param {
        num_output: 500
    }
}
layer {
    bottom: "Appearance_vector"
    top: "Appearance_vector"
    name: "Appearance_vector_relu"
    type: "ReLU"
}
layer {
    name: "reshape"
    type: "Reshape"
    bottom: "Appearance_vector"
    top: "Appearance_reshape"
    reshape_param {
        shape: { dim: 0 dim: 1 dim: -1 }
    }
}
layer {
    name: "lstm"
    type: "LSTM"
    bottom: "Appearance_reshape"
    bottom: "cont"
    top: "lstm_out"
    recurrent_param {
        num_output: 128
    }
}
layer {
    name: "flatten"
    type: "Flatten"
    bottom: "lstm_out"
    top: "lstm_Y_flatten"
}

The output layer is "flatten".

The test source is inferance.cpp. Sorry, the code is ugly :(

However, as you can see, I push the same image into the input vector repeatedly and then compute lstm_vector.

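// Deliberately fill the input with copies of the same image (imgs[1]).
// Note: starting at i = 1 adds imgs.size() - 1 copies, not imgs.size().
for(int i = 1; i < imgs.size(); i++) {
    tmp_img1.push_back(imgs[1]);
}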
lstm_vector = model_lstm_net.calc(tmp_img1);

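The problem is that all the vectors coming out of the model are identical, whereas I expected the outputs after the LSTM layer to differ at least somewhat from one another. Is there something wrong with the code? Or is the training data simply not good enough, so that the LSTM layer learned nothing?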
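One way to confirm the symptom numerically is to compare each timestep's output against the first one. The sketch below is a hypothetical diagnostic, not a Caffe API; it assumes the output has been copied into a flat, time-major buffer of seq_len vectors with dim floats each (for example 9 and 128 here).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical diagnostic (not a Caffe API): `out` is assumed to hold
// seq_len consecutive output vectors of length dim each (time-major).
// Returns true when every timestep matches timestep 0 within `tol`.
bool all_steps_identical(const std::vector<float>& out,
                         std::size_t seq_len, std::size_t dim,
                         float tol = 1e-6f) {
    assert(out.size() == seq_len * dim);
    for (std::size_t t = 1; t < seq_len; ++t) {
        for (std::size_t d = 0; d < dim; ++d) {
            // Compare element d of timestep t with element d of timestep 0.
            if (std::fabs(out[t * dim + d] - out[d]) > tol) {
                return false;
            }
        }
    }
    return true;
}
```

Running the same check on the LSTM input as well as its output may help narrow down whether the repetition originates in the LSTM or earlier in the network.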

I would really appreciate any help.

Answer 1

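My guess is that the values in the cont layer are incorrect, but it is hard to tell without knowing the entries in the label data before slicing.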

Take a look at this tutorial.

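cont (called clipmask in that tutorial) must be zero for the first entry of each sequence and one for every subsequent entry. Whenever cont is zero, the LSTM layer forgets its memory and resets it to the initial value.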

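At the same time, the LSTM retains its memory across multiple successive solver steps (forward/backward passes), so cont must be set to zero for the first item of every time series.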
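The effect of cont can be illustrated with a toy model. The sketch below is NOT Caffe's LSTM: it is a deliberately simplified scalar recurrence with made-up fixed weights (0.5 and 0.8), and the name `ToyRecurrentCell` is hypothetical. It only assumes the gating described above, namely that cont multiplies the previous state and that the state survives between forward calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy stand-in for a recurrent layer (NOT Caffe's LSTM): one scalar state,
// fixed weights, and the same "cont" gating of the previous state.
struct ToyRecurrentCell {
    float h = 0.0f;  // persists across forward() calls, like the layer's memory

    // Runs one sequence; cont[t] == 0 discards the previous state at step t.
    std::vector<float> forward(const std::vector<float>& x,
                               const std::vector<float>& cont) {
        assert(x.size() == cont.size());
        std::vector<float> out;
        out.reserve(x.size());
        for (std::size_t t = 0; t < x.size(); ++t) {
            h = std::tanh(0.5f * x[t] + 0.8f * cont[t] * h);
            out.push_back(h);
        }
        return out;
    }
};
```

In this toy, feeding nine identical inputs with cont = (0, 1, ..., 1) produces outputs that change from step to step, and repeating the call gives the same result because the leading 0 clears the leftover state. With cont all zeros, every step returns the same value, which resembles the symptom in the question. With cont all ones, the first output depends on whatever state the previous call left behind. Whether this is what happens in the asker's network is an assumption that would need checking against the actual cont blob.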

Caffe LSTM layer does not work
