I'm having a hard time implementing LSTM inference code in MATLAB. I extracted the weights from a model trained with Theano and Lasagne, and then wrote LSTM inference (test) code in MATLAB using the extracted weights and biases.
The loss of my inference code comes out around 5, but the test loss reported by Theano and Lasagne is around 1.5.
I think the implemented MATLAB inference code is correct; the code looks like this:
% Single-sequence inference over 100 time steps.
% (Assumption: hidden and ct are 1x512 state vectors and loss is 0 before this loop.)
for i = 1:100
    input1 = input(i,:);                     % one-hot input at time step i
    % gate activations: sigmoid(x*W_in + h*W_hid + b)
    input_gate = activation_sigmoid(mtimes(input1,W_in_to_ingate) + mtimes(hidden,W_hid_to_ingate) + reshape(b_ingate,1,512));
    forget     = activation_sigmoid(mtimes(input1,W_in_to_forgetgate) + mtimes(hidden,W_hid_to_forgetgate) + reshape(b_forgetgate,1,512));
    output     = activation_sigmoid(mtimes(input1,W_in_to_outgate) + mtimes(hidden,W_hid_to_outgate) + reshape(b_outgate,1,512));
    % candidate cell input: tanh(x*W_in + h*W_hid + b)
    cell       = activation_tanh(mtimes(input1,W_in_to_cell) + mtimes(hidden,W_hid_to_cell) + reshape(b_cell,1,512));
    % cell and hidden state updates
    ct     = input_gate.*cell + forget.*ct;
    hidden = activation_tanh(ct).*output;
    % softmax output layer over the vocabulary (83 classes here)
    result = softmax(reshape(mtimes(hidden,W) + reshape(b,1,83), 83, 1));
    % cross-entropy of the correct next character (answer is 0-based, hence +1)
    new_loss = -log(result(answer(1,i)+1));
    loss = loss + new_loss;
end
loss = loss/100;   % mean loss over the 100 time steps
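For reference, the weight and bias matrices used above were pulled out of the trained Lasagne model. The snippet below is only an illustrative sketch of such an export (not necessarily my exact script); it assumes the layer objects l_forward_2 and l_out from the model code further down, and relies on Lasagne exposing the LSTM parameters as attributes with these names.

# Sketch of the weight-export step (illustrative, assumes l_forward_2 / l_out below).
# Lasagne's LSTMLayer exposes its parameters as attributes, so they can be read
# by name and written to a .mat file for MATLAB.
import scipy.io

params = {
    'W_in_to_ingate':      l_forward_2.W_in_to_ingate.get_value(),
    'W_hid_to_ingate':     l_forward_2.W_hid_to_ingate.get_value(),
    'b_ingate':            l_forward_2.b_ingate.get_value(),
    'W_in_to_forgetgate':  l_forward_2.W_in_to_forgetgate.get_value(),
    'W_hid_to_forgetgate': l_forward_2.W_hid_to_forgetgate.get_value(),
    'b_forgetgate':        l_forward_2.b_forgetgate.get_value(),
    'W_in_to_outgate':     l_forward_2.W_in_to_outgate.get_value(),
    'W_hid_to_outgate':    l_forward_2.W_hid_to_outgate.get_value(),
    'b_outgate':           l_forward_2.b_outgate.get_value(),
    'W_in_to_cell':        l_forward_2.W_in_to_cell.get_value(),
    'W_hid_to_cell':       l_forward_2.W_hid_to_cell.get_value(),
    'b_cell':              l_forward_2.b_cell.get_value(),
    'W': l_out.W.get_value(),   # dense (softmax) layer weights
    'b': l_out.b.get_value(),   # dense (softmax) layer bias
}
scipy.io.savemat('lstm_weights.mat', params)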
I have compared this against the Lasagne LSTM source code, but I can't figure out which part is incorrect. The Theano/Lasagne code is as follows:
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
l_forward_2 = lab.LSTMLayer(
    l_in,
    num_units=N_HIDDEN,
    grad_clipping=GRAD_CLIP,
    peepholes=False,
    nonlinearity=activation,
    method=method)                                      ### batch_size*SEQ_LENGTH*N_HIDDEN
l_shp = lasagne.layers.ReshapeLayer(l_forward_2, (-1, N_HIDDEN))   ## (batch_size*SEQ_LENGTH, N_HIDDEN)
l_out = lasagne.layers.DenseLayer(l_shp, num_units=vocab_size, W=lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)
batchsize, seqlen, _ = l_in.input_var.shape
l_shp1 = lasagne.layers.ReshapeLayer(l_out, (batchsize, seqlen, vocab_size))
l_out1 = lasagne.layers.SliceLayer(l_shp1, -1, 1)

test_output = lasagne.layers.get_output(l_out, deterministic=True)
test_loss = T.nnet.categorical_crossentropy(test_output, target.flatten()).mean()
val_fn = theano.function([l_in.input_var, target], test_loss, allow_input_downcast=True)
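For debugging, the LSTM layer's hidden states can also be read out directly and compared step by step with the MATLAB hidden vector. A minimal sketch, assuming the layer variables defined above:

# Sketch: dump the LSTM hidden states so each time step can be compared
# against the MATLAB `hidden` vector (assumes l_in / l_forward_2 from above).
import theano
import lasagne

hid_out = lasagne.layers.get_output(l_forward_2, deterministic=True)
hid_fn = theano.function([l_in.input_var], hid_out, allow_input_downcast=True)

# x has shape (batch_size, SEQ_LENGTH, vocab_size); the result has shape
# (batch_size, SEQ_LENGTH, N_HIDDEN), i.e. one hidden vector per time step.
# h = hid_fn(x)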
The input to the LSTM layer and the target labels are generated like this:
def gen_data(pp, batch_size, SEQ_LENGTH, data, return_target=True):
    x = np.zeros((batch_size, SEQ_LENGTH, vocab_size))   ###### 128*100*85
    y = np.zeros((batch_size, SEQ_LENGTH))
    for n in range(batch_size):
        # ptr = n
        for i in range(SEQ_LENGTH):
            x[n, i, char_to_ix[data[pp[n]*SEQ_LENGTH+i]]] = 1.
            y[n, i] = char_to_ix[data[pp[n]*SEQ_LENGTH+i+1]]
    return x, np.array(y, dtype='int32')
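For comparison, the Theano test loss is evaluated on batches produced this way, roughly as sketched below (BATCH_SIZE and pp are placeholders standing in for the values used in my training script; pp is the list of sequence indices passed to gen_data):

# Sketch: evaluate the Lasagne/Theano test loss on one generated batch; this is
# where I see the ~1.5 loss, versus ~5 from the MATLAB loop above.
x, y = gen_data(pp, BATCH_SIZE, SEQ_LENGTH, data)
print(val_fn(x, y))   # mean categorical cross-entropy over all positions

# Note on indexing: for a single sequence, MATLAB's input(i,:) corresponds to
# x[0, i-1, :], and answer(1,i)+1 corresponds to y[0, i-1] shifted to 1-based.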
I have been struggling with this problem for a month. I really need your help.
Thank you.