I'm having a hard time implementing LSTM inference code in MATLAB. I extracted the weights from a model trained with Theano and Lasagne, and then wrote LSTM inference (test) code in MATLAB using the extracted weights and biases.
The loss of my inference code comes out around 5, but the test loss reported by Theano and Lasagne is around 1.5.
I think the implemented MATLAB inference code is correct; the code looks like this:
% Single-sequence inference over 100 time steps.
% (Assumption: hidden and ct are 1x512 state vectors and loss is 0 before this loop.)
for i = 1:100
    input1 = input(i,:);                     % one-hot input at time step i
    % gate activations: sigmoid(x*W_in + h*W_hid + b)
    input_gate = activation_sigmoid(mtimes(input1,W_in_to_ingate) + mtimes(hidden,W_hid_to_ingate) + reshape(b_ingate,1,512));
    forget     = activation_sigmoid(mtimes(input1,W_in_to_forgetgate) + mtimes(hidden,W_hid_to_forgetgate) + reshape(b_forgetgate,1,512));
    output     = activation_sigmoid(mtimes(input1,W_in_to_outgate) + mtimes(hidden,W_hid_to_outgate) + reshape(b_outgate,1,512));
    % candidate cell input: tanh(x*W_in + h*W_hid + b)
    cell       = activation_tanh(mtimes(input1,W_in_to_cell) + mtimes(hidden,W_hid_to_cell) + reshape(b_cell,1,512));
    % cell and hidden state updates
    ct     = input_gate.*cell + forget.*ct;
    hidden = activation_tanh(ct).*output;
    % softmax output layer over the vocabulary (83 classes here)
    result = softmax(reshape(mtimes(hidden,W) + reshape(b,1,83), 83, 1));
    % cross-entropy of the correct next character (answer is 0-based, hence +1)
    new_loss = -log(result(answer(1,i)+1));
    loss = loss + new_loss;
end
loss = loss/100;   % mean loss over the 100 time steps
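For reference, the weight and bias matrices used above were pulled out of the trained Lasagne model. The snippet below is only an illustrative sketch of such an export (not necessarily my exact script); it assumes the layer objects l_forward_2 and l_out from the model code further down, and relies on Lasagne exposing the LSTM parameters as attributes with these names.

# Sketch of the weight-export step (illustrative, assumes l_forward_2 / l_out below).
# Lasagne's LSTMLayer exposes its parameters as attributes, so they can be read
# by name and written to a .mat file for MATLAB.
import scipy.io

params = {
    'W_in_to_ingate':      l_forward_2.W_in_to_ingate.get_value(),
    'W_hid_to_ingate':     l_forward_2.W_hid_to_ingate.get_value(),
    'b_ingate':            l_forward_2.b_ingate.get_value(),
    'W_in_to_forgetgate':  l_forward_2.W_in_to_forgetgate.get_value(),
    'W_hid_to_forgetgate': l_forward_2.W_hid_to_forgetgate.get_value(),
    'b_forgetgate':        l_forward_2.b_forgetgate.get_value(),
    'W_in_to_outgate':     l_forward_2.W_in_to_outgate.get_value(),
    'W_hid_to_outgate':    l_forward_2.W_hid_to_outgate.get_value(),
    'b_outgate':           l_forward_2.b_outgate.get_value(),
    'W_in_to_cell':        l_forward_2.W_in_to_cell.get_value(),
    'W_hid_to_cell':       l_forward_2.W_hid_to_cell.get_value(),
    'b_cell':              l_forward_2.b_cell.get_value(),
    'W': l_out.W.get_value(),   # dense (softmax) layer weights
    'b': l_out.b.get_value(),   # dense (softmax) layer bias
}
scipy.io.savemat('lstm_weights.mat', params)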
I have compared this against the Lasagne LSTM source code, but I can't figure out which part is incorrect. The Theano/Lasagne code is as follows:
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
l_forward_2 = lab.LSTMLayer(
    l_in,
    num_units=N_HIDDEN,
    grad_clipping=GRAD_CLIP,
    peepholes=False,
    nonlinearity=activation,
    method=method)                                      ### batch_size*SEQ_LENGTH*N_HIDDEN
l_shp = lasagne.layers.ReshapeLayer(l_forward_2, (-1, N_HIDDEN))   ## (batch_size*SEQ_LENGTH, N_HIDDEN)
l_out = lasagne.layers.DenseLayer(l_shp, num_units=vocab_size, W=lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)
batchsize, seqlen, _ = l_in.input_var.shape
l_shp1 = lasagne.layers.ReshapeLayer(l_out, (batchsize, seqlen, vocab_size))
l_out1 = lasagne.layers.SliceLayer(l_shp1, -1, 1)

test_output = lasagne.layers.get_output(l_out, deterministic=True)
test_loss = T.nnet.categorical_crossentropy(test_output, target.flatten()).mean()
val_fn = theano.function([l_in.input_var, target], test_loss, allow_input_downcast=True)
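For debugging, the LSTM layer's hidden states can also be read out directly and compared step by step with the MATLAB hidden vector. A minimal sketch, assuming the layer variables defined above:

# Sketch: dump the LSTM hidden states so each time step can be compared
# against the MATLAB `hidden` vector (assumes l_in / l_forward_2 from above).
import theano
import lasagne

hid_out = lasagne.layers.get_output(l_forward_2, deterministic=True)
hid_fn = theano.function([l_in.input_var], hid_out, allow_input_downcast=True)

# x has shape (batch_size, SEQ_LENGTH, vocab_size); the result has shape
# (batch_size, SEQ_LENGTH, N_HIDDEN), i.e. one hidden vector per time step.
# h = hid_fn(x)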
The input to the LSTM layer and the target labels are generated like this:
def gen_data(pp, batch_size, SEQ_LENGTH, data, return_target=True):
    x = np.zeros((batch_size, SEQ_LENGTH, vocab_size))   ###### 128*100*85
    y = np.zeros((batch_size, SEQ_LENGTH))
    for n in range(batch_size):
        # ptr = n
        for i in range(SEQ_LENGTH):
            x[n, i, char_to_ix[data[pp[n]*SEQ_LENGTH+i]]] = 1.
            y[n, i] = char_to_ix[data[pp[n]*SEQ_LENGTH+i+1]]
    return x, np.array(y, dtype='int32')
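For comparison, the Theano test loss is evaluated on batches produced this way, roughly as sketched below (BATCH_SIZE and pp are placeholders standing in for the values used in my training script; pp is the list of sequence indices passed to gen_data):

# Sketch: evaluate the Lasagne/Theano test loss on one generated batch; this is
# where I see the ~1.5 loss, versus ~5 from the MATLAB loop above.
x, y = gen_data(pp, BATCH_SIZE, SEQ_LENGTH, data)
print(val_fn(x, y))   # mean categorical cross-entropy over all positions

# Note on indexing: for a single sequence, MATLAB's input(i,:) corresponds to
# x[0, i-1, :], and answer(1,i)+1 corresponds to y[0, i-1] shifted to 1-based.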
I have been struggling with this problem for a month. I really need your help.
Thank you.