LSTM model does not improve accuracy between epochs, while a LogLinear model that traverses the data with the same function does

Date: 2019-12-28 10:04:02

Tags: python machine-learning nlp pytorch lstm

I am trying to train an LSTM network with PyTorch. When I use a log-linear model with the same attached traverse_data function (passing epoch=True), learning works well and gives satisfying results, but when I use the LSTM network it does not learn at all. This is my first attempt with PyTorch, so I apologize if the code is not well organized. The attached training loop works in batches of 64: after it has accumulated 64 examples, it sends them through the model for evaluation, then computes the loss and performs a weight update.

class LSTM(nn.Module):
    """
    An LSTM for sentiment analysis with architecture as described in the exercise description.
    """
    def __init__(self, embedding_dim, hidden_dim, num_layers, dropout):
        super(LSTM, self).__init__()
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers,
                            dropout=dropout, bidirectional=True)
        # The final forward and backward hidden states are concatenated,
        # hence hidden_dim * 2 input features for the linear layer.
        self.linear = nn.Linear(hidden_dim * 2, 1)

    def forward(self, text):
        # nn.LSTM returns (output, (h_n, c_n)); take h_n, the final hidden
        # state, of shape (num_layers * 2, batch, hidden_dim).
        h_forward_backward = self.lstm(text.double())[1][0]
        # Reshape into (batch, hidden_dim * 2) for the linear layer.
        h_forward_backward = h_forward_backward.view(64, 200)
        final_output = self.linear(h_forward_backward)
        return torch.sigmoid(final_output)

    def predict(self, text):
        # forward() takes only the text, so predict() must not pass anything else.
        return self.forward(text)
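For reference, a minimal smoke test of the class above; the dimensions are my assumptions, read off the view(64, 200) reshape and the (52, 64, 300) batches built in traverse_data below:

import torch
import torch.nn as nn

# Hypothetical dimensions: embedding_dim=300, hidden_dim=100, num_layers=1.
model = LSTM(embedding_dim=300, hidden_dim=100, num_layers=1, dropout=0.0)
model.double()

dummy_batch = torch.zeros(52, 64, 300, dtype=torch.double)  # (seq_len, batch, embedding)
out = model(dummy_batch)
print(out.shape)  # torch.Size([64, 1]): one sigmoid score per sentence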


def traverse_data(model: nn.Module, sentences, lookUpDict: dict, optimizer, criterion,
                  batch_size=64, vec_dim=300, epoch=False, word2vec=False, LSTM=False):
    """Runs one pass over the data; performs weight updates only when epoch=True."""
    if epoch:
        model.train()

    output_preds = []
    y_pred_list = []
    model.double()
    batch_counter = 1
    total_loss_for_batch = 0
    # For the LSTM the batch is (seq_len, batch, embedding_dim);
    # otherwise each row holds one averaged sentence vector.
    batch = np.zeros((batch_size, vec_dim))
    if LSTM:
        batch = np.zeros((52, batch_size, 300))
    batch_values = np.zeros((batch_size, 1))

    for sent in sentences:
        # Build the feature representation for this sentence.
        if LSTM:
            x_data = sentence_to_embedding(sent.text, lookUpDict, 52)
        elif word2vec:
            x_data = get_w2v_average(sent.text, lookUpDict, W2V_EMBEDDING_DIM)
        else:
            x_data = average_one_hots(sent.text, lookUpDict)

        # Place the sentence in the current batch slot.
        if LSTM:
            batch[:, (batch_counter - 1) % batch_size, :] = x_data
        else:
            batch[(batch_counter - 1) % batch_size, :] = x_data
        batch_values[(batch_counter - 1) % batch_size] = sent.sentiment_val

        # Once batch_size examples have accumulated, run the batch through the model.
        if batch_counter % batch_size == 0:
            if epoch:
                model.train()
            else:
                model.eval()
            y_pred = model(torch.tensor(batch, dtype=torch.double))
            loss = criterion(y_pred, torch.tensor(batch_values, dtype=torch.double))
            # .item() detaches the loss, so the autograd graph of each batch
            # is not kept alive by the running total.
            total_loss_for_batch += loss.item()
            y_pred_list.extend(y_pred)
            output_preds.extend(batch_values)

            if epoch:
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()

            # Reset the batch buffers for the next batch.
            batch = np.zeros((batch_size, vec_dim))
            if LSTM:
                batch = np.zeros((52, batch_size, 300))
            batch_values = np.zeros((batch_size, 1))

        batch_counter += 1

    return binary_accuracy(y_pred_list, output_preds), total_loss_for_batch / (len(sentences) / batch_size)
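For context, a sketch of the kind of epoch loop that produces the numbers below; this driver code is not in the original post, and train_data, eval_data, lookUpDict, and the hyperparameters are hypothetical placeholders:

# Hypothetical driver loop (not from the original post).
model = LSTM(embedding_dim=300, hidden_dim=100, num_layers=1, dropout=0.0)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # assumed optimizer
criterion = nn.BCELoss()                                    # assumed loss for sigmoid outputs

for i in range(10):
    train_acc, train_loss = traverse_data(model, train_data, lookUpDict, optimizer,
                                          criterion, epoch=True, LSTM=True)
    eval_acc, eval_loss = traverse_data(model, eval_data, lookUpDict, optimizer,
                                        criterion, epoch=False, LSTM=True)
    print("train iteration acc:", train_acc)
    print("eval iteration acc:", eval_acc)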

The current task requires us to build a bidirectional LSTM model, take the outputs h1 and h2 from the two ends of the bidirectional LSTM, concatenate them into a vector of dimension 200, and feed that through a linear layer to get a single value, to which we then apply a sigmoid to obtain the final estimate. I have been trying for a day to understand what the problem is.
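A quick shape sketch of that concatenation step (my own illustration, not the posted code; it assumes num_layers=1 and hidden_dim=100, which the view(64, 200) reshape above implies):

import torch
import torch.nn as nn

lstm = nn.LSTM(300, 100, num_layers=1, bidirectional=True)
x = torch.zeros(52, 64, 300)            # (seq_len, batch, embedding_dim)
_, (h_n, _) = lstm(x)                   # h_n: (num_layers * 2, batch, hidden) = (2, 64, 100)
h1, h2 = h_n[-2], h_n[-1]               # last forward / last backward hidden state
concat = torch.cat((h1, h2), dim=1)     # (64, 200): the input to the linear layer
print(concat.shape)

Here are the training results for 10 epochs: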

train iteration acc: 0.4384111183637946
eval iteration acc: 0.4875
train iteration acc: 0.4408588990426458
eval iteration acc: 0.4875
train iteration acc: 0.440246953872933
eval iteration acc: 0.4875
train iteration acc: 0.4404373368146214
eval iteration acc: 0.4875
train iteration acc: 0.44050533072236725
eval iteration acc: 0.4875
train iteration acc: 0.4408588990426458
eval iteration acc: 0.4875
train iteration acc: 0.4408588990426458
eval iteration acc: 0.4875
train iteration acc: 0.4408588990426458
eval iteration acc: 0.4875
train iteration acc: 0.4408588990426458
eval iteration acc: 0.4875
train iteration acc: 0.4408588990426458
eval iteration acc: 0.4875

On the other hand, when I run the same learning loop with the log-linear model, I get the following graph: [image: accuracy plot for the LogLinear model]

Thanks in advance to anyone who can help.

0 Answers