I am trying to train an LSTM network using PyTorch. If I use the same function (which traverses the data) with epoch=True on a log-linear model, it learns well and gives satisfactory results, but when I use the LSTM network it does not learn at all. This is my first attempt at PyTorch, so apologies if the code is not well organized. The training loop below works in batches of 64: after it has accumulated 64 examples, it sends them through the model, computes the loss, and performs a weight update.
import numpy as np
import torch
import torch.nn as nn

class LSTM(nn.Module):
    """
    An LSTM for sentiment analysis with architecture as described in the exercise description.
    """
    def __init__(self, embedding_dim, hidden_dim, num_layers, dropout):
        super(LSTM, self).__init__()
        # Bidirectional LSTM over the word embeddings; nn.LSTM's dropout only
        # applies between stacked layers, so it is a no-op when num_layers == 1.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers,
                            bidirectional=True, dropout=dropout)
        # The forward and backward final hidden states are combined into one
        # vector of size hidden_dim * 2 before the linear layer.
        self.linear = nn.Linear(hidden_dim * 2, 1)

    def forward(self, text):
        # [1][0] takes h_n from (output, (h_n, c_n));
        # h_n has shape (num_layers * 2, batch, hidden_dim).
        h_forward_backward = self.lstm(text.double())[1][0]
        # Reshape the stacked hidden states into one row of 200 per example.
        h_forward_backward = h_forward_backward.view(64, 200)
        final_output = self.linear(h_forward_backward)
        return torch.sigmoid(final_output)

    def predict(self, text):
        return self.forward(text)
def traverse_data(model: nn.Module, sentences, lookUpDict: dict, optimizer, criterion,
                  batch_size=64, vec_dim=300, epoch=False, word2vec=False, LSTM=False):
    # epoch=True runs a training pass (backprop + weight updates);
    # epoch=False runs a pure evaluation pass.
    if epoch:
        model.train()
    output_preds = []
    y_pred_list = []
    model.double()
    batch_counter = 1
    total_loss = 0
    # Pre-allocate one batch: (batch, vec_dim) for the averaged representations,
    # or (seq_len, batch, embedding_dim) for the LSTM, with sentences padded or
    # truncated to a length of 52.
    batch = np.zeros((batch_size, vec_dim))
    if LSTM:
        batch = np.zeros((52, batch_size, 300))
    batch_values = np.zeros((batch_size, 1))
    for sent in sentences:
        # Build this sentence's input representation.
        if LSTM:
            x_data = sentence_to_embedding(sent.text, lookUpDict, 52)
        elif word2vec:
            x_data = get_w2v_average(sent.text, lookUpDict, W2V_EMBEDDING_DIM)
        else:
            x_data = average_one_hots(sent.text, lookUpDict)
        # Place the example in its slot in the current batch.
        if LSTM:
            batch[:, (batch_counter - 1) % batch_size, :] = x_data
        else:
            batch[(batch_counter - 1) % batch_size, :] = x_data
        batch_values[(batch_counter - 1) % batch_size] = sent.sentiment_val
        # Once batch_size examples have accumulated, run the model on the batch.
        if batch_counter % batch_size == 0:
            if epoch:
                model.train()
            else:
                model.eval()
            y_pred = model(torch.tensor(batch, dtype=torch.double))
            loss = criterion(y_pred, torch.tensor(batch_values, dtype=torch.double))
            # .item() extracts a plain float so each batch's computation
            # graph is not kept alive in the running total.
            total_loss += loss.item()
            y_pred_list.extend(y_pred)
            output_preds.extend(batch_values)
            batch = np.zeros((batch_size, vec_dim))
            if epoch:
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
            if LSTM:
                batch = np.zeros((52, batch_size, 300))
            batch_values = np.zeros((batch_size, 1))
        batch_counter += 1
    return binary_accuracy(y_pred_list, output_preds), total_loss / (len(sentences) / batch_size)
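For context, this is roughly how I invoke the loop for the LSTM run (the optimizer and loss reflect my setup, but train_sentences, eval_sentences, and word_to_vec_dict are placeholders standing in for my data-loading code):

model = LSTM(embedding_dim=300, hidden_dim=100, num_layers=1, dropout=0.5)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()  # predictions have already been through a sigmoid
for _ in range(10):
    train_acc, train_loss = traverse_data(model, train_sentences, word_to_vec_dict,
                                          optimizer, criterion, epoch=True, LSTM=True)
    eval_acc, eval_loss = traverse_data(model, eval_sentences, word_to_vec_dict,
                                        optimizer, criterion, epoch=False, LSTM=True)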
The current task asks us to build a bidirectional LSTM model: we take the final outputs h1 and h2 from the two directions of the bidirectional LSTM, concatenate them into a vector of dimension 200, and pass that through a linear layer to get a single value, which is then put through a sigmoid to obtain the estimate. To check that I have the target architecture right, below is a minimal self-contained sketch of the concatenation step as I understand it (assuming num_layers=1 and hidden_dim=100, so the concatenated vector has dimension 200; this is my reading of the exercise, not code from it).
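lstm = nn.LSTM(input_size=300, hidden_size=100, num_layers=1, bidirectional=True)
linear = nn.Linear(200, 1)
text = torch.randn(52, 64, 300)   # (seq_len, batch, embedding_dim), like my padded batches
_, (h_n, _) = lstm(text)          # h_n: (num_layers * 2, batch, hidden) = (2, 64, 100)
h_concat = torch.cat((h_n[-2], h_n[-1]), dim=1)  # (64, 200): forward state ++ backward state
estimate = torch.sigmoid(linear(h_concat))       # (64, 1)

I have been trying for a whole day to figure out what is going wrong. Here are the training results over 10 epochs: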
train epoch acc: 0.4384111183637946
eval epoch acc: 0.4875
train epoch acc: 0.4408588990426458
eval epoch acc: 0.4875
train epoch acc: 0.440246953872933
eval epoch acc: 0.4875
train epoch acc: 0.4404373368146214
eval epoch acc: 0.4875
train epoch acc: 0.44050533072236725
eval epoch acc: 0.4875
train epoch acc: 0.4408588990426458
eval epoch acc: 0.4875
train epoch acc: 0.4408588990426458
eval epoch acc: 0.4875
train epoch acc: 0.4408588990426458
eval epoch acc: 0.4875
train epoch acc: 0.4408588990426458
eval epoch acc: 0.4875
train epoch acc: 0.4408588990426458
eval epoch acc: 0.4875
On the other hand, when I run the same training loop with the log-linear model, I get the following graph:
Thanks in advance to anyone who can help.