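I am trying to get to grips with LSTMs and PyTorch. I am given sequences of different lengths. Each data point in a sequence consists of 8 features, and each data point belongs to one of 6 classes (0-5). I want to learn how to use an LSTM to predict the labels of those data points.
So far I have made a few attempts at this problem, but I am afraid I am missing some fundamentals of the topic.
I started by padding the sequences and the labels with -1 up to max_amount.
I have data loaders (train, test, val) that deliver batches of shape [batch_size, max_seq_length, num_features]. The model is very simple and consists of an LSTM and a linear layer.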
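For reference, here is a minimal sketch of how such padded batches could be built. It is not the code behind the loaders above: `collate_batch` and `PAD_VALUE` are hypothetical names, and it pads to the longest sequence in the batch rather than to a global max_amount. It also returns the true lengths, so they do not have to be recovered from the padding later.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_VALUE = -1  # assumed padding value for both features and labels


def collate_batch(samples):
    """Pad a list of (features, labels) pairs to the longest sequence in the batch.

    features: float tensor [seq_len, 8]; labels: long tensor [seq_len].
    """
    features, labels = zip(*samples)
    lengths = torch.tensor([f.shape[0] for f in features])
    padded_x = pad_sequence(list(features), batch_first=True, padding_value=PAD_VALUE)
    padded_y = pad_sequence(list(labels), batch_first=True, padding_value=PAD_VALUE)
    return padded_x, padded_y, lengths


# Toy batch: three sequences of lengths 5, 3 and 2
samples = [(torch.randn(n, 8), torch.randint(0, 6, (n,))) for n in (5, 3, 2)]
x, y, lengths = collate_batch(samples)
print(x.shape, y.shape, lengths.tolist())
```

A function like this could be passed as `collate_fn` to a `torch.utils.data.DataLoader` whose dataset yields one (features, labels) pair per sequence.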
LSTM:
...
self.lstm = nn.LSTM(
    input_size=self.sequence_dimension,  # 8 features
    hidden_size=self.hidden_dim,
    num_layers=self.n_layers,
    batch_first=True,
)
self.linear_fc = nn.Linear(self.hidden_dim, self.output_size)
...
init:
def init_hidden(self):
    # the weights are of the form (nb_layers, batch_size, nb_lstm_units)
    hidden_a = torch.randn(self.n_layers, self.batch_size, self.hidden_dim)
    hidden_b = torch.randn(self.n_layers, self.batch_size, self.hidden_dim)
    hidden_a = Variable(hidden_a)
    hidden_b = Variable(hidden_b)
    return (hidden_a, hidden_b)
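For comparison, `nn.LSTM` uses zeros for the initial hidden and cell state when none is passed, so an explicit initial state is optional. The sketch below uses made-up sizes (hidden size 16, 2 layers, batch of 4) to check that the default matches an explicit zero state; it is an illustration, not a replacement for the method above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
x = torch.randn(4, 10, 8)  # [batch_size, seq_len, n_features]

# No initial state given: h_0 and c_0 default to zeros
out_default, (h_n, c_n) = lstm(x)

# Explicit zero state of shape [num_layers, batch_size, hidden_size]
zeros = (torch.zeros(2, 4, 16), torch.zeros(2, 4, 16))
out_zeros, _ = lstm(x, zeros)

print(torch.allclose(out_default, out_zeros))
```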
forward:
def forward(self, X):
    # shape of X: [batch_size, max_seq_len, feature_size]
    self.hidden = self.init_hidden()
    # get unpadded sequence lengths (padding: -1)
    lenghts = []
    for batch in X:
        for i, elem in enumerate(batch):
            if elem[0] == -1:
                lenghts.append(i-1)
                break
    # pack the padded sequences, lenghts contains the unpadded lengths (e.g., [43,46,67,121])
    x_packed = torch.nn.utils.rnn.pack_padded_sequence(X, lenghts, batch_first=True, enforce_sorted=False)
    lstm_out, self.hidden = self.lstm(x_packed.float(), self.hidden)
    # unpack
    x_unpacked, seq_len = torch.nn.utils.rnn.pad_packed_sequence(lstm_out, batch_first=True)
    # squash the batches from [batch_size, max_seq_len_of_this_batch, hidden_dim]
    # to [batch_size*max_seq_len_of_this_batch, hidden_dim]
    batches_squashed = x_unpacked.contiguous().view(-1, x_unpacked.shape[2])
    # feed it to linear
    y_pred = self.linear_fc(batches_squashed)
    # unsquash batches to [batch_size, max_seq_len_of_this_batch, output_size]
    y_pred = y_pred.contiguous().view((batch_size, max(lenghts),self.output_size))
    return y_pred, max(lenghts)
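One way to get the unpadded lengths without a Python loop is to count the non-padded time steps per sequence. This is a sketch that assumes the first feature is never exactly -1 in real data; `sequence_lengths` is a hypothetical helper. It returns the number of real steps, so a sequence with no padding at all is handled too.

```python
import torch

PAD_VALUE = -1  # assumed padding value


def sequence_lengths(X, pad_value=PAD_VALUE):
    """Count non-padded time steps per sequence. X: [batch_size, max_seq_len, n_features]."""
    return (X[:, :, 0] != pad_value).sum(dim=1)


X = torch.full((2, 4, 8), float(PAD_VALUE))
X[0, :3] = torch.rand(3, 8)  # rand draws from [0, 1), so never -1
X[1, :4] = torch.rand(4, 8)  # full-length sequence, no padding
print(sequence_lengths(X).tolist())  # [3, 4]
```

If the result is passed to `pack_padded_sequence` as a tensor, it has to live on the CPU, for example via `sequence_lengths(X).cpu()`.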
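I am not sure whether I am doing this correctly, especially for the linear layer. It receives a tensor of shape [batch_size * max_seq_len, hidden_dim], right? There are still padded values in there...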
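To make the shape question concrete, here is a small self-contained sketch with made-up sizes and separate `lstm` and `linear` modules (not the model above). It relies on two documented behaviours: `nn.Linear` acts on the last dimension of its input, and `pad_packed_sequence` fills padded time steps with zeros by default.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch_size, max_len, n_features, hidden_dim, n_classes = 3, 5, 8, 16, 6
lengths = torch.tensor([5, 3, 2])  # true lengths, kept on the CPU

lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)
linear = nn.Linear(hidden_dim, n_classes)

x = torch.randn(batch_size, max_len, n_features)
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Applying the layer to the 3D tensor directly...
logits_3d = linear(out)  # [batch_size, max_len, n_classes]
# ...gives the same result as flattening first and reshaping afterwards
logits_flat = linear(out.reshape(-1, hidden_dim)).view(batch_size, -1, n_classes)

print(logits_3d.shape)
print(torch.allclose(logits_3d, logits_flat, atol=1e-6))
# Sum of absolute LSTM outputs over the padded steps of sequence 1
print(out[1, 3:].abs().sum().item())
```

The linear layer still produces logits for the padded positions (they equal its bias there), so those positions need to be excluded when the loss is computed, for example through the `ignore_index` of the loss function.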
In the training loop, I just initialize the hidden state for every epoch and then loop over the (inputs, labels) pairs from the data loader:
loss_funtion = nn.CrossEntropyLoss(ignore_index=-1)
optimizer = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
for inputs, labels in train_loader:
    counter+=1
    net.zero_grad()
    if train_on_gpu:
        inputs = inputs.to(device)
        labels = labels.to(device)
    output, _ = net(inputs)
    # remove excessive padding for labels
    labels = decreasing_padding(labels, max_padding_for_this_batch)
    # transposing the output to fit the CrossEntropyLoss definition
    loss = loss_funtion(output.transpose(1, 2),labels.long())
    loss.backward()
    optimizer.step()
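As a sanity check on the loss call, the sketch below uses toy tensors (not the real data) to compare two ways of excluding padded labels. `nn.CrossEntropyLoss` expects the class dimension second, hence the transpose, and with `ignore_index=-1` its mean is taken over the non-ignored targets only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 4, 6)  # [batch_size, seq_len, n_classes]
labels = torch.tensor([[0, 3, 5, 1],
                       [2, 4, -1, -1]])  # -1 marks padded positions

loss_fn = nn.CrossEntropyLoss(ignore_index=-1)

# Variant 1: class dimension second, i.e. [batch_size, n_classes, seq_len]
loss_transposed = loss_fn(logits.transpose(1, 2), labels)

# Variant 2: flatten, drop the padded positions by hand, then average
mask = labels.view(-1) != -1
loss_masked = F.cross_entropy(logits.view(-1, 6)[mask], labels.view(-1)[mask])

print(torch.allclose(loss_transposed, loss_masked))
```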
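The LSTM mostly outputs just a single label for the whole sequence (some sequences do come out correct). Loss and accuracy on the training and validation sets show that it is not learning: the loss decreases slightly but fluctuates a lot.
I suspect that the linear layer is not implemented correctly.
I would be very grateful if someone could point out my mistakes and misunderstandings; I have already spent a lot of time on this.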