Question

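I am trying to get to grips with LSTMs and PyTorch. I am given sequences of varying length. Each data point in a sequence consists of 8 features, and each data point belongs to one of 6 classes (0-5). I would like to learn how to use an LSTM to predict the labels of those data points.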

I have made a few attempts at this so far, but I am worried that I am missing something fundamental about the topic.

I started by padding both the sequences and the labels with -1 up to max_amount.
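
As a minimal sketch of this padding step: the toy data, the `PAD` constant, and the variable names below are assumptions for illustration, and `pad_sequence` pads to the longest sequence in the list rather than to a global max_amount.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD = -1  # padding value for both features and labels, as described above

# Hypothetical toy data: three sequences of lengths 4, 2 and 3, 8 features each
seq_lengths = (4, 2, 3)
seqs = [torch.randn(n, 8) for n in seq_lengths]
labels = [torch.randint(0, 6, (n,)) for n in seq_lengths]

# pad_sequence pads every entry to the longest sequence in the list
X = pad_sequence(seqs, batch_first=True, padding_value=PAD)    # [3, 4, 8]
y = pad_sequence(labels, batch_first=True, padding_value=PAD)  # [3, 4]

# Recording the true lengths here avoids searching for -1 later
lengths = torch.tensor([s.shape[0] for s in seqs])
```

Keeping `lengths` next to the padded batch would also sidestep the case where a real feature value happens to equal -1.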

I have data loaders (train, test, val) that deliver batches of shape [batch_size, max_seq_length, num_features]. The model is very simple and consists of an LSTM and a linear layer.

LSTM:

    ...
        self.lstm = nn.LSTM(
            input_size=self.sequence_dimension,  # 8 features
            hidden_size=self.hidden_dim,
            num_layers=self.n_layers,
            batch_first=True,
        )
        self.linear_fc = nn.Linear(self.hidden_dim, self.output_size)
    ...

init:

    def init_hidden(self):
        # initial hidden and cell states, each of shape (n_layers, batch_size, hidden_dim)
        hidden_a = torch.randn(self.n_layers, self.batch_size, self.hidden_dim)
        hidden_b = torch.randn(self.n_layers, self.batch_size, self.hidden_dim)

        # Variable is deprecated since PyTorch 0.4; plain tensors can be returned directly
        return (hidden_a, hidden_b)
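
For comparison with the random initial states above, here is a small sketch of what `nn.LSTM` does when no state is passed: it falls back to zero states. The sizes are hypothetical and only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_layers, batch_size, hidden_dim = 2, 3, 16  # hypothetical sizes
lstm = nn.LSTM(input_size=8, hidden_size=hidden_dim,
               num_layers=n_layers, batch_first=True)
X = torch.randn(batch_size, 5, 8)

# No state passed: h_0 and c_0 default to zeros
out_default, _ = lstm(X)

# Explicit zero states, sized from the actual batch rather than a fixed batch_size
h0 = torch.zeros(n_layers, X.shape[0], hidden_dim)
c0 = torch.zeros(n_layers, X.shape[0], hidden_dim)
out_zeros, _ = lstm(X, (h0, c0))

# Random initial states, as in init_hidden, change the outputs on every call
out_random, _ = lstm(X, (torch.randn_like(h0), torch.randn_like(c0)))
```

Sizing the states from `X.shape[0]` also keeps a smaller final batch from mismatching a fixed `batch_size`.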

forward:

    def forward(self, X):
        # shape of X: [batch_size, max_seq_len, feature_size]
        batch_size = X.shape[0]
        self.hidden = self.init_hidden()

        # get unpadded sequence lengths (padding: -1)
        lengths = []
        for seq in X:
            for i, elem in enumerate(seq):
                if elem[0] == -1:
                    # i is the index of the first padded step, i.e. the number of real steps
                    lengths.append(i)
                    break
            else:
                # no padding found: the sequence fills the whole padded length
                lengths.append(seq.shape[0])

        # pack the padded sequences; lengths holds the unpadded lengths (e.g. [43, 46, 67, 121])
        x_packed = torch.nn.utils.rnn.pack_padded_sequence(
            X.float(), lengths, batch_first=True, enforce_sorted=False
        )

        lstm_out, self.hidden = self.lstm(x_packed, self.hidden)

        # unpack
        x_unpacked, seq_len = torch.nn.utils.rnn.pad_packed_sequence(lstm_out, batch_first=True)

        # squash the batches from [batch_size, max_seq_len_of_this_batch, hidden_dim]
        # to [batch_size * max_seq_len_of_this_batch, hidden_dim]
        batches_squashed = x_unpacked.contiguous().view(-1, x_unpacked.shape[2])

        # feed it to linear
        y_pred = self.linear_fc(batches_squashed)

        # unsquash batches to [batch_size, max_seq_len_of_this_batch, output_size]
        y_pred = y_pred.contiguous().view(batch_size, max(lengths), self.output_size)
        return y_pred, max(lengths)

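I am not sure whether I am doing this correctly, especially for the linear layer. It receives a tensor of shape [batch_size * max_seq_len, hidden_dim], right? There are still padding values in there...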
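
To make the shapes concrete, here is a small sketch with hypothetical sizes and lengths (not my actual model). It illustrates two things: `nn.Linear` acts on the last dimension, so the squash / unsquash route and a direct 3D call should agree, and the steps that `pad_packed_sequence` fills in come back as zeros.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
linear_fc = nn.Linear(16, 6)

X = torch.randn(3, 5, 8)   # [batch_size, max_seq_len, num_features]
lengths = [5, 2, 3]        # hypothetical unpadded lengths

packed = pack_padded_sequence(X, lengths, batch_first=True, enforce_sorted=False)
lstm_out, _ = lstm(packed)
unpacked, _ = pad_packed_sequence(lstm_out, batch_first=True)  # [3, 5, 16]

# nn.Linear works on the last dimension, so the 3D tensor can be passed as is
y_3d = linear_fc(unpacked)                                     # [3, 5, 6]

# The squash / unsquash route gives the same values
y_flat = linear_fc(unpacked.contiguous().view(-1, 16)).view(3, 5, 6)

# Padded steps are zeros after unpacking, so the linear layer returns its bias there
padded_out = y_3d[1, 2:]
```

The padded positions therefore still carry outputs, and they only stay out of the loss if the matching labels are ignored.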

In the training loop, I just initialize the hidden state for each epoch and then loop over (inputs, labels) from the data loader:

    loss_function = nn.CrossEntropyLoss(ignore_index=-1)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)

    for inputs, labels in train_loader:
        counter += 1
        net.zero_grad()
        if train_on_gpu:
            inputs = inputs.to(device)
            labels = labels.to(device)

        # forward also returns the longest unpadded length in this batch
        output, max_padding_for_this_batch = net(inputs)

        # remove excessive padding for labels
        labels = decreasing_padding(labels, max_padding_for_this_batch)

        # transpose the output to [batch_size, num_classes, seq_len] to fit the CrossEntropyLoss definition
        loss = loss_function(output.transpose(1, 2), labels.long())
        loss.backward()
        optimizer.step()
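
As a sanity check on the loss call, the sketch below uses made-up logits and labels (assumptions, not my data) to compare the transposed form used in the loop above with a flattened form and with a manual mask over the real steps. With `ignore_index=-1`, all three should give the same mean.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
loss_fn = nn.CrossEntropyLoss(ignore_index=-1)

logits = torch.randn(2, 4, 6)            # [batch_size, seq_len, num_classes]
labels = torch.tensor([[0, 3, 5, 1],
                       [2, 4, -1, -1]])  # -1 marks padded steps

# Variant 1: class dimension moved to position 1, as in the loop above
loss_transposed = loss_fn(logits.transpose(1, 2), labels)

# Variant 2: flatten batch and time into one dimension
loss_flat = loss_fn(logits.reshape(-1, 6), labels.reshape(-1))

# Reference: select the 6 real steps by hand and average over them only
mask = labels != -1
loss_manual = nn.functional.cross_entropy(logits[mask], labels[mask])
```

If these agree, the padded steps are not contributing to the gradient through the loss.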

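The LSTM mostly outputs just a single label for the whole sequence (there are sequences for which that is correct). The loss and accuracy on the training and validation sets indicate that it is not learning. The loss decreases slightly but fluctuates a lot.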

I suspect that the linear layer is not implemented correctly.

Should I pass each hidden state through the linear layer one at a time (without the padding?) and stack the outputs?

I would be grateful if someone could point out my mistakes and misunderstandings. I have already spent a lot of time on this.

Understanding the linear layer in sequence labeling with a PyTorch LSTM

0 answers: