Understanding the linear layer for sequence tagging with a PyTorch LSTM

Time: 2019-07-07 20:44:07

Tags: python machine-learning pytorch lstm

I'm trying to get a grasp of LSTMs and PyTorch. I am given sequences of varying lengths. Each data point in a sequence consists of 8 features, and each data point belongs to one of 6 classes (0-5). I want to learn how to use an LSTM to predict the label of each of those data points.
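
To make the setup concrete, here is a toy sketch of what a single sequence looks like as tensors (random values and a hypothetical length; only the shapes matter):

    import torch

    # One hypothetical sequence of length 5: 8 features per time step,
    # and one class label (0-5) per time step.
    seq = torch.randn(5, 8)             # [seq_len, num_features]
    labels = torch.randint(0, 6, (5,))  # [seq_len], values in 0-5
    print(seq.shape, labels.shape)      # torch.Size([5, 8]) torch.Size([5])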

I have made some attempts to solve this so far, but I'm worried that I'm missing something fundamental about the topic.

I start by padding the sequences (and their labels) with -1 up to a maximum length, max_amount.

I have data loaders (train, test, val) that deliver batches of shape [batch_size, max_seq_length, num_features]. The model is quite simple, consisting of an LSTM and a linear layer.
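
A minimal sketch of the padding step, assuming a hypothetical helper (this is not my actual preprocessing code, just an illustration of the idea):

    import torch

    def pad_with_minus_one(seq, labels, max_amount):
        # Illustration only: pad both the feature rows and the labels with -1
        # so that every sequence in a batch has length max_amount.
        pad_rows = max_amount - seq.shape[0]
        padded_seq = torch.cat([seq, torch.full((pad_rows, seq.shape[1]), -1.0)])
        padded_labels = torch.cat([labels, torch.full((pad_rows,), -1, dtype=labels.dtype)])
        return padded_seq, padded_labels  # [max_amount, num_features], [max_amount]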

LSTM:

...
        self.lstm = nn.LSTM(
            input_size=self.sequence_dimension, # 8 features
            hidden_size=self.hidden_dim,
            num_layers=self.n_layers,
            batch_first=True,
        )
        self.linear_fc = nn.Linear(self.hidden_dim, self.output_size)
...

init:

    def init_hidden(self):
        # the hidden and cell states are of the form (nb_layers, batch_size, nb_lstm_units)
        hidden_a = torch.randn(self.n_layers, self.batch_size, self.hidden_dim)
        hidden_b = torch.randn(self.n_layers, self.batch_size, self.hidden_dim)

        hidden_a = Variable(hidden_a)
        hidden_b = Variable(hidden_b)

        return (hidden_a, hidden_b)
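
For reference, nn.LSTM expects (h_0, c_0) as two tensors of shape (num_layers, batch_size, hidden_size); a minimal zero-initialized alternative (as far as I know, Variable is no longer needed in recent PyTorch versions) would look like this:

    def init_hidden(self):
        # (h_0, c_0), each of shape (num_layers, batch_size, hidden_size)
        h_0 = torch.zeros(self.n_layers, self.batch_size, self.hidden_dim)
        c_0 = torch.zeros(self.n_layers, self.batch_size, self.hidden_dim)
        return (h_0, c_0)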

Forward:

    def forward(self, X):
        # shape of X: [batch_size, max_seq_len, feature_size]
        self.hidden = self.init_hidden()

        # get unpadded sequence lengths (padding: -1)
        lenghts = []
        for batch in X:
            for i, elem in enumerate(batch):
                if elem[0] == -1:
                    lenghts.append(i-1)
                    break

        # pack the padded sequences; lenghts contains the unpadded lengths (e.g., [43, 46, 67, 121])
        x_packed = torch.nn.utils.rnn.pack_padded_sequence(X, lenghts, batch_first=True, enforce_sorted=False)

        lstm_out, self.hidden = self.lstm(x_packed.float(), self.hidden)

        # unpack
        x_unpacked, seq_len = torch.nn.utils.rnn.pad_packed_sequence(lstm_out, batch_first=True)

        # squash the batches from [batch_size, max_seq_len_of_this_batch, hidden_dim]
        # to [batch_size*max_seq_len_of_this_batch, hidden_dim]
        batches_squashed = x_unpacked.contiguous().view(-1, x_unpacked.shape[2])

        # feed it to the linear layer
        y_pred = self.linear_fc(batches_squashed)

        # unsquash batches to [batch_size, max_seq_len_of_this_batch, output_size]
        y_pred = y_pred.contiguous().view((batch_size, max(lenghts), self.output_size))
        return y_pred, max(lenghts)
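
To check the packing logic in isolation, here is a small standalone example of pack_padded_sequence / pad_packed_sequence with toy sizes (unrelated to my real data):

    import torch
    from torch import nn

    # Toy batch of 2 padded sequences: max_len=4, feature_size=3, true lengths 4 and 2.
    x = torch.randn(2, 4, 3)
    lengths = [4, 2]

    packed = nn.utils.rnn.pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
    lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)
    packed_out, (h_n, c_n) = lstm(packed)
    out, out_lengths = nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)
    print(out.shape)    # torch.Size([2, 4, 5]); padded positions are filled with zeros
    print(out_lengths)  # tensor([4, 2])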

I'm not sure whether I'm doing this correctly, especially for the linear layer. It receives a tensor of shape [batch_size * max_seq_len, hidden_dim], right? And there are still padding values in there...
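
To double-check the shape handling, here is a small example of applying the linear layer to squashed versus unsquashed LSTM output (toy sizes; as far as I understand, nn.Linear operates on the last dimension either way):

    import torch
    from torch import nn

    hidden_dim, output_size = 5, 6
    linear = nn.Linear(hidden_dim, output_size)

    lstm_out = torch.randn(2, 4, hidden_dim)               # [batch, max_seq_len, hidden_dim]
    squashed = lstm_out.contiguous().view(-1, hidden_dim)  # [batch*max_seq_len, hidden_dim]
    y = linear(squashed).view(2, 4, output_size)           # back to [batch, max_seq_len, output_size]

    # nn.Linear also accepts the 3-D tensor directly and maps the last dimension:
    y_direct = linear(lstm_out)                            # [batch, max_seq_len, output_size]
    print(torch.allclose(y, y_direct))                     # True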

In the training loop I just initialize the hidden state for every epoch and then loop over (inputs, labels) from the data loader:

    loss_function = nn.CrossEntropyLoss(ignore_index=-1)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
    for inputs, labels in train_loader:
        counter += 1
        net.zero_grad()
        if train_on_gpu:
            inputs = inputs.to(device)
            labels = labels.to(device)

        output, _ = net(inputs)

        # remove excessive padding for labels
        labels = decreasing_padding(labels, max_padding_for_this_batch)

        # transpose the output to fit the CrossEntropyLoss definition
        loss = loss_function(output.transpose(1, 2), labels.long())
        loss.backward()
        optimizer.step()
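
For reference, the shape convention I'm relying on for CrossEntropyLoss with per-timestep labels (class dimension second, hence the transpose above), checked with toy sizes:

    import torch
    from torch import nn

    loss_function = nn.CrossEntropyLoss(ignore_index=-1)

    batch_size, seq_len, num_classes = 2, 4, 6
    output = torch.randn(batch_size, seq_len, num_classes)         # [N, seq_len, C] model output
    labels = torch.randint(0, num_classes, (batch_size, seq_len))  # [N, seq_len] targets
    labels[:, -1] = -1                                             # pretend the last step is padding

    # CrossEntropyLoss expects input [N, C, d1] when the targets are [N, d1],
    # so the class dimension has to come second; targets equal to -1 are ignored.
    loss = loss_function(output.transpose(1, 2), labels)
    print(loss)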

Mostly, the LSTM outputs just a single label for the whole sequence (one that does occur in the correct label sequence). The loss and accuracy on the training and validation sets indicate that it is not learning: the loss decreases slightly, but it fluctuates a lot.

I suspect that the linear layer is not implemented correctly.

  • Should I rather pass each hidden state (without the padding?) through the linear layer one by one and stack the outputs?

I would be very grateful if anyone could point out my mistakes and misunderstandings; I have already sunk a lot of time into this.

0 answers:

There are no answers yet.