Question

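I am new to PyTorch and LSTMs, and I am trying to train a classification model that takes a sentence in which each word is encoded with word2vec (pre-trained vectors) and outputs one class after it has seen the full sentence. I have four different classes. The sentences have variable length.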

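My code runs without errors, but it always predicts the same class, no matter how many epochs I train the model for. So I think the gradients are not being backpropagated properly. Here is my code: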

class LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, tagset_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim).to(device),
                torch.zeros(1, 1, self.hidden_dim).to(device))

    def forward(self, sentence):
        lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

EMBEDDING_DIM = len(training_data[0][0][0])
HIDDEN_DIM = 256

model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, 4)
model.to(device)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in tqdm(range(n_epochs)):
    for sentence, tag in tqdm(training_data):
        model.zero_grad()

        model.hidden = model.init_hidden()

        sentence_in = torch.tensor(sentence, dtype=torch.float).to(device)
        targets = torch.tensor([label_to_idx[tag]], dtype=torch.long).to(device)

        tag_scores = model(sentence_in)

        res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)
        # I THINK THIS IS WRONG???
        print(res)     # tensor([[-10.6328, -10.6783, -10.6667,  -0.0001]], device='cuda:0', grad_fn=<CopyBackwards>)
        print(targets) # tensor([3], device='cuda:0')

        loss = loss_function(res, targets)

        loss.backward()
        optimizer.step()

The code is largely inspired by https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html. The difference is that they have a sequence-to-sequence model, whereas I have a sequence-to-one model.

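I am not sure what the problem is, but I guess that the scores returned by the model contain a score for each tag, while my ground truth only contains the index of the correct class? How should this be handled correctly?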

Or is the loss function perhaps not the right one for my use case? Also, I am not sure whether this is done correctly:

res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)

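By using tag_scores[-1] I want to get the scores after the last word has been fed to the network because, if I understand correctly, tag_scores contains the scores after each step.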
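On the question of whether wrapping tag_scores[-1] in torch.tensor(...) is safe: the PyTorch documentation describes torch.tensor(t) for an existing tensor t as equivalent to t.detach().clone(), that is, a copy cut off from the autograd graph. This may not be the cause of the behaviour described here (the printed res above still carries a grad_fn), but indexing and reshaping the output directly avoids the question entirely. The following is a minimal sketch with a random stand-in for the model output; the tensor names and sizes are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the model output: log-probabilities for 5 time steps, 4 classes.
logits = torch.randn(5, 4, requires_grad=True)
tag_scores = F.log_softmax(logits, dim=1)

# Indexing and reshaping keep the result attached to the autograd graph.
res = tag_scores[-1].view(1, -1)

# What the docs say torch.tensor(tag_scores[-1]) amounts to: a detached copy.
detached = tag_scores[-1].detach().clone().view(1, -1)

targets = torch.tensor([3])
loss = F.nll_loss(res, targets)
loss.backward()

print(res.requires_grad)       # graph is intact
print(detached.requires_grad)  # no gradient can flow through this copy
print(logits.grad[-1])         # only the last time step receives gradient
```

With the graph intact, only the last row of logits.grad is non-zero, which matches the sequence-to-one setup: the loss depends on the final step alone.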

And this is how I evaluate:

with torch.no_grad():
    preds = []
    gts = []

    for sentence, tag in tqdm(test_data):
        inputs = torch.tensor(sentence, dtype=torch.float).to(device)

        tag_scores = model(inputs)

        # find index with max value (this is the class to be predicted)
        pred = [j for j,v in enumerate(tag_scores[-1]) if v == max(tag_scores[-1])][0]

        print(pred, idx_to_label[pred], tag)
        preds.append(pred)
        gts.append(label_to_idx[tag])

print(f1_score(gts, preds, average='micro'))
print(classification_report(gts, preds))
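The list comprehension that picks the predicted class can be written more directly with torch.argmax. The sketch below uses the scores printed earlier as example values and checks that both forms agree; the variable names are chosen for illustration only.

```python
import torch

# Example log-probabilities, taken from the printed output in the training loop.
scores = torch.tensor([-10.6328, -10.6783, -10.6667, -0.0001])

# Original approach: scan for the first index whose value equals the maximum.
pred_loop = [j for j, v in enumerate(scores) if v == max(scores)][0]

# Equivalent and shorter: argmax returns the index of the largest value.
pred_argmax = int(torch.argmax(scores))

print(pred_loop, pred_argmax)
```

Note also that the evaluation loop above, unlike the training loop, does not call model.init_hidden() before each sentence, so the hidden state carries over from one test sentence to the next.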

Edit:

Shuffling the data before training seems to work. But why?
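For reference, a minimal sketch of shuffling once per epoch, assuming training_data is a plain Python list of (sentence, tag) pairs as in the loop above. The toy data and the fixed seed are hypothetical and only there to make the example self-contained.

```python
import random

# Hypothetical toy data, grouped by class (all "a" samples first, then "b", ...).
training_data = [(f"sentence_{label}_{i}", label)
                 for label in ["a", "b", "c", "d"]
                 for i in range(3)]

random.seed(0)  # fixed seed only so that runs are repeatable
n_epochs = 2

for epoch in range(n_epochs):
    # Shuffle in place so consecutive updates see a mix of classes.
    random.shuffle(training_data)
    for sentence, tag in training_data:
        pass  # the training step from the loop above would go here

print([tag for _, tag in training_data])
```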

Edit 2:

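I think the reason shuffling is needed is that my training data contains the samples of each class in groups. So when training on them one after another, the model only sees the same class during the most recent N iterations and therefore only predicts that class. Another reason might be that I am currently using mini-batches of only one sample, because I have not yet figured out how to use other sizes.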
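On mini-batches larger than one: a common way to batch variable-length sequences in PyTorch is to pad them to a common length and pack them, so that the LSTM skips the padding. The sketch below is one possible setup, not the asker's code; the dimensions, the random "sentences" and the labels are made up for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)
EMBEDDING_DIM, HIDDEN_DIM, N_CLASSES = 8, 16, 4  # illustrative sizes

# Hypothetical mini-batch: three "sentences" of different lengths.
sentences = [torch.randn(n, EMBEDDING_DIM) for n in (5, 3, 7)]
labels = torch.tensor([0, 2, 3])
lengths = torch.tensor([len(s) for s in sentences])

# Pad to the longest sentence: shape (max_len, batch, embedding_dim).
padded = pad_sequence(sentences)
# Pack so the LSTM stops at each sentence's true length.
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)

lstm = nn.LSTM(EMBEDDING_DIM, HIDDEN_DIM)
classifier = nn.Linear(HIDDEN_DIM, N_CLASSES)

# h_n holds the hidden state after the last real word of every sentence.
_, (h_n, _) = lstm(packed)        # h_n: (num_layers, batch, hidden_dim)
logits = classifier(h_n[-1])      # (batch, n_classes)

# CrossEntropyLoss combines log_softmax and NLLLoss.
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()
print(padded.shape, logits.shape, loss.item())
```

Using h_n rather than the last row of the padded output matters here, because for the shorter sentences the last row would correspond to padding.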

Answer 1

Since you are trying to classify using the whole sentence, the following line:

self.hidden2tag(lstm_out.view(len(sentence), -1))

should be changed to the following, so that only the features of the final time step are passed to the classifier:

self.hidden2tag(lstm_out[-1])

But I am not quite sure either, since I am not familiar with LSTMs.
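A minimal sketch of the idea in this answer (passing only the last time step to the classifier), written here as lstm_out[-1]. It is an assumption-laden rewrite of the asker's model, not a confirmed fix: the class name SentenceClassifier, the sizes and the random input are invented for illustration, and the stored self.hidden is dropped because nn.LSTM starts from a zero state when none is given.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentenceClassifier(nn.Module):  # hypothetical name
    def __init__(self, embedding_dim, hidden_dim, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, n_classes)

    def forward(self, sentence):
        # sentence: (seq_len, embedding_dim) -> (seq_len, 1, embedding_dim)
        lstm_out, _ = self.lstm(sentence.unsqueeze(1))
        # Keep only the output after the last word: (1, hidden_dim).
        tag_space = self.hidden2tag(lstm_out[-1])
        return F.log_softmax(tag_space, dim=1)  # (1, n_classes)


torch.manual_seed(0)
model = SentenceClassifier(embedding_dim=8, hidden_dim=16, n_classes=4)
loss_function = nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

sentence = torch.randn(6, 8)   # one random "sentence" of 6 word vectors
target = torch.tensor([3])     # index of the correct class

optimizer.zero_grad()
scores = model(sentence)       # already (1, 4), no extra reshaping needed
loss = loss_function(scores, target)
loss.backward()
optimizer.step()
print(scores.shape, loss.item())
```

Because forward already returns a (1, n_classes) tensor, the result can be passed to NLLLoss together with a target holding a single class index, without any intermediate copy.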

PyTorch n-to-1 LSTM does not learn anything