Question

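I'm having a hard time training my LSTM model; it doesn't seem to learn at all. With a very simple model (1 layer, few LSTM units), the training loss barely decreases. The accuracy does move around, but it eventually gets stuck at 45%, which is where more complex models sit from the very start. In most cases the model predicts only a single class. I have tried changing all the hyperparameters, but nothing really seems to change, so I'm afraid I'm missing something obvious.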

Here is my model:

import torch.nn as nn

class Embedding_Model(nn.Module):
    def __init__(self, dim, vocab_size, classes, lstm_units=100, num_layers=3, bidirectional=True):
        super(Embedding_Model, self).__init__()

        self.word_embeddings = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, lstm_units, num_layers=num_layers, bidirectional=bidirectional, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(lstm_units*(1+int(bidirectional)), classes),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.word_embeddings(x)   # (batch, seq_len, dim)
        x, (h, c) = self.lstm(x)      # (batch, seq_len, directions * lstm_units)
        x = x.transpose(0, 1)         # (seq_len, batch, directions * lstm_units)
        x = self.fc(x[-1])            # output at the last time step -> class scores
        return x

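My training set contains about 5,000 input sequences. Each input sequence has length 1400 (zero-padded). There are 150,000 distinct tokens, and I have tried embedding dimensions between 10 and 200. I use 3 output classes (fairly balanced in the training set) and cross-entropy loss.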
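A quick sanity check for a setup like this is to push a dummy batch with the described shapes through the model as posted and look at what comes out. The sketch below is only an illustration under assumptions: the two sequences of 50 random tokens followed by zero padding, and the "very simple" configuration (`dim=10`, `lstm_units=8`, `num_layers=1`), are made up for the example. Note that each output row sums to 1, i.e. the model already returns probabilities rather than raw scores.

```python
import torch
import torch.nn as nn

# The model from the question, as posted (including the final nn.Softmax).
class Embedding_Model(nn.Module):
    def __init__(self, dim, vocab_size, classes, lstm_units=100, num_layers=3, bidirectional=True):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, lstm_units, num_layers=num_layers,
                            bidirectional=bidirectional, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(lstm_units * (1 + int(bidirectional)), classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        x = self.word_embeddings(x)
        x, (h, c) = self.lstm(x)
        x = x.transpose(0, 1)
        return self.fc(x[-1])

torch.manual_seed(0)
VOCAB_SIZE, SEQ_LEN, NUM_CLASSES = 150_000, 1400, 3
model = Embedding_Model(dim=10, vocab_size=VOCAB_SIZE, classes=NUM_CLASSES,
                        lstm_units=8, num_layers=1)

# Hypothetical batch: 2 sequences, 50 random token ids each, then zero padding.
batch = torch.zeros(2, SEQ_LEN, dtype=torch.long)
batch[:, :50] = torch.randint(1, VOCAB_SIZE, (2, 50))

with torch.no_grad():
    out = model(batch)

print(out.shape)        # torch.Size([2, 3])
print(out.sum(dim=1))   # every row sums to 1: these are already probabilities
```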

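Am I messing up something obvious? I know the training set is small, but I would at least expect some overfitting. Instead, the model doesn't seem to learn anything at all.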

Answer 1

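You say you are using nn.CrossEntropyLoss as the loss function. That function applies log-softmax to its input, but you are also applying nn.Softmax inside your model, so a softmax effectively gets applied twice.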
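The effect of the double softmax can be seen on a single made-up example: a confident, correct prediction for 3 classes, written as the raw logits `[10, 0, 0]` (the values are only illustrative). On raw logits the loss is essentially 0, and it matches log-softmax followed by `nll_loss`. If the logits are first turned into probabilities, every input to the loss lies in [0, 1], so for 3 classes the loss can never drop below log(1 + 2/e) ≈ 0.55, no matter how confident the model is.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()
target = torch.tensor([0])
# Illustrative raw logits for one sample: strongly (and correctly) favors class 0.
logits = torch.tensor([[10.0, 0.0, 0.0]])

# CrossEntropyLoss on raw logits == log-softmax followed by negative log-likelihood.
loss_logits = criterion(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# What the posted model does: softmax first, then CrossEntropyLoss applies
# log-softmax again to values that are already probabilities.
probs = torch.softmax(logits, dim=1)
loss_probs = criterion(probs, target)

print(f"on logits:        {loss_logits.item():.5f}")  # close to 0
print(f"on probabilities: {loss_probs.item():.4f}")   # stuck around 0.55
```

Because softmax also saturates for confident predictions, the gradients that reach the LSTM through the extra softmax tend to be small, which is one plausible reason the training loss barely moves.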

nn.CrossEntropyLoss expects raw logits, so you need to remove nn.Softmax from your model and use a plain linear output layer:

self.fc = nn.Linear(lstm_units*(1+int(bidirectional)), classes)
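Putting the fix into context, a minimal sketch of the corrected model with a short training loop might look like the following. The toy sizes, the random data, the Adam optimizer and the learning rate are all hypothetical choices for illustration, not taken from the question. `x[:, -1]` is used in place of `x.transpose(0, 1)[-1]`; with `batch_first=True` both select the last time step. Softmax is applied only at inference time, when actual probabilities are needed.

```python
import torch
import torch.nn as nn

class Embedding_Model(nn.Module):
    def __init__(self, dim, vocab_size, classes, lstm_units=100, num_layers=3, bidirectional=True):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, lstm_units, num_layers=num_layers,
                            bidirectional=bidirectional, batch_first=True)
        # Plain linear layer: the model returns raw logits, no softmax.
        self.fc = nn.Linear(lstm_units * (1 + int(bidirectional)), classes)

    def forward(self, x):
        x = self.word_embeddings(x)   # (batch, seq_len, dim)
        x, (h, c) = self.lstm(x)      # (batch, seq_len, directions * lstm_units)
        return self.fc(x[:, -1])      # same as x.transpose(0, 1)[-1]

torch.manual_seed(0)
# Hypothetical toy sizes, far smaller than the question's data.
model = Embedding_Model(dim=16, vocab_size=100, classes=3, lstm_units=16, num_layers=1)
criterion = nn.CrossEntropyLoss()    # applies log-softmax internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.randint(1, 100, (8, 20))   # 8 random sequences of length 20
y = torch.randint(0, 3, (8,))        # random class labels

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = criterion(model(x), y)    # raw logits go straight into the loss
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")

# Softmax only when probabilities are actually needed, e.g. at inference.
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
    preds = probs.argmax(dim=1)
```

On such a tiny random set the model should be able to overfit, which is exactly the behavior the question expected to see at least.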
