PyTorch LSTM: question-answer classification training

Posted: 2018-08-17 12:09:11

Tags: lstm pytorch

I am trying to train a model to classify whether an answer answers the question it is paired with, using this dataset.

I am training in batches and using GloVe word embeddings. I train in batches of 1000, except for the last one. The approach I am trying is to first give the LSTM the first sentence (the question), then the second sentence (the answer), and then have it give me a number between 0 and 1 via a sigmoid.

The problem is that the loss just keeps repeating itself after epoch 1. It never converges to the correct result, which would be 1 if the answer belongs to the question and 0 otherwise.

My code is as follows:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# assumed setup (not shown in the post): run on GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class QandA(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(QandA, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = 1
        self.bidirectional = True

        self.lstm = nn.LSTM(input_size, self.hidden_size, num_layers = self.num_layers, bidirectional = self.bidirectional)
        self.lstm.to(device)
        self.hidden2class = nn.Linear(self.hidden_size * 2, 1)
        self.hidden2class.to(device)

    def forward(self, glove_vec, glove_vec2):
        # glove_vec.shape = (sentence_len, batch_size, 300)
        output, hidden = self.lstm(glove_vec)
        output, _ = self.lstm(glove_vec2, hidden)
        # output.shape = (sentence_len, batch_size, hidden_size * 2)
        output = self.hidden2class(output[-1,:,:])
        # output.shape = (batch_size, 1)
        return F.sigmoid(output)
model = QandA(300, 60).to(device)
loss_function = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)
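
For reference, a quick shape sanity check of this setup (dummy sequence lengths only, with `device`, `model` and `loss_function` as defined above); nn.BCELoss expects predictions and targets to be floats of the same shape:

q = torch.randn(12, 1000, 300, device=device)                    # (question_len, batch_size, 300)
a = torch.randn(30, 1000, 300, device=device)                    # (answer_len, batch_size, 300)
targets = torch.randint(0, 2, (1000, 1), device=device).float()  # same shape and dtype as the model output
pred = model(q, a)                                               # (batch_size, 1), values in (0, 1)
print(pred.shape, loss_function(pred, targets).item())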

Is my approach so flawed that it cannot work in practice, or is there something else I am overlooking?

Edit: additional code regarding the training;

batch_size = 1000
# load_dataset loads the data from the file.
questions, answers, outputs = load_dataset()
N = len(outputs)
losses = []
for epoch in range(10):
    for batch in range(math.ceil(N / batch_size)):
        model.zero_grad()

        # get_data gets batch number `batch` from the dataset (batch_size examples per batch)
        input1, input2, targets = get_data(batch, batch_size)

        class_pred = model(input1, input2)
        loss = loss_function(class_pred, targets)
        loss.backward()
        optimizer.step()
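
load_dataset and get_data are not shown here; as a rough, hypothetical sketch (not the actual implementation), get_data is expected to return GloVe-embedded batches shaped the way the model expects, assuming `questions` and `answers` are lists of token lists and `glove` is a mapping from word to 300-dim vector:

def get_data(batch, batch_size):
    # Hypothetical sketch: pad each sentence in the batch to a common length,
    # look up 300-dim GloVe vectors, and return (sentence_len, batch_size, 300)
    # inputs plus (batch_size, 1) float targets for BCELoss.
    start, end = batch * batch_size, (batch + 1) * batch_size
    q_batch, a_batch, y_batch = questions[start:end], answers[start:end], outputs[start:end]

    def embed(sentences):
        max_len = max(len(s) for s in sentences)
        out = torch.zeros(max_len, len(sentences), 300)
        for i, sent in enumerate(sentences):
            for j, word in enumerate(sent):
                if word in glove:  # glove: assumed dict mapping word -> 300-dim vector
                    out[j, i] = torch.as_tensor(glove[word])
        return out.to(device)

    targets = torch.tensor(y_batch, dtype=torch.float32).view(-1, 1).to(device)
    return embed(q_batch), embed(a_batch), targets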

1 Answer:

Answer 0 (score: 0)

I would suggest encoding the question and the answer independently and putting a classifier on top of them. For example, you can encode the question and the answer each with a biLSTM, concatenate their representations, and feed them to the classifier. The code could be something like this (not tested, but I hope you get the idea):

class QandA(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(QandA, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = 1
        self.bidirectional = True

        self.lstm_question = nn.LSTM(input_size, self.hidden_size, num_layers = self.num_layers, bidirectional = self.bidirectional)
        self.lstm_question.to(device)
        self.lstm_answer = nn.LSTM(input_size, self.hidden_size, num_layers = self.num_layers, bidirectional = self.bidirectional)
        self.lstm_answer.to(device)
        self.fc = nn.Linear(self.hidden_size * 4, 1)
        self.fc.to(device)

    def forward(self, glove_question, glove_answer):
        # glove.shape = (sentence_len, batch_size, 300)
        question_last_hidden, _ = self.lstm_question(glove_question)
        # question_last_hidden.shape = (question_len, batch_size, hidden_size * 2)
        answer_last_hidden, _ = self.lstm_answer(glove_answer)
        # answer_last_hidden.shape = (answer_len, batch_size, hidden_size * 2)

        # take the last time step of each LSTM output; with multiple LSTM layers
        # you would take only the last layer's forward/backward hidden states
        question_last_hidden = question_last_hidden[-1,:,:]
        answer_last_hidden = answer_last_hidden[-1,:,:]
        representation = torch.cat([question_last_hidden, answer_last_hidden], -1) # concatenate along the feature dimension
        # representation.shape = (batch_size, hidden_size * 4)
        output = self.fc(representation)
        # output.shape = (batch_size, 1)
        return F.sigmoid(output)
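
To try it out, something along these lines should work (sizes, sequence lengths and the learning rate are just placeholders):

model = QandA(300, 60).to(device)
loss_function = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

question = torch.randn(12, 32, 300, device=device)  # (question_len, batch_size, 300)
answer = torch.randn(30, 32, 300, device=device)    # (answer_len, batch_size, 300)
targets = torch.randint(0, 2, (32, 1), device=device).float()

optimizer.zero_grad()
pred = model(question, answer)        # (batch_size, 1)
loss = loss_function(pred, targets)
loss.backward()
optimizer.step()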