I am trying to classify sequences by a binary feature. I have a dataset of sequence/label pairs and am using a simple one-layer LSTM to classify each sequence. Before I implemented mini-batching, I was getting reasonable accuracy on the test set (80%), and the training loss would decrease from 0.6 to 0.3 (averaged).
I implemented mini-batching using parts of this tutorial: https://pytorch.org/tutorials/beginner/chatbot_tutorial.html
However, now my model does no better than 70-72% (70% of the data has one label) with the batch size set to 1 and all other parameters exactly the same. Additionally, the loss starts at 0.0106 and very quickly gets extremely small, with no significant change in results. I feel like the results with no batching and with a batch size of 1 should be identical, so I probably have a bug, but for the life of me I can't find it. My code is below.
Training code (one epoch):
for i in t:
    model.zero_grad()
    # prep inputs
    last = i + self.params['batch_size']
    last = last if last < len(train_data) else len(train_data)
    batch_in, lengths, batch_targets = self.batch2TrainData(train_data[shuffled][i:last], word_to_ix, label_to_ix)
    iters += 1
    # forward pass.
    tag_scores = model(batch_in, lengths)
    # compute loss, then do backward pass, then update gradients
    loss = loss_function(tag_scores, batch_targets)
    loss.backward()
    # Clip gradients: gradients are modified in place
    nn.utils.clip_grad_norm_(model.parameters(), 50.0)
    optimizer.step()
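For context, the loop above relies on a loss function and optimizer created elsewhere; a minimal sketch of a compatible setup (the optimizer choice, learning rate, and definition of t are assumptions, since that code is not shown in the post; NLLLoss matches the loss the answer below says was originally used):

import torch.nn as nn
import torch.optim as optim

loss_function = nn.NLLLoss()  # paired with the log_softmax output of the model below
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # optimizer and lr are illustrative assumptions
t = range(0, len(train_data), self.params['batch_size'])  # assumed: batch start indices over the training set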
Functions:
def prep_sequence(self, seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)

# transposes batch_in
def zeroPadding(self, l, fillvalue=0):
    return list(itertools.zip_longest(*l, fillvalue=fillvalue))

# Returns padded input sequence tensor and lengths
def inputVar(self, batch_in, word_to_ix):
    idx_batch = [self.prep_sequence(seq, word_to_ix) for seq in batch_in]
    lengths = torch.tensor([len(idxs) for idxs in idx_batch])
    padList = self.zeroPadding(idx_batch)
    padVar = torch.LongTensor(padList)
    return padVar, lengths

# Returns all items for a given batch of pairs
def batch2TrainData(self, batch, word_to_ix, label_to_ix):
    # sort by dec length
    batch = batch[np.argsort([len(x['turn']) for x in batch])[::-1]]
    input_batch, output_batch = [], []
    for pair in batch:
        input_batch.append(pair['turn'])
        output_batch.append(pair['label'])
    inp, lengths = self.inputVar(input_batch, word_to_ix)
    output = self.prep_sequence(output_batch, label_to_ix)
    return inp, lengths, output
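To make the padding layout concrete, here is a small standalone sketch of what inputVar effectively produces (the toy vocabulary and sequences are made up, and plain Python lists stand in for prep_sequence for brevity). The padded tensor comes out as (max_seq_len, batch_size) because zeroPadding transposes the batch, which is the layout pack_padded_sequence expects with the default batch_first=False:

import itertools
import torch

word_to_ix = {'<pad>': 0, 'hello': 1, 'world': 2, 'bye': 3}  # toy vocabulary
batch_in = [['hello', 'world', 'bye'], ['hello', 'bye']]     # already sorted by decreasing length

idx_batch = [[word_to_ix[w] for w in seq] for seq in batch_in]
lengths = torch.tensor([len(idxs) for idxs in idx_batch])    # tensor([3, 2])
padList = list(itertools.zip_longest(*idx_batch, fillvalue=0))
padVar = torch.LongTensor(padList)

print(padVar.shape)  # torch.Size([3, 2]) -> (max_seq_len, batch_size)
print(padVar)
# tensor([[1, 1],
#         [2, 3],
#         [3, 0]])   # the trailing 0 is padding for the shorter sequence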
Model:
class LSTMClassifier(nn.Module):
    def __init__(self, params, vocab_size, tagset_size, weights_matrix=None):
        super(LSTMClassifier, self).__init__()
        self.hidden_dim = params['hidden_dim']
        if weights_matrix is not None:
            self.word_embeddings = nn.Embedding.from_pretrained(weights_matrix)
        else:
            self.word_embeddings = nn.Embedding(vocab_size, params['embedding_dim'])
        self.lstm = nn.LSTM(params['embedding_dim'], self.hidden_dim, bidirectional=False)
        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(self.hidden_dim, tagset_size)

    def forward(self, batch_in, lengths):
        embeds = self.word_embeddings(batch_in)
        packed = nn.utils.rnn.pack_padded_sequence(embeds, lengths)
        lstm_out, _ = self.lstm(packed)
        outputs, _ = nn.utils.rnn.pad_packed_sequence(lstm_out)
        tag_space = self.hidden2tag(outputs)
        tag_scores = F.log_softmax(tag_space, dim=0)
        return tag_scores[-1]
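A quick shape check of the forward pass above, using made-up sizes and the toy batch from the sketch after the helper functions (purely illustrative):

import torch

params = {'hidden_dim': 16, 'embedding_dim': 8}  # illustrative sizes
model = LSTMClassifier(params, vocab_size=4, tagset_size=2)

batch_in = torch.LongTensor([[1, 1], [2, 3], [3, 0]])  # (max_seq_len=3, batch_size=2)
lengths = torch.tensor([3, 2])

scores = model(batch_in, lengths)
# embeds: (3, 2, 8), outputs: (3, 2, 16), tag_space/tag_scores: (3, 2, 2)
print(scores.shape)  # torch.Size([2, 2]) -> (batch_size, tagset_size), the last time step only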
Answer 0 (score: 0)
For anyone else running into a similar issue, I got this working. I removed the log_softmax calculation, so this:
tag_space = self.hidden2tag(outputs)
tag_scores = F.log_softmax(tag_space, dim=0)
return tag_scores[-1]
becomes this:
tag_space = self.hidden2tag(outputs)
return tag_space[-1]
I also changed NLLLoss to CrossEntropyLoss (not shown above), and initialized CrossEntropyLoss with no arguments (i.e. no ignore_index).
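In code, that change amounts to something like the following (a sketch; the original loss setup line is not shown in the post):

import torch.nn as nn

# before: the model returned log_softmax output and the loss was nn.NLLLoss()
# after:  the model returns the raw hidden2tag scores and the loss is
loss_function = nn.CrossEntropyLoss()  # no ignore_index or any other arguments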
I am not sure why these changes were necessary (the docs even say that NLLLoss should be run after a log_softmax layer), but they got my model working and brought my loss back down to a reasonable range (~0.5).
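For reference, the documented relationship between the two losses (CrossEntropyLoss combines LogSoftmax and NLLLoss) holds when the log_softmax is taken over the class dimension, i.e. dim=1 for a (batch_size, num_classes) input; a small standalone check:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 2)            # (batch_size, num_classes)
targets = torch.tensor([0, 1, 1, 0])

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))        # True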