How to change a PyTorch model to take input from a DataLoader

Time: 2018-11-05 07:41:37

Tags: python nlp pytorch

I'm working on an NLP project. Originally, I fed sentences to the model one at a time.

I'm now trying to use a DataLoader to feed several sentences to the model at once. I've implemented this `__getitem__` in my Dataset:

    import numpy as np
    import torch

    def __getitem__(self, i):
        # encode the sentence at index i
        sentence = self.corpus_original[i]
        sentence_lowered = self.corpus_lowered[i]
        sentence_tags = self.tags[i]

        max_sent_len = self.max_sent_len

        # Pad every sentence to max_sent_len words and every word to max_word_len
        # characters; padded positions get dedicated <pad> word/tag/char indices.
        words = torch.zeros(max_sent_len, dtype=torch.long)
        labels = torch.zeros(max_sent_len, dtype=torch.long)
        chars = torch.zeros(max_sent_len, self.max_word_len, dtype=torch.long)
        word_lengths = np.zeros(max_sent_len, dtype=int)

        for j in range(len(sentence)):
            labels[j] = self.tag2idx[sentence_tags[j]]
            words[j] = self.word2idx[sentence_lowered[j]]
            word_lengths[j] = len(sentence[j])
            for k, c in enumerate(sentence[j]):  # for each character
                chars[j, k] = self.char2idx[c]

        # Fill the remaining word positions with the padding entries.
        for j in range(len(sentence), max_sent_len):
            labels[j] = 45       # idx for <pad_tag>
            words[j] = 1         # idx for the padding word
            word_lengths[j] = 5  # 5 because len("<pad>") == 5
            chars[j, :] = 94     # 94 real characters (0-indexed), so 94 is <pad_char>

        # No .cuda() here: with num_workers > 0 this method runs in worker
        # processes, so move each batch to the GPU in the training loop instead.
        return words, labels, chars, word_lengths
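For illustration, the padding scheme can be sketched in plain Python with toy vocabularies. The dictionaries, the `encode` helper, and the `<pad>`, `<pad_tag>`, `<pad_char>` keys below are hypothetical; looking the pad indices up in the vocabularies, rather than hard-coding 45, 1 and 94, keeps them in sync if a vocabulary changes. Unlike the `__getitem__` above, which leaves the unused character slots of real words at 0, this sketch fills them with `<pad_char>` as well.

```python
# Hypothetical toy vocabularies; real ones would be built from the corpus.
PAD_WORD, PAD_TAG, PAD_CHAR = "<pad>", "<pad_tag>", "<pad_char>"
word2idx = {"<unk>": 0, PAD_WORD: 1, "the": 2, "cat": 3}
tag2idx = {"DET": 0, "NOUN": 1, PAD_TAG: 2}
char2idx = {c: i for i, c in enumerate("acehtT")}
char2idx[PAD_CHAR] = len(char2idx)


def encode(sentence, tags, max_sent_len, max_word_len):
    """Return (words, labels, chars, word_lengths) as fixed-size index lists."""
    pad_char = char2idx[PAD_CHAR]
    # Start with every position padded, then overwrite the real words.
    words = [word2idx[PAD_WORD]] * max_sent_len
    labels = [tag2idx[PAD_TAG]] * max_sent_len
    word_lengths = [len(PAD_WORD)] * max_sent_len
    chars = [[pad_char] * max_word_len for _ in range(max_sent_len)]
    for j, (word, tag) in enumerate(zip(sentence, tags)):
        words[j] = word2idx[word.lower()]
        labels[j] = tag2idx[tag]
        word_lengths[j] = len(word)
        for k, c in enumerate(word):
            chars[j][k] = char2idx[c]
    return words, labels, chars, word_lengths


words, labels, chars, word_lengths = encode(["The", "cat"], ["DET", "NOUN"], 4, 5)
print(words)         # [2, 3, 1, 1]
print(labels)        # [0, 1, 2, 2]
print(word_lengths)  # [3, 3, 5, 5]
print(chars[0])      # [5, 3, 2, 6, 6]
```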

Then I create `train_loader` like this:

    from torch.utils.data import DataLoader

    train_loader = DataLoader(train, batch_size=32, shuffle=False, num_workers=4)
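With the default `collate_fn`, `DataLoader` stacks the per-item outputs of `__getitem__` along a new leading batch dimension and converts NumPy arrays such as `word_lengths` to tensors. So every tensor the model receives has one more dimension than in the one-sentence-at-a-time setup. The `ToyDataset` below is a hypothetical stand-in that only mimics the returned shapes; the sizes 54 and 141 are read off the error message, and `num_workers=0` keeps the sketch simple.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in returning the same shapes as the Dataset above."""

    def __init__(self, n_items, max_sent_len, max_word_len):
        self.n_items = n_items
        self.max_sent_len = max_sent_len
        self.max_word_len = max_word_len

    def __len__(self):
        return self.n_items

    def __getitem__(self, i):
        words = torch.zeros(self.max_sent_len, dtype=torch.long)
        labels = torch.zeros(self.max_sent_len, dtype=torch.long)
        chars = torch.zeros(self.max_sent_len, self.max_word_len, dtype=torch.long)
        word_lengths = np.zeros(self.max_sent_len, dtype=np.int64)
        return words, labels, chars, word_lengths


loader = DataLoader(ToyDataset(10, 54, 141), batch_size=4, shuffle=False, num_workers=0)
words, labels, chars, word_lengths = next(iter(loader))
print(words.shape)         # torch.Size([4, 54])
print(chars.shape)         # torch.Size([4, 54, 141])
print(type(word_lengths))  # <class 'torch.Tensor'>
```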

I then iterate over the batches and, for each one, try to call:

    preds = model(words, chars, word_lengths)
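A common pattern is to move each batch to the target device inside the training loop rather than calling `.cuda()` in `__getitem__`: with `num_workers > 0` the items are built in worker processes, and PyTorch's data-loading documentation advises against returning CUDA tensors from multi-process loading (it suggests `pin_memory=True` instead). In this minimal sketch the data is random-free placeholder zeros wrapped in a `TensorDataset`, and `model` is a hypothetical stand-in that only returns a tensor shaped like the word batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical placeholder data with the same per-sentence shapes as the Dataset.
n, max_sent_len, max_word_len = 8, 54, 141
dataset = TensorDataset(
    torch.zeros(n, max_sent_len, dtype=torch.long),                # words
    torch.zeros(n, max_sent_len, dtype=torch.long),                # labels
    torch.zeros(n, max_sent_len, max_word_len, dtype=torch.long),  # chars
    torch.full((n, max_sent_len), 5, dtype=torch.long),            # word_lengths
)
train_loader = DataLoader(dataset, batch_size=4, shuffle=False)


def model(words, chars, word_lengths):
    # Hypothetical stand-in: a real model must accept the leading batch dimension.
    return torch.zeros(words.shape, device=words.device)


batch_shapes = []
for words, labels, chars, word_lengths in train_loader:
    # Move the whole batch to the device here, in the main process.
    words, labels = words.to(device), labels.to(device)
    chars, word_lengths = chars.to(device), word_lengths.to(device)
    preds = model(words, chars, word_lengths)
    batch_shapes.append(tuple(preds.shape))

print(batch_shapes)  # [(4, 54), (4, 54)]
```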

Unsurprisingly, it fails with:

    RuntimeError: Expected 3-dimensional input for 3-dimensional weight [10, 16, 5], but got input of size [16, 54, 141, 16] instead
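For `nn.Conv1d` the weight has shape `(out_channels, in_channels, kernel_size)`, so `[10, 16, 5]` corresponds to `nn.Conv1d(16, 10, 5)`, and the layer expects input shaped `(N, C_in, L)`. A plausible reading of `[16, 54, 141, 16]` (an assumption, since the model code isn't shown) is sentences × words per sentence × characters per word × character-embedding size, i.e. one dimension too many once batches are involved. The sketch below reproduces the failure with small, made-up sizes; the exact error text differs between PyTorch versions.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=16, out_channels=10, kernel_size=5)
print(conv.weight.shape)  # torch.Size([10, 16, 5])

# Hypothetical sizes: 2 sentences x 6 words x 8 chars x 16-dim char embeddings.
char_embeds = torch.randn(2, 6, 8, 16)
try:
    conv(char_embeds)  # 4-D input is rejected by Conv1d
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)
```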

My question is: how should I change my model code so that it accepts a batch of 16 sentences instead of a single sentence?
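One common way to make a character-level CNN batch-aware is to fold the batch and sentence dimensions into one before the convolution and unfold them afterwards, so `Conv1d` still sees 3-D input. The module below is a hypothetical sketch: `CharCNN` is not part of the original code, the sizes (95 characters with `<pad_char>` at index 94, 16-dim embeddings, 10 filters of width 5) are inferred from the padding code and the error message, and max-pooling over character positions is an assumed design choice.

```python
import torch
import torch.nn as nn


class CharCNN(nn.Module):
    """Hypothetical batched char-CNN: (batch, sent_len, word_len) -> (batch, sent_len, out)."""

    def __init__(self, n_chars=95, emb_dim=16, out_channels=10, kernel_size=5, pad_idx=94):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim, padding_idx=pad_idx)
        self.conv = nn.Conv1d(emb_dim, out_channels, kernel_size)

    def forward(self, chars):
        batch, sent_len, word_len = chars.shape
        x = self.embed(chars)                          # (B, S, W, E)
        x = x.reshape(batch * sent_len, word_len, -1)  # (B*S, W, E): one row per word
        x = x.transpose(1, 2)                          # (B*S, E, W): channels first
        x = torch.relu(self.conv(x))                   # (B*S, C, W - k + 1)
        x = x.max(dim=2).values                        # (B*S, C): max-pool over characters
        return x.reshape(batch, sent_len, -1)          # (B, S, C)


char_cnn = CharCNN()
out = char_cnn(torch.full((16, 54, 141), 94, dtype=torch.long))
print(out.shape)  # torch.Size([16, 54, 10])
```

The word-level part of the model needs the same kind of change; for example, an `nn.LSTM` created with `batch_first=True` takes `(batch, sent_len, features)` input directly, so the per-word character features can be concatenated with the word embeddings along the last dimension.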

Thanks.

0 answers