How do I use an LSTM for classification in PyTorch?

Time: 2017-12-23 13:32:16

Tags: pytorch

My code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class Mymodel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, batch_size):
        super(Mymodel, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.batch_size = batch_size

        # pass num_layers so the LSTM matches the shapes used in init_hidden
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)
        self.proj = nn.Linear(hidden_size, output_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # (h_0, c_0), each of shape (num_layers, batch_size, hidden_size)
        return (Variable(torch.zeros(self.num_layers, self.batch_size, self.hidden_size)),
                Variable(torch.zeros(self.num_layers, self.batch_size, self.hidden_size)))

    def forward(self, x):
        # lstm_out: (time_step, batch_size, hidden_size) -- one output per time step
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        output = self.proj(lstm_out)
        result = F.sigmoid(output)
        return result

I want to use an LSTM to classify a sentence as good (1) or bad (0). With this code, the result I get has shape time_step * batch_size * 1, not a 0 or 1. How do I edit the code to get a classification result?

3 Answers:

Answer 0 (score: 6)

Theory:

Recall that an LSTM outputs a vector for every input in the series. You are using sentences, which are a series of words (probably converted to indices and then embedded as vectors). This code from the LSTM PyTorch tutorial makes it clear exactly what I mean (***emphasis mine):

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [autograd.Variable(torch.randn((1, 3)))
          for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn((1, 1, 3))))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# *** (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument  to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn((1, 1, 3))))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

Once again: compare the last slice of "out" with "hidden" below, they are the same. Why? Well...
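As a quick sanity check of that claim, here is a minimal sketch using current PyTorch (plain tensors instead of autograd.Variable): the last slice of out equals the final hidden state h_n returned next to it.

import torch
import torch.nn as nn

lstm = nn.LSTM(3, 3)                     # input dim is 3, output dim is 3
seq = torch.randn(5, 1, 3)               # (seq_len=5, batch=1, input_size=3)
out, (h_n, c_n) = lstm(seq)              # hidden state defaults to zeros

# out[-1] is h_t at the last time step; h_n is the "most recent hidden state"
print(torch.allclose(out[-1], h_n[0]))   # True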

If you're familiar with LSTMs, I'd recommend the PyTorch LSTM docs at this point. Under the Outputs section, notice that h_t is output at every t.

Now, if you aren't used to LSTM-style equations, take a look at Chris Olah's LSTM blog post. Scroll down to the diagram of the unrolled network:

Credit C Olah, "Understanding LSTM Networks"

As you feed your sentence in word by word (x_i-by-x_i+1), you get an output from each time step. You want to interpret the entire sentence to classify it, so you must wait until the LSTM has seen all the words. That is, you need to take h_t, where t is the number of words in your sentence.

Code:

Here's a coding reference. I'm not going to copy-paste the whole thing, just the relevant parts. The magic happens at self.hidden2label(lstm_out[-1]):

class LSTMClassifier(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, label_size, batch_size):
        ...
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2label = nn.Linear(hidden_dim, label_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        return (autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)),
                autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)))

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        x = embeds.view(len(sentence), self.batch_size, -1)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        y = self.hidden2label(lstm_out[-1])
        log_probs = F.log_softmax(y)
        return log_probs
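A minimal usage sketch for the classifier above. The sizes here are hypothetical, the torch imports from the reference are assumed, and the "..." in __init__ is assumed to be filled in as in the full reference (super call, storing hidden_dim and batch_size). A single sentence of four word indices comes out as log-probabilities over label_size classes, so argmax gives the predicted label.

model = LSTMClassifier(embedding_dim=50, hidden_dim=32, vocab_size=1000,
                       label_size=2, batch_size=1)

sentence = torch.LongTensor([12, 7, 301, 4])  # 4 word indices, batch_size = 1
model.hidden = model.init_hidden()            # reset the state for a new sentence
log_probs = model(sentence)                   # shape: (1, 2)
prediction = log_probs.argmax(dim=1)          # 0 = bad, 1 = good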


Answer 1 (score: 0)

As the last layer you have to have a linear layer with as many outputs as you have classes; e.g., 10 if you are doing digit classification on MNIST. In your case, since you are doing yes/no (1/0) classification, you have two labels/classes, so the linear layer has two outputs. I suggest adding a linear layer:

nn.Linear(feature_size_from_previous_layer, 2)

Then train the model with a cross-entropy loss:

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
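A hedged sketch of what that training step could look like; net, train_loader, and the label encoding are assumptions here, not part of the answer. Note that nn.CrossEntropyLoss applies log-softmax itself, so net should output raw two-way logits and the labels should be class indices (0 or 1):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for inputs, labels in train_loader:      # labels: LongTensor of 0s and 1s
    optimizer.zero_grad()
    logits = net(inputs)                 # shape (batch, 2), no softmax applied
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()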

Answer 2 (score: 0)

The main thing you need to figure out is which dim the batch size should go on when you prepare your data. As far as I know, if you don't set it in the nn.LSTM() init function, it automatically assumes that the second dim is your batch size, which is quite different from other DNN frameworks. Maybe you can try:

self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

to ask your model to treat the first dim as the batch dim.
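A small shape sketch of what batch_first=True changes (toy sizes, chosen only for illustration):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(32, 15, 10)   # (batch=32, seq_len=15, feature=10) -- batch comes first
out, (h_n, c_n) = lstm(x)

print(out.shape)              # torch.Size([32, 15, 20]) -- also batch-first
print(h_n.shape)              # torch.Size([2, 32, 20]) -- (num_layers, batch, hidden) regardless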