Output size of a custom LSTM model in PyTorch

Date: 2019-11-13 05:28:19

Tags: deep-learning pytorch lstm

I have a custom LSTM model in PyTorch, shown below:

import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

hidden_size = 32
num_layers = 1
num_classes = 2

class customModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(customModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fcl = nn.Linear(hidden_size*2, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states 
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0)) 
        fw_bilstm = out[-1, :, :self.hidden_size]
        bk_bilstm = out[0, :, :self.hidden_size]
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1)
        fc = self.fcl(concat_fw_bw)
        x = F.softmax(F.relu(fc))
        return x

I can pass an input of type torch.Tensor to this model. The input has length 67349, and each element is a 300-dimensional vector.

After initializing the model and running a prediction, I get an output vector of length 1.

model = customModel(300, hidden_size, num_layers, num_classes)
output = model(input_torch)

When I print the output, it shows tensor([[0.5020, 0.4980]], grad_fn=&lt;SoftmaxBackward&gt;).

Why does this output have length 1? It looks like I should not use batch_first=True in the model, but changing it requires other changes to the input dimensions that I am not sure how to make.

Please suggest how I can get an output vector of length 67349 (the input length) instead of 1.
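
For reference, here is a minimal check of how nn.LSTM interprets the input shape depending on batch_first (the tensor values are random and only illustrate the shapes, not my actual data):

lstm = nn.LSTM(input_size=300, hidden_size=32, num_layers=1, batch_first=True, bidirectional=True)

# With batch_first=True the input must be [batch, seq_len, input_size],
# so a single sequence of 67349 steps needs an explicit batch dimension:
x = torch.rand(67349, 300).unsqueeze(0)  # -> [1, 67349, 300]
out, (h_n, c_n) = lstm(x)
print(out.shape)  # torch.Size([1, 67349, 64]), i.e. hidden_size * 2 directions

# With batch_first=False (the default) the same sequence would instead be
# shaped [seq_len, batch, input_size], e.g. torch.rand(67349, 1, 300).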

Clarification

I see that @gorjan suggested some modifications to the network's forward method, so I want to clarify a bit more what I am trying to build (a sketch of this design follows the list below):

  1. Feed the embeddings to a BiLSTM (done)
  2. Take the hidden state of the last step in each direction and concatenate them
  3. Feed the concatenated output (from step 2) to a fully connected layer with ReLU
  4. Feed the output of step 3 to a softmax layer
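
A minimal sketch of a forward method that literally follows these four steps, reusing the bilstm and fcl layers defined above and the final hidden state h_n returned by nn.LSTM (my reconstruction of the intent, not tested on my data):

def forward(self, x):
    # x: [batch, seq_len, input_size] because batch_first=True
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

    out, (h_n, c_n) = self.bilstm(x, (h0, c0))
    # h_n: [num_layers * 2, batch, hidden_size]; with one layer, h_n[0] is the
    # last step of the forward direction and h_n[1] the last step of the backward direction
    fw_last = h_n[0]                                     # [batch, hidden_size]
    bk_last = h_n[1]                                     # [batch, hidden_size]
    concat_fw_bw = torch.cat((fw_last, bk_last), dim=1)  # step 2: [batch, hidden_size * 2]
    fc = F.relu(self.fcl(concat_fw_bw))                  # step 3: fully connected layer with ReLU
    return F.softmax(fc, dim=1)                          # step 4: [batch, num_classes]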

1 Answer:

Answer 0: (score: 0)

I have commented the def forward(...) method of your module, take a look:

def forward(self, x):
    # Set initial hidden and cell states 
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fw_bilstm = out[-1, :, :self.hidden_size] # This is wrong: You are taking only last batch element
    bk_bilstm = out[0, :, :self.hidden_size] # This is wrong: You are taking only the first batch element
    concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1) # This is not needed: If you want to obtain the hidden states for all elements in the sequence
    fc = self.fcl(concat_fw_bw) # Because of the above mentioned issues, this is wrong as well.
    x = F.softmax(F.relu(fc)) # This is wrong: Never stack activation on top of activation.
    return x
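
To make the indexing issue concrete, a small sketch (the shapes are illustrative only):

out = torch.rand(32, 177, 64)    # [batch_size, sequence_length, hidden_size * num_directions]
print(out[-1, :, :32].shape)     # torch.Size([177, 32]): last *batch* element, forward half only
print(out[:, -1, :].shape)       # torch.Size([32, 64]): last *time step* for every batch element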

Now, regarding your request:

  Please suggest how I can get an output vector of length 67349 (the input length) instead of 1.

I suppose you want to obtain the hidden states for each element in the sequence. Here is how you should structure your forward pass:

def forward(self, x):
    # Set initial hidden and cell states 
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fc = self.fcl(out) # fc is of size [batch_size, sequence_length, num_classes]
    x = F.softmax(fc, dim=-1) # Just softmax over the class dimension, so that you get the probabilities for each of your classes
    return x

If we test the updated model, the results look like this:

# Assuming a batch of 32 elements, each a sequence of 177 steps, where each sequence element has size 300
inputs = torch.rand(32, 177, 300)
# Obtaining the outputs from the model
outputs = model(inputs)
# The size is as expected: torch.Size([32, 177, 2])
print(outputs.shape)
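
As an aside, if the 67349 x 300 input from the question is a single sequence, it needs a batch dimension before being fed to the model (I am assuming here that input_torch has shape [67349, 300]):

inputs = input_torch.unsqueeze(0)  # -> [1, 67349, 300], a batch containing one sequence
outputs = model(inputs)
print(outputs.shape)               # torch.Size([1, 67349, 2]): one prediction per input step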

One more thing you should keep in mind:

  The input has length 67349, and each element is a 300-dimensional vector.

That is an extremely long sequence. Your model will lag behind considerably, and I suppose your training will last forever. However, that is a completely different issue and should be discussed in a separate thread.