I have a custom LSTM model in PyTorch, shown below:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed here so the snippet is self-contained
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

hidden_size = 32
num_layers = 1
num_classes = 2

class customModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(customModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fcl = nn.Linear(hidden_size*2, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0))
        fw_bilstm = out[-1, :, :self.hidden_size]
        bk_bilstm = out[0, :, :self.hidden_size]
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim=1)
        fc = self.fcl(concat_fw_bw)
        x = F.softmax(F.relu(fc))
        return x
I can pass an input of type torch.Tensor to this model. The input has length 67349, and each element is a 300-dimensional vector. After initializing the model and running a prediction, I get an output vector of length 1.
model = customModel(300, hidden_size, num_layers, num_classes)
output = model(input_torch)
When I print the output, it shows tensor([[0.5020, 0.4980]], grad_fn=<SoftmaxBackward>).
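For reference, the shape bookkeeping that leads to this looks roughly as follows (the construction of input_torch is shown schematically here, just to make the shapes concrete):

# Illustrative layout only: 67349 rows of 300 features arranged as [67349, 1, 300],
# which the batch_first LSTM reads as batch_size=67349 and sequence_length=1.
input_torch = torch.rand(67349, 300).unsqueeze(1)   # [67349, 1, 300]
output = model(input_torch)
print(output.shape)   # torch.Size([1, 2]): out[-1, ...] in forward() keeps a single batch element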
Why does this output have length 1? It looks like I should not be using batch_first=True in the model, but changing that requires other changes to the input dimensions that I am not sure how to make. How can I get an output vector of length 67349 (the input length) instead of length 1?
Clarification
I see that @gorjan suggested some modifications to the network's forward method, so I want to clarify a bit more about what I am trying to build.
Answer 0 (score: 0)
I have commented the def forward(...) method of your module, take a look:
def forward(self, x):
    # Set initial hidden and cell states
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0))  # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fw_bilstm = out[-1, :, :self.hidden_size]  # This is wrong: you are taking only the last batch element
    bk_bilstm = out[0, :, :self.hidden_size]  # This is wrong: you are taking only the first batch element
    concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim=1)  # This is not needed if you want the hidden states for all elements in the sequence
    fc = self.fcl(concat_fw_bw)  # Because of the issues above, this is wrong as well
    x = F.softmax(F.relu(fc))  # This is wrong: never stack an activation on top of an activation
    return x
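To make those comments concrete, here is a small shape check (with made-up sizes, not part of your module) showing what that indexing selects when batch_first=True:

import torch

batch_size, seq_len, hid = 4, 10, 32
out = torch.rand(batch_size, seq_len, hid * 2)   # [batch, seq, hidden_size * num_directions]

wrong_fw = out[-1, :, :hid]   # last *batch element*, every timestep     -> torch.Size([10, 32])
last_fw  = out[:, -1, :hid]   # forward direction at the last timestep   -> torch.Size([4, 32])
first_bw = out[:, 0, hid:]    # backward direction at the first timestep -> torch.Size([4, 32])

print(wrong_fw.shape, last_fw.shape, first_bw.shape)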
Now, regarding your request:
Please suggest how I can get an output vector of length 67349 (the input length) instead of 1?
I assume you want to obtain the hidden states for every element (timestep) in the sequence. This is how you should structure the forward pass:
def forward(self, x):
    # Set initial hidden and cell states
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0))  # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fc = self.fcl(out)  # fc is of size [batch_size, sequence_length, num_classes]
    x = F.softmax(fc, dim=-1)  # Just softmax over the class dimension, so you get class probabilities for every timestep
    return x
If we test the updated model, these are the results:
# Assuming 32 elements in the batch, each element has 177 elements in the sequence, and each sequence element has size 300
inputs = torch.rand(32, 177, 300)
# Obtaining the outputs from the model
outputs = model(inputs)
# The size is as expected: torch.Size([32, 177, 2])
print(outputs.shape)
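Applying the same idea to your data: if the 67349 x 300 input is meant to be one long sequence, it only needs a batch dimension before the call. The reshape below is an assumption about how your input_torch is laid out:

# Assumption: the whole input is a single sequence of 67349 timesteps, each a 300-dim vector
single_sequence = torch.rand(67349, 300)   # stand-in for your input_torch
batched = single_sequence.unsqueeze(0)     # [1, 67349, 300]: batch_size=1, sequence_length=67349
per_timestep = model(batched)              # [1, 67349, 2]: one class distribution per timestep
print(per_timestep.shape)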
One more thing you should keep in mind:
The input has length 67349, and each element is a 300-dimensional vector.
That is an extremely long sequence. Your model will most likely struggle badly with it, and I suspect training would take forever. However, that is an entirely different problem and should be discussed in a separate thread.