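I don't understand how to handle the LSTM hidden state with mini-batch training: the training data is sent to the network in batches of n sequences, while at test time only one sequence is processed at a time.
Specifically, my network is: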
import torch
import torch.nn as nn

class Pytorch_LSTM(nn.Module):
    def __init__(self, params):
        super(Pytorch_LSTM, self).__init__()
        self.params = params
        self.hidden_layer_size = params['hidden_layer_size']
        # Define layers
        self.lstm = nn.LSTM(input_size = params['in_features'], hidden_size = params['hidden_layer_size'])
        self.linear1 = nn.Linear(params['hidden_layer_size'], params['hidden_layer_size'])
        self.linear2 = nn.Linear(params['hidden_layer_size'], params['out_features'])
        # Initial (h_0, c_0), sized for the training batch
        self.hidden_cell = (torch.zeros(1,self.params['batch_size'],self.hidden_layer_size),
                            torch.zeros(1,self.params['batch_size'],self.hidden_layer_size))

    def forward(self, input_seq):
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(self.params['time_window'],-1,self.params['in_features']), self.hidden_cell)
        linear1_out = self.linear1(lstm_out)
        predictions = self.linear2(linear1_out)
        return predictions[-1]
In my train() method:
def train(self, input_sequence, params, test_idx, final, verbose=True):
    ....
    ....
    # Model
    self.model = Pytorch_LSTM(params)
    # Let's train the model
    for epoch in range(epochs):
        for count_1,seq in enumerate(train_data_batch):
            optimizer.zero_grad()
            self.model.hidden_cell = (torch.zeros(1, params['batch_size'], self.model.hidden_layer_size),
                                      torch.zeros(1, params['batch_size'], self.model.hidden_layer_size))
            y_pred = self.model(seq)  # seq.shape: (n_batches, 25, 4)
            single_loss = mse_loss(y_pred, y_label)  # y_pred.shape, y_label.shape : (batch_size, 4)
I believe this trains the model in mini-batches.
At test time there is only one sequence per call, not a batch of several. In my test():
for count,seq in enumerate(val_data[j]):
    y_pred = self.model(seq)  # seq.shape: (25,4)
    single_loss = mse_loss(y_pred, y_label)
This raises the error:
RuntimeError: Expected hidden[0] size (1, 1, 100), got (1, 704, 100)
where n_batches = 704.
How should I handle hidden_cell?
Answer 0 (score: 0)
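On every call you pass the (h_0, c_0) argument to the LSTM with shape (1, batch_size, 100). batch_size only controls how many sequences are processed in parallel and is arbitrary, yet you hard-code it each time: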
self.hidden_cell = (torch.zeros(1,self.params['batch_size'],self.hidden_layer_size),
                    torch.zeros(1,self.params['batch_size'],self.hidden_layer_size))
This hidden_cell holds the h_0 and c_0 arguments, i.e. the initial values of the hidden state and the cell state.
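Since it defaults to zero vectors that already have the required size for whatever batch you pass in, there is no need to build and pass a (1, batch_size, 100) tensor yourself.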
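A minimal sketch to check this default behaviour. The sizes 4, 25 and 100 come from the question; the batch size of 8 and the random input are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=100)  # sizes taken from the question

# (time_window, batch, in_features); the batch of 8 is an arbitrary choice
x = torch.randn(25, 8, 4)

# Explicit zero initial states of shape (num_layers, batch, hidden_size)
h0 = torch.zeros(1, x.size(1), 100)
c0 = torch.zeros(1, x.size(1), 100)
out_explicit, _ = lstm(x, (h0, c0))

# No initial state passed: nn.LSTM falls back to zeros sized to the batch
out_default, _ = lstm(x)

same = torch.allclose(out_explicit, out_default)
print(same, out_default.shape)
```

Both calls should give the same output, so the explicit zero tensors add nothing except a fixed batch size.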
Just get rid of self.hidden_cell and pass only input_seq to self.lstm in the forward method. That should work.
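A sketch of what the model might look like after that change, not a definitive rewrite. The class name PytorchLSTM is hypothetical. It assumes inputs arrive batch-first, as the shape comments in the question suggest: (batch, 25, 4) in training and (25, 4) for a single test sequence. It also uses transpose rather than view to move the time axis first, because view alone does not reorder axes.

```python
import torch
import torch.nn as nn

class PytorchLSTM(nn.Module):  # hypothetical name for the modified model
    def __init__(self, params):
        super().__init__()
        self.lstm = nn.LSTM(input_size=params['in_features'],
                            hidden_size=params['hidden_layer_size'])
        self.linear1 = nn.Linear(params['hidden_layer_size'], params['hidden_layer_size'])
        self.linear2 = nn.Linear(params['hidden_layer_size'], params['out_features'])

    def forward(self, input_seq):
        # Assumption: a 2-D input is one sequence of shape (time, features)
        if input_seq.dim() == 2:
            input_seq = input_seq.unsqueeze(0)  # -> (1, time, features)
        x = input_seq.transpose(0, 1)           # -> (time, batch, features)
        # No (h_0, c_0) passed: the LSTM starts from zeros for any batch size
        lstm_out, _ = self.lstm(x)
        predictions = self.linear2(self.linear1(lstm_out))
        return predictions[-1]                  # last time step: (batch, out_features)

params = {'in_features': 4, 'hidden_layer_size': 100, 'out_features': 4}
model = PytorchLSTM(params)

train_pred = model(torch.randn(704, 25, 4))  # a training batch of 704 sequences
test_pred = model(torch.randn(25, 4))        # a single test sequence
print(train_pred.shape, test_pred.shape)
```

With no stored state there is nothing to reset in the training loop, and the same model accepts a batch of 704 or a single sequence.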