I'm having a hard time understanding the inner workings of LSTMs in PyTorch.
Let me show you a toy example. Maybe the architecture does not make much sense, but I am trying to understand how LSTMs work in this context.
The data can be fetched from here. Each row i (1152 in total) is a slice of a longer time series, running from t = i to t = i + 91. I extract the last column of each row to use as the label.
import torch
import numpy as np
import pandas as pd
from torch import nn, optim
from sklearn.metrics import mean_absolute_error
data = pd.read_csv('data.csv', header = None).values

X = torch.tensor(data[:, :90], dtype = torch.float).view(1152, 1, 90)  # features
y = torch.tensor(data[:, 90], dtype = torch.float).view(1152, 1, 1)    # labels (last column)

dataset = torch.utils.data.TensorDataset(X, y)
loader = torch.utils.data.DataLoader(dataset, batch_size = 50)
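As a sanity check (a minimal sketch using the loader above), it helps to look at what one batch actually contains, since nn.LSTM with the default batch_first = False reads its input as (seq_len, batch, input_size):

# Peek at one batch to see how nn.LSTM will interpret it.
inputs, labels = next(iter(loader))
print(inputs.shape)  # torch.Size([50, 1, 90])
print(labels.shape)  # torch.Size([50, 1, 1])
# With batch_first = False (the default), nn.LSTM reads (50, 1, 90) as
# seq_len = 50, batch = 1, input_size = 90.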
Then I define an LSTM regressor containing three LSTM blocks with different structures.
class regressor_LSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm1 = nn.LSTM(input_size = 90, hidden_size = 100)
        self.lstm2 = nn.LSTM(100, 50)
        self.lstm3 = nn.LSTM(50, 50, dropout = 0.3, num_layers = 2)
        self.dropout = nn.Dropout(p = 0.3)
        self.linear = nn.Linear(in_features = 50, out_features = 1)

    def forward(self, X):
        X, _ = self.lstm1(X)
        X = self.dropout(X)
        X, _ = self.lstm2(X)
        X = self.dropout(X)
        X, _ = self.lstm3(X)
        X = self.dropout(X)
        X = self.linear(X)
        return X
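For reference, every nn.LSTM call returns both the full output sequence and a tuple of final states, which is what the X, _ unpacking above discards. A minimal standalone sketch of the shapes involved (the layer and input here are made up for illustration):

import torch
from torch import nn

lstm = nn.LSTM(input_size = 90, hidden_size = 100)
x = torch.randn(50, 1, 90)           # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([50, 1, 100]) -- hidden state at every time step
print(h_n.shape)     # torch.Size([1, 1, 100])  -- final hidden state per layer
print(c_n.shape)     # torch.Size([1, 1, 100])  -- final cell state per layer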
Initialize whatever needs initializing:
regressor = regressor_LSTM()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(regressor.parameters())
Then comes the training:
for epoch in range(25):
    acc_loss = 0.
    acc_mae = 0.
    for i, data in enumerate(loader):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = regressor(inputs)
        loss = criterion(outputs, labels)
        loss.backward(retain_graph = True)
        optimizer.step()
        acc_loss += loss.item()
        mae = mean_absolute_error(labels.detach().cpu().numpy().flatten(), outputs.detach().cpu().numpy().flatten())
        acc_mae += mae
        # print('\rEPOCH {:3d} - Loop {:3d} of {:3d}: loss {:03.2f} - MAE {:03.2f}'.format(epoch+1, i+1, len(loader), loss, mae), end = '\r')
    print('\nEPOCH %3d FINISHED: loss %.5f - MAE %.5f' % (epoch+1, acc_loss/len(loader), acc_mae/len(loader)))
The problem is that, after an initial drop in both the loss and the MAE (the expected behavior), both seem to get stuck (only the first 10 epochs are shown below):
EPOCH 1 FINISHED: loss 0.38506 - MAE 0.27322
EPOCH 2 FINISHED: loss 0.02825 - MAE 0.13601
EPOCH 3 FINISHED: loss 0.02593 - MAE 0.13117
EPOCH 4 FINISHED: loss 0.02568 - MAE 0.12705
EPOCH 5 FINISHED: loss 0.02546 - MAE 0.12920
EPOCH 6 FINISHED: loss 0.02502 - MAE 0.12763
EPOCH 7 FINISHED: loss 0.02445 - MAE 0.12659
EPOCH 8 FINISHED: loss 0.02310 - MAE 0.12328
EPOCH 9 FINISHED: loss 0.02277 - MAE 0.12237
EPOCH 10 FINISHED: loss 0.02352 - MAE 0.12476
When run with Keras, both metrics decrease steadily throughout training. (I also noticed that Keras takes much longer.)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
import pandas as pd
data = pd.read_csv('data.csv', header = None).values
X = data[:, :90].reshape(1152, 90, 1)
y = data[:, 90]
regressor = Sequential()
regressor.add(LSTM(units = 100, return_sequences = True, input_shape = (90, 1)))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.3))
regressor.add(Dense(units = 1, activation = 'linear'))
regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics = ['mean_absolute_error'])
regressor.fit(X, y, epochs = 25, batch_size = 32)
[OUTPUT]
Epoch 1/25
1152/1152 - 35s 30ms/sample - loss: 0.0307 - mean_absolute_error: 0.1225
Epoch 2/25
1152/1152 - 32s 28ms/sample - loss: 0.0156 - mean_absolute_error: 0.0978
Epoch 3/25
1152/1152 - 32s 28ms/sample - loss: 0.0126 - mean_absolute_error: 0.0871
Epoch 4/25
1152/1152 - 34s 30ms/sample - loss: 0.0111 - mean_absolute_error: 0.0806
Epoch 5/25
1152/1152 - 29s 25ms/sample - loss: 0.0103 - mean_absolute_error: 0.0785
Epoch 6/25
1152/1152 - 29s 25ms/sample - loss: 0.0088 - mean_absolute_error: 0.0718
Epoch 7/25
1152/1152 - 32s 27ms/sample - loss: 0.0085 - mean_absolute_error: 0.0699
Epoch 8/25
1152/1152 - 30s 26ms/sample - loss: 0.0069 - mean_absolute_error: 0.0640
Epoch 9/25
1152/1152 - 30s 26ms/sample - loss: 0.0077 - mean_absolute_error: 0.0660
Epoch 10/25
1152/1152 - 30s 26ms/sample - loss: 0.0070 - mean_absolute_error: 0.0644
I have been reading about hidden-state initialization, and I tried setting the states to zero at the beginning of the forward method (even though I understand that is the default behavior anyway), but nothing helped. I must admit I do not understand what the point of the LSTM's hidden/cell states is, nor whether they should be reinitialized after each batch or each epoch (if at all).
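What I tried looks roughly like the minimal sketch below (a standalone toy layer, not my actual model); it also shows that passing explicit zero states should give the same result as passing nothing:

import torch
from torch import nn

lstm = nn.LSTM(input_size = 90, hidden_size = 100)
x = torch.randn(50, 1, 90)

# Zero initial states, shaped (num_layers, batch, hidden_size)
h0 = torch.zeros(1, 1, 100)
c0 = torch.zeros(1, 1, 100)

out_explicit, _ = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # no states passed -> PyTorch defaults to zeros
print(torch.allclose(out_explicit, out_default))  # True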
Thanks in advance for any reply!
Answer 0 (score: 0)
Coming back a few days later, as I have reached some conclusions. After reading some material on hidden/cell states (this one was useful), reusing them seems to be a matter of network design: whether and when to do so can be treated as a hyperparameter. I tried many options on the toy dataset, mainly resetting the states after each batch, resetting them after each epoch, and not resetting them at all, and the results were very similar. Also, my results were as poor as they were because (as I now believe) I had not set shuffle = True in the loader; doing so made them much better (loss around 0.003, MAE around 0.047).
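For concreteness, that change amounts to a single extra argument in the loader from the question:

loader = torch.utils.data.DataLoader(dataset, batch_size = 50, shuffle = True)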
Also, in line 510 of the original code of the LSTM class, the hidden/cell states appear to be initialized to zeros if no values are explicitly passed.