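I'm having a hard time converting a very simple LSTM model from Keras to PyTorch. X (available here) contains 1152 samples of 90 time steps each, with a single dimension per time step. y (here) is the single prediction at t = 91 for each of the 1152 samples.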
In Keras:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
import numpy as np
import pandas as pd

X = pd.read_csv('X.csv', header = None).values
X.shape  # (1152, 90)
y = pd.read_csv('y.csv', header = None).values
y.shape  # (1152, 1)

# From Keras documentation [https://keras.io/layers/recurrent/]:
# Input shape 3D tensor with shape (batch_size, timesteps, input_dim).
X = np.reshape(X, (1152, 90, 1))

regressor = Sequential()
regressor.add(LSTM(units = 100, return_sequences = True, input_shape = (90, 1)))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50))  # last LSTM returns only the final time step
regressor.add(Dropout(0.3))
regressor.add(Dense(units = 1, activation = 'linear'))

regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics = ['mean_absolute_error'])
regressor.fit(X, y, epochs = 10, batch_size = 32)
```
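The `np.reshape` call above only appends a feature axis of size 1; because each CSV row is already one sample, nothing gets reordered. A minimal NumPy check, using random stand-in data instead of the real `X.csv` (an assumption for illustration only):

```python
import numpy as np

# Random stand-in for X.csv: 1152 samples x 90 time steps (illustrative only)
rng = np.random.default_rng(0)
X2d = rng.random((1152, 90))

# Keras layout: (batch_size, timesteps, input_dim)
X3d = np.reshape(X2d, (1152, 90, 1))

# Each sample's sequence is unchanged; only a length-1 feature axis was appended
same_rows = np.array_equal(X3d[:, :, 0], X2d)
print(X3d.shape, same_rows)
```

`X2d[..., np.newaxis]` produces the same array without hard-coding the sizes.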
...which gives me:
```
# Epoch 10/10
# 1152/1152 [==============================] - 33s 29ms/sample - loss: 0.0068 - mean_absolute_error: 0.0628
```
Then, in PyTorch:
```python
import pandas as pd
import torch
from torch import nn, optim
from sklearn.metrics import mean_absolute_error

X = pd.read_csv('X.csv', header = None).values
y = pd.read_csv('y.csv', header = None).values

X = torch.tensor(X, dtype = torch.float32)
y = torch.tensor(y, dtype = torch.float32)

dataset = torch.utils.data.TensorDataset(X, y)
loader = torch.utils.data.DataLoader(dataset, batch_size = 32, shuffle = True)

class regressor_LSTM(nn.Module):

    def __init__(self):
        super().__init__()
        self.lstm1 = nn.LSTM(input_size = 1, hidden_size = 100)
        self.lstm2 = nn.LSTM(100, 50)
        # Two stacked layers with dropout between them, standing in for Keras' 3rd and 4th LSTM
        self.lstm3 = nn.LSTM(50, 50, dropout = 0.3, num_layers = 2)
        self.dropout = nn.Dropout(p = 0.3)
        self.linear = nn.Linear(in_features = 50, out_features = 1)

    def forward(self, X):
        # From the Pytorch documentation [https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html]:
        # **input** of shape `(seq_len, batch, input_size)`
        X = X.view(90, 32, 1)
        # I am discarding hidden/cell states since in Keras I am using a stateless approach
        # [https://keras.io/examples/lstm_stateful/]
        X, _ = self.lstm1(X)
        X = self.dropout(X)
        X, _ = self.lstm2(X)
        X = self.dropout(X)
        X, _ = self.lstm3(X)
        X = self.dropout(X)
        X = self.linear(X)
        return X

regressor = regressor_LSTM()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(regressor.parameters())

for epoch in range(10):
    running_loss = 0.
    running_mae = 0.
    for i, data in enumerate(loader):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = regressor(inputs)
        # Keep only the last time step, reshaped to match labels: (batch, 1)
        outputs = outputs[-1].view(*labels.shape)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        mae = mean_absolute_error(labels.detach().cpu().numpy().flatten(), outputs.detach().cpu().numpy().flatten())
        running_mae += mae
    print('EPOCH %3d: loss %.5f - MAE %.5f' % (epoch+1, running_loss/len(loader), running_mae/len(loader)))
```
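The `forward` method relies on the `(seq_len, batch, input_size)` layout quoted in its comment, while each batch from the `DataLoader` arrives as `(batch, seq_len)`, i.e. `(32, 90)` (1152 = 36 × 32, so every batch is full). On a contiguous tensor, `Tensor.view` reinterprets the underlying buffer in the same row-major order as NumPy's `reshape`, so this NumPy sketch (toy values, not the real data) compares that reinterpretation with an explicit transpose:

```python
import numpy as np

batch_size, seq_len = 32, 90
# Toy batch in DataLoader layout (batch, seq_len): entry [b, t] = 1000*b + t,
# so every value records which sample and time step it came from
batch = 1000 * np.arange(batch_size)[:, None] + np.arange(seq_len)[None, :]

# Same buffer reinterpreted as (seq_len, batch, 1), like X.view(90, 32, 1)
viewed = batch.reshape(seq_len, batch_size, 1)
# Axes swapped, then a feature axis added: sample b stays in column b
transposed = batch.T[:, :, None]

print(viewed[:5, 0, 0].tolist())      # [0, 32, 64, 1006, 1038]: steps of sample 0, then sample 1
print(transposed[:5, 0, 0].tolist())  # [0, 1, 2, 3, 4]: the first steps of sample 0

# outputs[-1] on a (seq_len, batch, ...) array selects the last time step of every column
last_step = transposed[-1]
print(last_step.shape)                # (32, 1)
```

In PyTorch terms the transposed layout corresponds to `inputs.transpose(0, 1).unsqueeze(-1)`. Alternatively, `nn.LSTM(..., batch_first=True)` accepts `(batch, seq_len, input_size)` directly via `inputs.unsqueeze(-1)`; in that case the last time step is `outputs[:, -1]` rather than `outputs[-1]`.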
...which gives me:
```
# EPOCH 10: loss 0.04220 - MAE 0.16762
```
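You will notice that both the loss and the MAE are very different (the PyTorch loss is much higher). And if I use the PyTorch model to predict values, they all come back as a constant.
What am I doing wrong?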
Answer
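Oh, I believe I've made good progress. The way y is represented seems to differ between Keras and PyTorch. In Keras, we should pass it as a single value representing one time step in the future (at least for the problem I'm trying to solve). But in PyTorch, y must be X shifted one time step into the future. Like this: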
```python
time_series = [0, 1, 2, 3, 4, 5]
X = [0, 1, 2, 3, 4]
# Keras:
y = [5]
# Pytorch:
y = [1, 2, 3, 4, 5]
```
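The toy example above amounts to slicing one series in two ways. A minimal NumPy sketch; `make_targets` is a hypothetical helper, not part of either library, and the batched case assumes each row holds 91 consecutive values (90 inputs followed by the value at t = 91):

```python
import numpy as np

def make_targets(series):
    """Hypothetical helper: split a series (1-D, or 2-D with time on the last axis)
    into inputs and the two target styles from the example."""
    series = np.asarray(series)
    X = series[..., :-1]          # every value except the last
    y_last = series[..., -1:]     # Keras-style target: only the next value
    y_shifted = series[..., 1:]   # shifted target: one label per input step
    return X, y_last, y_shifted

X, y_last, y_shifted = make_targets([0, 1, 2, 3, 4, 5])
print(X.tolist())          # [0, 1, 2, 3, 4]
print(y_last.tolist())     # [5]
print(y_shifted.tolist())  # [1, 2, 3, 4, 5]

# Batched form: 4 rows of 91 values each give inputs of 90 steps
windows = np.arange(4 * 91).reshape(4, 91)
Xb, yb_last, yb_shifted = make_targets(windows)
print(Xb.shape, yb_last.shape, yb_shifted.shape)  # (4, 90) (4, 1) (4, 90)
```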
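With y shifted like this, PyTorch compares every value in the time window when computing the loss. I believe Keras rearranges the data behind the scenes to fit this approach, since the code just worked when I passed in the variables. In PyTorch, however, I was computing the loss from a single value (the one I'm trying to predict) rather than the whole sequence, so I believe it could not properly capture the temporal dependency.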
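The two losses contrasted above can be sketched with made-up numbers (not taken from either model): mean squared error on the final step only, versus mean squared error over every step of the sequence.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all elements (the default 'mean' reduction of nn.MSELoss)."""
    return float(np.mean((pred - target) ** 2))

# Made-up per-step predictions and shifted targets, shape (batch=2, seq_len=5)
pred = np.array([[0.9, 2.1, 2.8, 4.2, 5.0],
                 [1.1, 1.9, 3.1, 3.9, 4.5]])
target = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                   [1.0, 2.0, 3.0, 4.0, 5.0]])

last_step_loss = mse(pred[:, -1], target[:, -1])  # only the final prediction is scored
sequence_loss = mse(pred, target)                 # every time step is scored
print(last_step_loss, sequence_loss)
```

With the single-value target, only the final step contributes an error term per sample; with the shifted target, all five steps do.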
Taking that into account, I got:
```
EPOCH 100: loss 0.00551 - MAE 0.058435
```
The model clearly captures the pattern.
Hooray!