PyTorch version of a simple Keras LSTM model

Time: 2019-02-21 20:42:57

Tags: keras lstm pytorch

I am trying to convert a simple LSTM model from Keras to PyTorch code. The Keras model converges after just 200 epochs, while the PyTorch model:

  • needs many more epochs to reach the same loss level (200 vs. ~8000)
  • seems to overfit the input, since the predicted value is not close to 100

Here is the Keras code:

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

X = array([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).reshape((6,3,1))
y = array([40,50,60,70,80,90])
model = Sequential()
model.add(LSTM(50, activation='relu', recurrent_activation='sigmoid',  input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=1)
x_input = array([70, 80, 90]).reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)

Here is the equivalent PyTorch code:

from numpy import array
import torch
import torch.nn as nn
import torch.nn.functional as F

X = torch.tensor([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).float().reshape(6,3,1)
y = torch.tensor([40,50,60,70,80,90]).float().reshape(6,1)

class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
    self.fc = nn.Linear(50, 1)

  def forward(self, x):
    batches = x.size(0)
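    # initial hidden and cell states: zeros of shape (num_layers, batch, hidden_size)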
    h0 = torch.zeros([1, batches, 50])
    c0 = torch.zeros([1, batches, 50])
    (x, _) = self.lstm(x, (h0, c0))
    x = x[:,-1,:]  # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
    x = F.relu(x)
    x = self.fc(x)
    return x

model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

n_epochs = 8000
for epoch in range(n_epochs):
  model.train()
  optimizer.zero_grad()
  y_ = model(X)
  loss = criterion(y_, y)
  loss.backward()
  optimizer.step()
  print(f"Epoch {epoch+1}/{n_epochs}, loss = {loss.item()}")

model.eval()
x_input = torch.tensor([70, 80, 90]).float().reshape((1, 3, 1))
yhat = model(x_input)
print(yhat)

The only possible difference is the initial weight and bias values, but I don't think slightly different weights and biases can account for such a large difference in behavior. What am I missing in the PyTorch code?

2 answers:

Answer 0: (score: 1)

The difference in behavior is due to the activation function in the LSTM API. By changing the activation to tanh, I can reproduce the problem in Keras as well.

model.add(LSTM(50, activation='tanh', recurrent_activation='sigmoid', input_shape=(3, 1)))

There is no option in the PyTorch LSTM API to change the activation function to 'relu'. https://pytorch.org/docs/stable/nn.html#lstm

Taking the LSTM implementation from here, https://github.com/huggingface/torchMoji/blob/master/torchmoji/lstm.py, and changing hardsigmoid/tanh to sigmoid/relu, the model converges in PyTorch as well.
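
For illustration only, here is a minimal sketch of a single LSTM step with the activations swapped in that way (sigmoid for the gates, relu in place of tanh). This is not the torchMoji code, and the name ReluLSTMCell is just a placeholder:

import torch
import torch.nn as nn

class ReluLSTMCell(nn.Module):
  def __init__(self, input_size, hidden_size):
    super(ReluLSTMCell, self).__init__()
    self.ih = nn.Linear(input_size, 4 * hidden_size)   # input-to-hidden weights for all four gates
    self.hh = nn.Linear(hidden_size, 4 * hidden_size)  # hidden-to-hidden weights for all four gates

  def forward(self, x, state):
    h, c = state
    i, f, g, o = (self.ih(x) + self.hh(h)).chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates keep sigmoid
    g = torch.relu(g)                # candidate values: relu instead of tanh
    c = f * c + i * g                # new cell state
    h = o * torch.relu(c)            # new hidden state: relu instead of tanh
    return h, c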

Answer 1: (score: 0)

I think you don't need to initialize h0 and c0 every time (if they aren't passed, PyTorch defaults them to zeros), so it's better to use the code I've modified below. You can go through the RNN docs in PyTorch: https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN

class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.rnn = nn.RNN(input_size=1, hidden_size=50, num_layers=1, nonlinearity="relu", batch_first=True)
    self.fc = nn.Linear(50, 1)

  def forward(self, x):
    # batches = x.size(0)
    # h0 = torch.zeros([1, batches, 50])
    # c0 = torch.zeros([1, batches, 50])
    # (x, _) = self.lstm(x, (h0, c0))
    (x, _) = self.rnn(x)
    x = x[:,-1,:]  # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)    
    x = F.relu(x)    
    x = self.fc(x)
    return x

This gives good prediction results within 2500 epochs. I would like to know why you wrote the line of code below and what its purpose is, so that I can try to improve it further.

x = x[:,-1,:]  # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
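
For reference, here is a quick standalone check of what that indexing does to the shapes (illustrative only):

import torch
out = torch.randn(6, 3, 50)   # (batch, seq_len, hidden_size), as returned with batch_first=True
last = out[:, -1, :]          # keep only the hidden output at the final time step
print(out.shape, last.shape)  # torch.Size([6, 3, 50]) torch.Size([6, 50])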