I am trying to convert a simple LSTM model from Keras to PyTorch. The Keras model converges after only 200 epochs, while the PyTorch model needs many more epochs to do so (see below).
Here is the Keras code:
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
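# 6 samples of 3 time steps with 1 feature each; each target is the next value in the series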
X = array([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).reshape((6,3,1))
y = array([40,50,60,70,80,90])
model = Sequential()
model.add(LSTM(50, activation='relu', recurrent_activation='sigmoid', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=1)
x_input = array([70, 80, 90]).reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)
Here is the equivalent PyTorch code:
from numpy import array
import torch
import torch.nn as nn
import torch.nn.functional as F
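# Same data as the Keras version: 6 samples of 3 time steps with 1 feature each; targets are the next value in each series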
X = torch.tensor([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).float().reshape(6,3,1)
y = torch.tensor([40,50,60,70,80,90]).float().reshape(6,1)
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        batches = x.size(0)
        h0 = torch.zeros([1, batches, 50])
        c0 = torch.zeros([1, batches, 50])
        (x, _) = self.lstm(x, (h0, c0))
        x = x[:,-1,:] # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
        x = F.relu(x)
        x = self.fc(x)
        return x
model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
n_epochs = 8000
for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()
    y_ = model(X)
    loss = criterion(y_, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}/{n_epochs}, loss = {loss.item()}")
model.eval()
x_input = torch.tensor([70, 80, 90]).float().reshape((1, 3, 1))
yhat = model(x_input)
print(yhat)
The only difference I can see is in the initial weight and bias values, but I don't think slightly different weights and biases should cause such a big difference in behavior. What am I missing in the PyTorch code?
Answer 0 (score: 1)
The difference in behavior is caused by the activation function in the LSTM API. By changing the activation to tanh, I can reproduce the problem in Keras too.
model.add(LSTM(50, activation='tanh', recurrent_activation='sigmoid', input_shape=(3, 1)))
There is no option in the PyTorch LSTM API to change the activation function to 'relu'. https://pytorch.org/docs/stable/nn.html#lstm
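(By contrast, nn.RNN does expose a nonlinearity argument, which nn.LSTM lacks; a quick check, not specific to this model:)
import torch.nn as nn
rnn = nn.RNN(input_size=1, hidden_size=50, nonlinearity='relu')   # accepted: 'tanh' or 'relu'
# nn.LSTM(input_size=1, hidden_size=50, nonlinearity='relu')      # TypeError: no such keyword argument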
If you take the LSTM implementation from https://github.com/huggingface/torchMoji/blob/master/torchmoji/lstm.py and change hardsigmoid/tanh to sigmoid/relu, the model converges in PyTorch as well.
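A rough sketch of what such a cell could look like (my own illustration of that approach, not the torchMoji code; the stock LSTM uses tanh where relu appears below):
import torch
import torch.nn as nn

class ReluLSTMCell(nn.Module):
    """Single LSTM step with sigmoid gates and relu activations, like the Keras layer above."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.ih = nn.Linear(input_size, 4 * hidden_size)   # all four gates from the current input
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)  # all four gates from the previous hidden state

    def forward(self, x, state):
        h, c = state
        i, f, g, o = (self.ih(x) + self.hh(h)).chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # input, forget, output gates
        g = torch.relu(g)             # tanh in the stock LSTM
        c = f * c + i * g             # new cell state
        h = o * torch.relu(c)         # tanh in the stock LSTM
        return h, c

# To process a sequence, loop over the time steps and feed (h, c) back in at every step.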
Answer 1 (score: 0)
I think you do not need to initialize h0, c0 every time; PyTorch uses zero states by default when none are passed. So it is better to use the code I modified below. You can go through the RNN docs in PyTorch: https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=50, num_layers=1, nonlinearity="relu", batch_first=True)
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        # batches = x.size(0)
        # h0 = torch.zeros([1, batches, 50])
        # c0 = torch.zeros([1, batches, 50])
        # (x, _) = self.lstm(x, (h0, c0))
        (x, _) = self.rnn(x)
        x = x[:,-1,:] # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
        x = F.relu(x)
        x = self.fc(x)
        return x
It gives good prediction results within 2500 epochs. I would like to know why the line of code below was written and what its purpose is, so I can try to make it better.
x = x[:,-1,:] # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
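(For reference, that line keeps only the hidden state of the last time step, which is what Keras returns when return_sequences=False; a quick shape check, independent of this model:)
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
out, _ = lstm(torch.zeros(6, 3, 1))  # out holds the hidden state at every time step
print(out.shape)             # torch.Size([6, 3, 50])
print(out[:, -1, :].shape)   # torch.Size([6, 50]) -- last time step only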