I am trying to implement a multi-step forecasting LSTM model in Keras. The data has the following shapes:
X : (5831, 48, 1)
y : (5831, 1, 12)
The model I want to use is:
power_in = Input(shape=(X.shape[1], X.shape[2]))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563, kernel_initializer=power_lstm_init,
                  return_sequences=True)(power_in)
main_out = TimeDistributed(Dense(12, kernel_initializer=power_lstm_init))(power_lstm)
When I try to train the model like this:
hist = forecaster.fit([X], y, epochs=325, batch_size=16, validation_data=([X_valid], y_valid), verbose=1, shuffle=False)
I get the following error:
ValueError: Error when checking target: expected time_distributed_16 to have shape (48, 12) but got array with shape (1, 12)
How can I fix this?
Answer 0 (score: 1)
Based on your comment:
The data I have is like t-48, t-47, t-46, ....., t-1 as the past data and t+1, t+2, ......, t+12 as the values I want to forecast
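You may not need to use a TimeDistributed layer at all. The error occurs because, with return_sequences=True, the LSTM layer emits one vector per input time step, so the model output has shape (48, 12) while your labels have shape (1, 12).
First, simply remove the return_sequences=True argument from the LSTM layer. The LSTM layer then encodes the past input time series into a single vector of shape (50,) per sample, which you can feed directly into a Dense layer with 12 units: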
import numpy as np
from keras.layers import Input, LSTM, Dense

# make sure the labels are in shape (num_samples, 12)
y = np.reshape(y, (-1, 12))
# shape must be a flat tuple, here (48, 1)
power_in = Input(shape=X.shape[1:])
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)
main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)
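To see why dropping return_sequences=True resolves the shape mismatch, here is a minimal NumPy sketch of the shape bookkeeping only. It does not run an LSTM: the arrays `encoded` and `encoded_seq` are random stand-ins for the LSTM outputs, and the sample count of 8 is an arbitrary assumption in place of 5831.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, timesteps, units, horizon = 8, 48, 50, 12

# Labels laid out as in the question: (num_samples, 1, 12)
y = rng.normal(size=(num_samples, 1, horizon))
# Drop the singleton axis, as np.reshape(y, (-1, 12)) does
y_flat = np.reshape(y, (-1, horizon))

# Dense(12) is a (50, 12) kernel plus a bias of length 12
kernel = rng.normal(size=(units, horizon))
bias = np.zeros(horizon)

# Stand-in for LSTM(50, return_sequences=True): one vector per time step
encoded_seq = rng.normal(size=(num_samples, timesteps, units))
seq_pred = encoded_seq @ kernel + bias   # per-sample shape (48, 12)

# Stand-in for LSTM(50) without return_sequences: one vector per sample
encoded = rng.normal(size=(num_samples, units))
pred = encoded @ kernel + bias           # per-sample shape (12,)

print(seq_pred.shape[1:], y.shape[1:])   # the mismatch reported in the error
print(pred.shape, y_flat.shape)          # matching shapes after the fix
```

The first print shows the (48, 12) versus (1, 12) pair from the error message; the second shows that the predictions and the reshaped labels line up.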
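Alternatively, if you want to use a TimeDistributed layer and take into account that the output is itself a sequence, you can add a second LSTM layer before the Dense layer. Put a RepeatVector layer after the first LSTM layer so that its output becomes a time series of length 12, i.e. the same length as the output time series: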
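import numpy as np
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

# make sure the labels are in shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))
power_in = Input(shape=(48, 1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)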
rep = RepeatVector(12)(power_lstm)
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)
model = Model(power_in, main_out)
model.summary()
Model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         (None, 48, 1)             0
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                10400
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50)            0
_________________________________________________________________
lstm_4 (LSTM)                (None, 12, 32)            10624
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1)             33
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
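The parameter counts in the summary can be checked by hand. The sketch below uses the standard count for an LSTM layer with biases (four gates, each with an input kernel, a recurrent kernel and a bias) and for a Dense layer; the helper names `lstm_params` and `dense_params` are made up for this illustration and are not part of Keras.

```python
def lstm_params(units, input_dim):
    """Weights of one LSTM layer: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * input_dim + units * units + units)


def dense_params(units, input_dim):
    """Weights of one Dense layer: kernel plus bias."""
    return input_dim * units + units


counts = {
    "lstm_3": lstm_params(units=50, input_dim=1),          # input has 1 feature
    "lstm_4": lstm_params(units=32, input_dim=50),         # fed by RepeatVector output
    "time_distributed_1": dense_params(units=1, input_dim=32),
}
total = sum(counts.values())

print(counts)
print(total)
```

RepeatVector and the input layer contribute no weights, so the total is just the sum of these three layers.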
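Of course, in both models you may need to tune the hyperparameters (e.g. the number of LSTM layers, the dimension of the LSTM layers, etc.) to be able to compare them fairly and get good results.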
Side note: in fact, in your case you do not need the TimeDistributed layer at all, because (currently) the Dense layer is applied on the last axis. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent.
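This equivalence can be illustrated with plain NumPy, without Keras. The sketch below models a Dense layer as a matrix product over the last axis plus a bias, with random stand-in weights and an arbitrary batch size of 4. It is an illustration of the arithmetic, not of Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# (batch, timesteps, features), shaped like the output of the second LSTM
seq = rng.normal(size=(4, 12, 32))
kernel = rng.normal(size=(32, 1))
bias = rng.normal(size=(1,))

# Dense applied to a 3-D input: contract the last axis for all time steps at once
dense_out = seq @ kernel + bias

# TimeDistributed(Dense): apply the same 2-D Dense to each time step, then restack
td_out = np.stack(
    [seq[:, t, :] @ kernel + bias for t in range(seq.shape[1])],
    axis=1,
)

print(dense_out.shape, td_out.shape)
print(np.allclose(dense_out, td_out))
```

Both paths share one kernel and one bias across all 12 time steps, which is why the results agree.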