Question

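I am trying to train an LSTM using Keras with the TensorFlow backend, but it always seems to underfit; the loss and validation loss curves show an initial drop and then flatten out very quickly (see image). I have tried adding more layers, more neurons, no dropout, etc., but I can't get it anywhere near overfitting, and I do have a good amount of data (almost 4 hours at 100 samples per second, which I have tried downsampling to 50/second).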

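My problem is multidimensional time series prediction with continuous values. Any ideas would be much appreciated! Here is my basic Keras architecture: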

import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense

data_dim = 30 #input dimensions => each timestep has 30 features
timesteps = 200
out_dim = 30 #output dimensions => each predicted output timestep 
             # has 30 dimensions
batch_size = 50
num_epochs = 300
learning_rate = 0.0005 #tried values between around 0.001 and 0.0003 
decay=0.9

#hidden layers size
h1 = 120
h2 = 340
h3 = 340
h4 = 120

model = Sequential()
model.add(LSTM(h1, return_sequences=True,input_shape=(timesteps, data_dim)))
model.add(LSTM(h2, return_sequences=True))
model.add(LSTM(h3, return_sequences=True))
model.add(LSTM(h4, return_sequences=True))
model.add(Dense(out_dim, activation='linear'))

rmsprop_otim = keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-08, decay=decay)
model.compile(loss='mean_squared_error', optimizer=rmsprop_otim,metrics=['mse'])

#data preparation
[x_train, y_train] = readData()
# num_samples (total number of timesteps read) is assumed to be defined elsewhere
x_train = x_train.reshape((int(num_samples/timesteps),timesteps,data_dim))
y_train = y_train.reshape((int(num_samples/timesteps),timesteps,out_dim))

history_callback = model.fit(x_train, y_train, validation_split=0.1,
      batch_size=batch_size, epochs=num_epochs,shuffle=False,callbacks=[checkpointer, losses])

Answer 1

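When you say an MSE of 0.06 is underfitting, that depends heavily on the data distribution. MSE is a relative measure, so if the data is not normalized, 0.06 might even be overfitting. In that case preprocessing may help. Also, check whether there is significant noise in the data.
Using 4 large LSTM layers means there are a lot of parameters to learn. Fewer layers may be sufficient.
Try a non-linear activation in the last layer.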
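
To make the point about MSE being relative concrete, here is a minimal sketch (not the asker's actual pipeline) that standardizes each feature and compares against the MSE of simply predicting the per-feature mean. The helper names `standardize` and `mean_baseline_mse` and the random stand-in data are assumptions made for illustration only.

```python
import numpy as np

def standardize(train, other=None, eps=1e-8):
    """Z-score each feature using statistics from the training data only.

    Arrays are assumed to be shaped (sequences, timesteps, features).
    """
    mean = train.mean(axis=(0, 1), keepdims=True)
    std = train.std(axis=(0, 1), keepdims=True) + eps
    scaled_train = (train - mean) / std
    scaled_other = None if other is None else (other - mean) / std
    return scaled_train, scaled_other, mean, std

def mean_baseline_mse(y):
    """MSE of always predicting the per-feature mean (the average feature variance)."""
    return float(np.mean((y - y.mean(axis=(0, 1), keepdims=True)) ** 2))

# Hypothetical stand-in targets: 40 sequences, 200 timesteps, 30 features,
# each feature with standard deviation 0.2 (variance 0.04).
rng = np.random.default_rng(0)
y_demo = rng.normal(loc=5.0, scale=0.2, size=(40, 200, 30))

y_scaled, _, _, _ = standardize(y_demo)
baseline_raw = mean_baseline_mse(y_demo)       # close to 0.04 on the raw scale
baseline_scaled = mean_baseline_mse(y_scaled)  # close to 1.0 after standardizing
```

On data like this, an MSE of 0.06 would be worse than predicting the mean, while on standardized data the same number would mean most of the variance is explained. Comparing the training loss to such a baseline is one way to judge whether the model is really underfitting.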

Answer 2

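I suspect your model is only learning the weights of the Dense layer properly, and not those of the LSTM layers below it. As a quick check, what performance do you get when you remove all the LSTM layers and replace the Dense layer with a TimeDistributed(Dense...) layer? If your graph looks the same, training is not working, i.e. the error gradients with respect to the lower-layer weights may be too small. Another way to check is to inspect the gradients directly and/or compare the final weights after training with the initial weights. If that is indeed the problem, you could try 1) standardizing your inputs, 2) using a smaller learning rate (decreasing it logarithmically), 3) adding skip connections between layers, 4) using Adam instead of RMSprop, and 5) training for more epochs.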
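
The weight comparison suggested above could be sketched roughly as follows. The helper `relative_weight_change` is a hypothetical name, and the arrays are stand-ins; with Keras, the two lists would typically come from calling `model.get_weights()` once before `fit` and once after.

```python
import numpy as np

def relative_weight_change(initial, final, eps=1e-12):
    """Return ||final - initial|| / ||initial|| for each pair of weight arrays."""
    changes = []
    for w0, w1 in zip(initial, final):
        w0 = np.asarray(w0, dtype=float)
        w1 = np.asarray(w1, dtype=float)
        changes.append(float(np.linalg.norm(w1 - w0) / (np.linalg.norm(w0) + eps)))
    return changes

# Hypothetical stand-in weights: the first array barely moved during
# "training", the second changed substantially.
before = [np.ones((4, 4)), np.ones((4, 2))]
after = [np.ones((4, 4)) * 1.0001, np.ones((4, 2)) * 1.5]

changes = relative_weight_change(before, after)
for i, c in enumerate(changes):
    print(f"array {i}: relative change {c:.6f}")
```

If the arrays belonging to the lower LSTM layers show changes that are orders of magnitude smaller than those of the final Dense layer, that would be consistent with the gradients reaching them being too small. Arrays initialized to zero (such as Dense biases) produce very large ratios here and are better compared by absolute norm.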

Keras LSTM always underfits
