RNN loss becomes NaN due to very large prediction values

Asked: 2016-09-03 02:36:55

Tags: optimization neural-network deep-learning keras recurrent-neural-network

Here is the RNN I built with Keras:

from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed, BatchNormalization
from keras.optimizers import RMSprop

def RNN_keras(feat_num, timestep_num=100):
    model = Sequential()
    # Normalize the raw input features
    model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
    # Redundant input_shape dropped here: BatchNormalization is already the input layer
    model.add(LSTM(output_dim=512, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(LSTM(output_dim=128, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    # One regression output per timestep (sequence labeling)
    model.add(TimeDistributed(Dense(output_dim=1, activation='linear')))

    rmsprop = RMSprop(lr=0.00001, rho=0.9, epsilon=1e-08)
    model.compile(loss='mean_squared_error',
                  optimizer=rmsprop,
                  metrics=['mean_squared_error'])
    return model
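
For reference, a minimal sketch of how the model might be driven during training. The array names and random data are illustrative, not from the original post; the shapes match the 888 features, 100 timesteps, and 1280-example batches reported in the log below:

import numpy as np

model = RNN_keras(feat_num=888, timestep_num=100)
X_batch = np.random.rand(1280, 100, 888)  # placeholder batch of inputs
y_batch = np.random.rand(1280, 100, 1)    # one regression target per timestep
loss = model.train_on_batch(X_batch, y_batch)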

The output is as follows:

61267 in the training set
6808 in the test set

Building training input vectors ...
888 unique feature names
The length of each vector will be 888
Using TensorFlow backend.

Build model...

****** Iterating over each batch of the training data ******
# Each batch has 1280 examples
# The training data are shuffled at the beginning of each epoch.
Epoch 1/3 : Batch 1/48 | loss = 607.043823 | root_mean_squared_error = 24.638334
Epoch 1/3 : Batch 2/48 | loss = 14479824582732.208323 | root_mean_squared_error = 3805236.468701
Epoch 1/3 : Batch 3/48 | loss = nan | root_mean_squared_error = nan
Epoch 1/3 : Batch 4/48 | loss = nan | root_mean_squared_error = nan
Epoch 1/3 : Batch 5/48 | loss = nan | root_mean_squared_error = nan
......

The loss is already extremely high at the second batch, and then it becomes NaN. The ground-truth targets y do not contain very large values; the maximum y is less than 400.

On the other hand, I inspected the predicted output y_hat. The RNN returns some extremely large predictions, which overflow to infinity.
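
A quick way to check this (a sketch, reusing the illustrative X_batch from above) is to inspect the magnitude of the raw predictions after a few updates; for the poster's data they evidently reach the millions, consistent with the root_mean_squared_error of ~3.8e6 logged at batch 2:

import numpy as np

y_hat = model.predict(X_batch)
print('max |y_hat| =', np.abs(y_hat).max())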

However, I am still unsure how to improve my model.

1 Answer:

Answer 0: (score: 0)

The problem was "sort of" solved by 1) changing the activation of the output layer from 'linear' to 'relu' and/or 2) lowering the learning rate.
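
Concretely, a sketch of the two changes described above, applied to the model from the question (the lower learning-rate value is illustrative, not from the answer):

# 1) Output activation changed from 'linear' to 'relu'
model.add(TimeDistributed(Dense(output_dim=1, activation='relu')))

# 2) Lower the learning rate (exact value illustrative)
rmsprop = RMSprop(lr=0.000001, rho=0.9, epsilon=1e-08)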

However, the predictions are now all zero.