Question

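I am trying to build an LSTM for time-series forecasting in Keras. Specifically, once the model has been trained, it should predict unseen values. A visualization of the time series is shown below.

The model is trained on the blue time series, and its predictions are compared against the orange time series.

To forecast, I want to take the last n points of the training data (where n is the sequence length), make a prediction, and then use that prediction as input for the next (second) prediction, i.e.: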

prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n))
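
The feedback scheme described here can be sketched independently of Keras. This is only a minimal illustration: `recursive_forecast` and `linear_extrapolation` are hypothetical names, and the stand-in "model" simply continues the last linear step so that the loop can be followed by hand. A trained network would take the place of `predict_next`.

```python
import numpy as np

def recursive_forecast(predict_next, history, seq_len, n_steps):
    """Forecast n_steps values, feeding each prediction back into the window."""
    window = list(history[-seq_len:])  # last seq_len observations, oldest first
    forecasts = []
    for _ in range(n_steps):
        next_value = float(predict_next(np.asarray(window)))
        forecasts.append(next_value)
        # Drop the oldest value and append the new prediction.
        window = window[1:] + [next_value]
    return forecasts

def linear_extrapolation(window):
    """Hypothetical stand-in for a trained model: repeat the last step size."""
    return window[-1] + (window[-1] - window[-2])

history = np.arange(10.0)  # observations 0.0, 1.0, ..., 9.0
print(recursive_forecast(linear_extrapolation, history, seq_len=3, n_steps=4))
# [10.0, 11.0, 12.0, 13.0]
```

Because every prediction is fed back in as if it were an observation, errors can accumulate over the forecast horizon.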

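I have tried to get this to work, but without success so far. I am at a loss as to whether I should use a stateful or a stateless model, and what a good value for the sequence length would be. Does anyone have experience with this?

I have read and tried various tutorials, but none of them seemed to apply to my data.

Because I want to make consecutive predictions, I would need a stateful model to prevent Keras from resetting the state after every call to model.predict, but training with a batch size of 1 takes a very long time... Or is there a way around this problem?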

Answer 1

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation


class LSTMNetwork(object):

    def __init__(self, hidden_dim1, hidden_dim2, batch_size, seq_size):

        super(LSTMNetwork, self).__init__()

        self.model = self.build_model(hidden_dim1, hidden_dim2, batch_size, seq_size)

        self.hidden_dim1 = hidden_dim1
        self.hidden_dim2 = hidden_dim2
        self.batch_size = batch_size
        self.seq_size = seq_size

    def build_model(self, hidden_dim1, hidden_dim2, batch_size, seq_size):
        """
        Build and return the model
        """
        # Define the model
        model = Sequential()

        # First LSTM and dropout layer
        # (Keras 2 takes the layer size as `units`, the first positional argument)
        model.add(LSTM(hidden_dim1, input_shape=(seq_size, 1), return_sequences=True))
        #model.add(Dropout(0.2))

        # Second LSTM and dropout layer
        model.add(LSTM(hidden_dim2, return_sequences=False))
        #model.add(Dropout(0.2))

        # Fully connected layer, with linear activation
        model.add(Dense(1))
        model.add(Activation("linear"))

        model.compile(loss="mean_squared_error", optimizer="adam")

        return model

    def predict(self, x):
        """
        Given a vector of x, predict the output
        """
        out = self.model.predict(x)
        return out

    def train_model(self, x, y, num_epochs):

        self.model.fit(x, y, epochs=num_epochs, batch_size=self.batch_size)

    def predict_sequence(self, x, n, seq_size):
        """
        Given a sequence of [num_samples x seq_size x num_features], predict the next n values
        """
        # Start from the last window of the input
        curr_window = x[-1, :, :]

        predicted = []

        for i in range(n):
            # Add a batch axis, predict one step, and keep the scalar result
            predicted.append(self.predict(curr_window[np.newaxis, :, :])[0, 0])
            # Slide the window: drop the oldest value, append the newest prediction
            curr_window = np.append(curr_window[1:], [[predicted[-1]]], axis=0)

        return predicted

    def preprocess_data(self, data, seq_size):
        """
        Generate training and target samples in a sliding window fashion.
        Training samples are of size [num_samples x seq_size x num_features]
        Target samples are of size [num_samples, ]
        """
        x = []
        y = []

        # range(len(data) - seq_size) also includes the last complete window
        for i in range(len(data) - seq_size):
            window = data[i:(i + seq_size)]

            after_window = data[i + seq_size]
            # Wrap each value in a list to add the feature axis; the loop
            # variable is named `value` so it cannot shadow the list `x`
            window = [[value] for value in window]

            x.append(window)
            y.append(after_window)

        x = np.array(x)
        y = np.array(y)

        return x, y

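When I take the last row of the training set as input and run predict_sequence on it, the model predicts a straight line after training. Could that be because the model's state is reset after every call to model.predict()?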

Answer 2

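Stateful LSTMs are used when the whole sequence plays a part in forming the output. Taking an extreme case: you might have a sequence of length 1000, where the first character of that sequence is what actually defines the output: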

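Stateful: If you were to split this into 10 sequences of length 100, then with a stateful LSTM the connections (state) between the sequences in the batch would be retained, and it would (given enough examples) learn the relationship of the first character to the output. In effect, the sequence length is immaterial, because the network's state is persisted across the whole stretch of data; you are simply batching it as a means of supplying the data.

Stateless: During training, the state is reset after each sequence. So in the example I have given, the network would not learn that it is the first character of the 1000-length sequence that defines the output. It would never see the long-term dependency, because the first character and the final output value are in separate sequences and the state is not retained between sequences.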
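
The difference can be made concrete with a toy recurrence. This is a sketch under strong assumptions: `run_cell` is a hypothetical accumulator standing in for an LSTM's cell state, not Keras code, and the data is the extreme case above, a 1000-step sequence in which only the first value carries information, split into 10 chunks of 100.

```python
import numpy as np

def run_cell(chunk, state, decay=1.0):
    """Toy recurrent update standing in for an LSTM cell state (an assumption)."""
    for x_t in chunk:
        state = decay * state + x_t
    return state

# 1000-step sequence in which only the first value carries information.
sequence = np.zeros(1000)
sequence[0] = 1.0
chunks = sequence.reshape(10, 100)  # 10 consecutive chunks of length 100

# Reference: process the whole sequence in one pass.
full_state = float(run_cell(sequence, 0.0))

# "Stateful": the state left by one chunk seeds the next chunk.
stateful_state = 0.0
for chunk in chunks:
    stateful_state = float(run_cell(chunk, stateful_state))

# "Stateless": the state is reset to zero before every chunk.
stateless_state = 0.0
for chunk in chunks:
    stateless_state = float(run_cell(chunk, 0.0))

print(full_state, stateful_state, stateless_state)  # 1.0 1.0 0.0
```

With the state carried over, the final value still reflects the first element. With a reset before each chunk, the last chunk contains no trace of it.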

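Summary: What you need to determine is whether the data at the end of your time series is likely to depend on what happened right at the start.

I would say that such long-term dependencies are actually quite rare, and that you are probably better off using a stateless LSTM while treating the sequence length as a hyperparameter: try several values and keep the one that models the data best, i.e. the one that gives the most accurate results on the validation data.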
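
A search over sequence lengths might look roughly like this. To keep the sketch fast and self-contained, a linear autoregressive fit via `np.linalg.lstsq` stands in for training the LSTM. The helper names, the synthetic sine series, and the candidate lengths are all assumptions made for illustration.

```python
import numpy as np

def make_windows(series, seq_len):
    """Sliding windows of length seq_len and the value that follows each one."""
    x = np.array([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    return x, y

def validation_mse(train, val, seq_len):
    """Fit on the training windows, then score one-step predictions on val."""
    x_train, y_train = make_windows(train, seq_len)
    # Stand-in for model training: least-squares linear autoregression.
    coef, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)
    # Prepend the tail of the training data so that the first validation
    # target already has a full window in front of it.
    x_val, y_val = make_windows(np.concatenate([train[-seq_len:], val]), seq_len)
    return float(np.mean((x_val @ coef - y_val) ** 2))

# Synthetic series: a sine wave with a period of 25 steps.
t = np.arange(400)
series = np.sin(2 * np.pi * t / 25)
train, val = series[:300], series[300:]

candidates = [1, 2, 5, 10, 20]
scores = {n: validation_mse(train, val, n) for n in candidates}
best_seq_len = min(scores, key=scores.get)
print(scores)
print("best sequence length:", best_seq_len)
```

With a real LSTM, each candidate would mean rebuilding the windows, retraining the model, and comparing the validation losses in the same way.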

Do I need a stateful or a stateless LSTM?
