Question

我目前正在尝试构建一个用于预测时间序列的简单模型。目标是使用序列训练模型，以便模型能够预测未来值。

我正在使用tensorflow和lstm单元格来执行此操作。该模型通过时间截断反向传播进行训练。我的问题是如何构建培训数据。

例如，让我们假设我们想要学习给定的序列：

[1,2,3,4,5,6,7,8,9,10,11,...]

我们将网络展开为num_steps=4。

选项1

input data               label     
1,2,3,4                  2,3,4,5
5,6,7,8                  6,7,8,9
9,10,11,12               10,11,12,13
...

选项2

input data               label     
1,2,3,4                  2,3,4,5
2,3,4,5                  3,4,5,6
3,4,5,6                  4,5,6,7
...

选项3

input data               label     
1,2,3,4                  5
2,3,4,5                  6
3,4,5,6                  7
...

选项4

input data               label     
1,2,3,4                  5
5,6,7,8                  9
9,10,11,12               13
...

任何帮助都将不胜感激。

Answer 1

我正准备在TensorFlow中学习LSTM并尝试实现一个例子（幸运的是）试图预测一些简单的数学函数所产生的时间序列/数字序列。

但我正在使用一种不同的方式来构建训练数据，由Unsupervised Learning of Video Representations using LSTMs推动：

LSTM Future Predictor Model

选项5：

input data               label     
1,2,3,4                  5,6,7,8
2,3,4,5                  6,7,8,9
3,4,5,6                  7,8,9,10
...

除了本文之外，我（尝试）通过给定的TensorFlow RNN示例获取灵感。我目前的完整解决方案如下所示：

import math
import random
import numpy as np
import tensorflow as tf

LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4


def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j


def get_batch(batch_size):
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)

    for b in xrange(batch_size):
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in xrange(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value

        for t in xrange(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt


# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])

fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))

# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)

learned_representation = state

# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # output can be used to make next number prediction
        logits = tf.matmul(output, fc1_weight) + fc1_bias

        if logits_stacked is None:
            logits_stacked = logits
        else:
            logits_stacked = tf.concat(1, [logits_stacked, logits])

        loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE

reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))

train = tf.train.AdamOptimizer().minimize(reg_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    total_loss = 0.0
    for step in xrange(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)

        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("@{}: {}".format(step, current_loss))
        total_loss += current_loss

    print('Total loss:', total_loss)

    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in xrange(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))

示例输出如下：

### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
#  `-> Prediction: [model prediction]  
[  33.   53.  113.  213.] -> [  353.   533.   753.  1013.])
 `-> Prediction: [ 19.74548721  28.3149128   33.11489105  35.06603241]
[ -17.  -32.  -77. -152.] -> [-257. -392. -557. -752.])
 `-> Prediction: [-16.38951683 -24.3657589  -29.49801064 -31.58583832]
[ -7.  -4.   5.  20.] -> [  41.   68.  101.  140.])
 `-> Prediction: [ 14.14126873  22.74848557  31.29668617  36.73633194]
...

该模型是 LSTM-autoencoder ，每个都有2层。

不幸的是，正如您在结果中看到的那样，此模型无法正确学习序列。我可能就是这样，我只是在某个地方犯了一个错误的错误，或者1000-10000的训练步骤对于LSTM来说只是少数几个。正如我所说，我也刚刚开始正确理解/使用LSTM。但希望这可以为您提供有关实施的一些灵感。

Answer 2

阅读了几篇LSTM介绍博客，例如Jakob Aungiers'，选项3似乎是无状态LSTM的正确选项。

如果您的LSTM需要比num_steps更早记住数据，那么您可以以有状态的方式进行训练 - 对于Keras示例，请参阅Philippe Remy's blog post "Stateful LSTM in Keras"。但是，Philippe没有显示批量大于1的示例。我想在您的情况下，具有状态LSTM的批量大小为4可以与以下数据一起使用（写为input -> label）：

batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8

batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12

batch #2:
9,10,11,12 -> 13
...

由此，例如，批次＃0中的第二个样品被正确地重复使用以继续使用批次＃1的第二个样品进行培训。

这与您的选项4类似，但您没有在那里使用所有可用的标签。

<强>更新

在batch_size等于num_steps的情况下延伸到我的建议，Alexis Huet gives an answer因为batch_size是[{1}}的除数，可以用于较大的num_steps。他describes it nicely在他的博客上。

Answer 3

我认为选项1最接近/tensorflow/models/rnn/ptb/reader.py中的参考实现

def ptb_iterator(raw_data, batch_size, num_steps):
  """Iterate on the raw PTB data.

  This generates batch_size pointers into the raw PTB data, and allows
  minibatch iteration along these pointers.

  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.

  Yields:
    Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
    The second element of the tuple is the same data time-shifted to the
    right by one.

  Raises:
    ValueError: if batch_size or num_steps are too high.
  """
  raw_data = np.array(raw_data, dtype=np.int32)

  data_len = len(raw_data)
  batch_len = data_len // batch_size
  data = np.zeros([batch_size, batch_len], dtype=np.int32)
  for i in range(batch_size):
    data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

  epoch_size = (batch_len - 1) // num_steps

  if epoch_size == 0:
    raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

  for i in range(epoch_size):
    x = data[:, i*num_steps:(i+1)*num_steps]
    y = data[:, i*num_steps+1:(i+1)*num_steps+1]
    yield (x, y)

但是，另一个选项是为每个训练序列随机选择一个指向数据数组的指针。

如何训练具有LSTM细胞的RNN用于时间序列预测

3 个答案: