Training an LSTM network and predicting from different starting points

Time: 2017-06-28 03:38:50

Tags: python tensorflow lstm

This is a simple example of using LSTM cells from TensorFlow. I generate a sine wave and train my network on ten periods of it, then try to predict the eleventh period. The predictor X is the true y lagged by one period. After training I save the session to disk and restore it at prediction time - typical of how a model is trained and then deployed to production.

When I predict the last period, y_predicted matches the true y very closely.

If I try to predict the sine wave from an arbitrary starting point (i.e. by uncommenting the line test_data = test_data[16:] in main()) so that the true values of y are offset by a quarter period, the LSTM prediction seems to still start from zero and takes a few periods to catch up with the true values, eventually matching the previous prediction. In fact, the prediction in the second case still looks like a full sine wave rather than a 3/4 wave.

Why is this happening? If I implement a regressor, I would like to be able to use it starting from any arbitrary point.

https://github.com/fbora/mytensorflow/issues/1

import os
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn



def sin_signal():
    '''
    generate a sin function
    the train set is ten periods in length
    the test set is one additional period
    the return variable is in pandas format for easy plotting
    '''
    phase = np.arange(0, 2*np.pi*11, 0.1)
    y = np.sin(phase)
    data = pd.DataFrame.from_dict({'phase': phase, 'y':y})
    # fill the last element by 0 - it's the end of the period anyways
    data['X'] = data.y.shift(-1).fillna(0.0)
    train_data = data[data.phase<=2*np.pi*10].copy()
    test_data = data[data.phase>2*np.pi*10].copy()
    return train_data, test_data


class lstm_model():
    def __init__(self, size_x, size_y, num_units=32, num_layers=3, keep_prob=0.5):
        # def single_unit():
        #     return rnn.DropoutWrapper(
        #         rnn.LSTMCell(num_units), output_keep_prob=keep_prob)

        def single_unit():
            return rnn.LSTMCell(num_units)

        self.graph = tf.Graph()
        with self.graph.as_default():
            '''input place holders'''
            self.X = tf.placeholder(tf.float32, [None, size_x], name='X')
            self.y = tf.placeholder(tf.float32, [None, size_y], name='y')

            '''network'''
            cell = rnn.MultiRNNCell([single_unit() for _ in range(num_layers)])
            X = tf.expand_dims(self.X, -1)
            val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32)
            val = tf.transpose(val, [1, 0, 2])
            last = tf.gather(val, int(val.get_shape()[0])-1)
            weights = tf.Variable(tf.truncated_normal([num_units, size_y], 0.0, 1.0), name='weights')
            bias = tf.Variable(tf.zeros(size_y), name='bias')
            predicted_y = tf.nn.xw_plus_b(last, weights, bias, name='predicted_y')

            '''optimizer'''
            optimizer = tf.train.AdamOptimizer(name='adam_optimizer')
            global_step = tf.Variable(0, trainable=False, name='global_step')
            self.loss = tf.reduce_mean(tf.squared_difference(predicted_y, self.y), name='mse_loss')
            self.train_op = optimizer.minimize(self.loss, global_step=global_step, name='training_op')

            '''initializer'''
            self.init_op = tf.global_variables_initializer()


class lstm_regressor():
    def __init__(self):
        if not os.path.isdir('./check_pts'):
            os.mkdir('./check_pts')


    @staticmethod
    def get_shape(dataframe):
        df_shape = dataframe.shape
        num_rows = df_shape[0]
        num_cols = 1 if len(df_shape)<2 else df_shape[1]
        return num_rows, num_cols


    def train(self, X_train, y_train, iterations):
        train_pts, size_x = lstm_regressor.get_shape(X_train)
        train_pts, size_y = lstm_regressor.get_shape(y_train)
        model = lstm_model(size_x=size_x, size_y=size_y, num_units=32, num_layers=1)

        with tf.Session(graph=model.graph) as sess:
            sess.run(model.init_op)
            saver = tf.train.Saver()
            feed_dict={
                model.X: X_train.values.reshape(-1, size_x),
                model.y: y_train.values.reshape(-1, size_y)
            }

            for step in range(iterations):
                _, loss = sess.run([model.train_op, model.loss], feed_dict=feed_dict)
                if step%100==0:
                    print('step={}, loss={}'.format(step, loss))
            saver.save(sess, './check_pts/lstm')


    def predict(self, X_test):
        test_pts, size_x = lstm_regressor.get_shape(X_test)
        X_np = X_test.values.reshape(-1, size_x)
        graph = tf.Graph()
        with graph.as_default():
            with tf.Session() as sess:
                sess.run(tf.global_variables_initializer())
                saver = tf.train.import_meta_graph('./check_pts/lstm.meta')
                saver.restore(sess, './check_pts/lstm')
                X = graph.get_tensor_by_name('X:0')
                y_tf = graph.get_tensor_by_name('predicted_y:0')
                y_np = sess.run(y_tf, feed_dict={X: X_np})
                return y_np.reshape(test_pts)


def main():
    train_data, test_data = sin_signal()
    regressor = lstm_regressor()
    regressor.train(train_data.X, train_data.y, iterations=1000)
    # test_data = test_data[16:]
    y_predicted = regressor.predict(test_data.X)
    test_data['y_predicted'] = y_predicted

    test_data[['y', 'y_predicted']].plot()

if __name__ == '__main__':
    main()

1 Answer:

Answer 0 (score: 0):

My suspicion is that since you start predicting at an arbitrary point in the future, there is a gap between the data your model was trained on and the data you start predicting from, and the state of the LSTM has not been updated with the values in that gap.

*** UPDATE:

In your code, you have this:

val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32)

and then during training:

_, loss = sess.run([model.train_op, model.loss], feed_dict=feed_dict)

I would suggest feeding an initial state into dynamic_rnn and re-feeding the updated state on every training iteration, like so:

inState = tf.placeholder(tf.float32, [YOUR_DIMENSIONS], name='inState')
val, state = tf.nn.dynamic_rnn(cell, X, time_major=True, dtype=tf.float32, initial_state=inState)

And then during training:

iState = np.zeros([YOUR_DIMENSIONS])
feed_dict = {
    model.X: X_train.values.reshape(-1, size_x),
    model.y: y_train.values.reshape(-1, size_y),
    inState: iState  # feed initial value for state placeholder
}
_, loss, oState = sess.run([model.train_op, model.loss, model.state],
                           feed_dict=feed_dict)  # run one additional variable from the session
iState = oState  # assign latest out-state to be re-fed as in-state
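
One thing to note: for a MultiRNNCell built from LSTMCells, the state is not a single tensor but one (c, h) LSTMStateTuple per layer, so YOUR_DIMENSIONS in practice becomes a per-layer pair of [batch, num_units] tensors. A minimal sketch of how lstm_model.__init__ could expose this (the names state_c, state_h, out_c, out_h and self.state are assumptions for illustration, not part of the original code):

# inside lstm_model.__init__, replacing the original dynamic_rnn call
# (hypothetical attribute/tensor names)
self.state_c = tf.placeholder(tf.float32, [num_layers, None, num_units], name='state_c')
self.state_h = tf.placeholder(tf.float32, [num_layers, None, num_units], name='state_h')
in_state = tuple(rnn.LSTMStateTuple(self.state_c[i], self.state_h[i])
                 for i in range(num_layers))

val, self.state = tf.nn.dynamic_rnn(cell, X, time_major=True,
                                    initial_state=in_state, dtype=tf.float32)

# stack the per-layer out-states under fixed names so they can still be
# fetched after the graph is restored from disk
self.out_c = tf.stack([s.c for s in self.state], name='out_c')
self.out_h = tf.stack([s.h for s in self.state], name='out_h')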

This way your model not only learns its parameters during training, but also keeps track, in the state, of everything it has seen during training. You then save this state together with the rest of the session and use it in the prediction phase.

One small difficulty here is that this state technically comes from a placeholder, so in my experience it does not get saved in the Graph automatically. So at the end of training you manually create another variable and assign the state to it; that way it gets saved in the graph for later use:

# make sure this variable is declared BEFORE the saver is declared
savedState = tf.get_variable('savedState', shape=[YOUR_DIMENSIONS])

# then, at the end of training (still inside the training session):
assignOp = tf.assign(savedState, oState)
sess.run(assignOp)
# now save your graph

So now, once you restore the graph, if you want to start your predictions after some artificial gap, you still have to run your model through that gap somehow in order to update the state. In my case, I just ran one dummy prediction over the whole gap, purely to update the state, and then continued at the normal intervals from there.
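
For concreteness, here is a rough sketch of what that prediction phase could look like, assuming the model was modified as in the note above (so the restored graph contains the state_c/state_h placeholders and the out_c/out_h tensors), and assuming the final training state was stored in two variables, saved_c and saved_h, rather than the single savedState - all of these names are assumptions for illustration, not part of the original code:

import tensorflow as tf

def predict_after_gap(gap_X, rest_X):
    '''gap_X: inputs covering the skipped gap, shape (n_gap, size_x)
       rest_X: inputs from the arbitrary starting point, shape (n_rest, size_x)'''
    graph = tf.Graph()
    with graph.as_default():
        with tf.Session() as sess:
            saver = tf.train.import_meta_graph('./check_pts/lstm.meta')
            saver.restore(sess, './check_pts/lstm')
            X = graph.get_tensor_by_name('X:0')
            y_tf = graph.get_tensor_by_name('predicted_y:0')
            state_c = graph.get_tensor_by_name('state_c:0')
            state_h = graph.get_tensor_by_name('state_h:0')
            out_c = graph.get_tensor_by_name('out_c:0')
            out_h = graph.get_tensor_by_name('out_h:0')

            # state saved at the end of training
            c = sess.run(graph.get_tensor_by_name('saved_c:0'))
            h = sess.run(graph.get_tensor_by_name('saved_h:0'))

            # dummy prediction over the gap, run only to advance the state
            _, c, h = sess.run([y_tf, out_c, out_h],
                               feed_dict={X: gap_X, state_c: c, state_h: h})

            # real prediction from the arbitrary starting point, with the updated state
            return sess.run(y_tf, feed_dict={X: rest_X, state_c: c, state_h: h})

For the quarter-period offset in the question, this would be called with something like gap_X = test_data.X.values[:16].reshape(-1, 1) and rest_X = test_data.X.values[16:].reshape(-1, 1).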

Hope this helps...