Tensorflow: saving the final state of an LSTM in dynamic_rnn for prediction

Date: 2017-07-20 08:20:20

Tags: python machine-learning tensorflow lstm rnn

I want to save the final state of my LSTM so that it is included when I restore the model and can be used for prediction. As explained below, the Saver only has knowledge of the final state when I use tf.assign. However, this throws an error (also explained below).

During training I always feed the final LSTM state back into the network, as explained in this post. Here are the important parts of the code:

When building the graph:

            # placeholder for the initial state: [n_layers, 2 (c and h), batch_size, n_hidden]
            self.init_state = tf.placeholder(tf.float32, [
                self.n_layers, 2, self.batch_size, self.n_hidden
            ])

            state_per_layer_list = tf.unstack(self.init_state, axis=0)

            # rebuild the per-layer LSTMStateTuple(c, h) structure that dynamic_rnn expects
            rnn_tuple_state = tuple([
                tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                              state_per_layer_list[idx][1])
                for idx in range(self.n_layers)
            ])

            outputs, self.final_state = tf.nn.dynamic_rnn(
                cell, inputs=self.inputs, initial_state=rnn_tuple_state)
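
(The snippet assumes a multi-layer cell named cell that is not shown in the question; a hypothetical construction, with state_is_tuple=True so the shapes above line up, might look like this:)

            # hypothetical: how `cell` might be built (not shown in the question)
            cell = tf.contrib.rnn.MultiRNNCell(
                [tf.contrib.rnn.BasicLSTMCell(self.n_hidden, state_is_tuple=True)
                 for _ in range(self.n_layers)],
                state_is_tuple=True)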

During training:

            _current_state = np.zeros((self.n_layers, 2, self.batch_size,
                                       self.n_hidden))

            # (loss/accuracy fetches omitted here for brevity)
            _train_step, _current_state, summary = self.sess.run(
                [self.train_step, self.final_state, self.merged],
                feed_dict={self.inputs: _inputs,
                           self.labels: _labels,
                           self.init_state: _current_state})

When I later restore my model from a checkpoint, the final state is not restored as well. As pointed out in this post, the problem is that the Saver has no knowledge of the new state. The post also suggests a solution based on tf.assign. Unfortunately, I cannot use the suggested

            assign_op = tf.assign(self.init_state, _current_state)
            self.sess.run(assign_op)

because self.init_state is not a variable but a placeholder. I get the error

AttributeError: 'Tensor' object has no attribute 'assign'
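
As an aside, a minimal standalone sketch (names are illustrative, not from the question) of why this error occurs: tf.assign writes into the backing storage of a tf.Variable, and a placeholder is a plain Tensor with no storage to write to:

    import tensorflow as tf

    ph = tf.placeholder(tf.float32, [2, 2])        # plain Tensor, no storage
    var = tf.get_variable('v', shape=[2, 2])       # Variable, has storage

    assign_ok = tf.assign(var, tf.zeros([2, 2]))   # works: var is a Variable
    # tf.assign(ph, tf.zeros([2, 2]))              # fails with the error above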

I have been trying to solve this problem for hours now, but I can't get it to work.

Any help is appreciated!

EDIT

I have changed self.init_state to

            self.init_state = tf.get_variable(
                'saved_state',
                shape=[self.n_layers, 2, self.batch_size, self.n_hidden])

            state_per_layer_list = tf.unstack(self.init_state, axis=0)

            rnn_tuple_state = tuple([
                tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                              state_per_layer_list[idx][1])

                for idx in range(self.n_layers)
            ])

            outputs, self.final_state = tf.nn.dynamic_rnn(
                cell, inputs=self.inputs, initial_state=rnn_tuple_state)

During training I no longer feed a value for self.init_state:

            # (loss/accuracy fetches again omitted for brevity)
            _train_step, _current_state, summary = self.sess.run(
                [self.train_step, self.final_state, self.merged],
                feed_dict={self.inputs: _inputs,
                           self.labels: _labels})

However, I still cannot run the assign op. Now I get

TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got (LSTMStateTuple(c=array([[ 0.07291573, -0.06366599, -0.23425588, ..., 0.05307654,
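
One possible fix for this specific TypeError (a sketch, assuming _current_state is the tuple of LSTMStateTuples that sess.run returns for self.final_state): stack it into a single float32 array of shape [n_layers, 2, batch_size, n_hidden] before assigning:

            # _current_state is a tuple of LSTMStateTuple(c, h), one per layer;
            # tf.assign wants one array matching the variable's shape
            stacked_state = np.array([[layer.c, layer.h] for layer in _current_state])
            assign_op = tf.assign(self.init_state, stacked_state)
            self.sess.run(assign_op)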

1 Answer:

Answer 0 (score: 1)

In order to save the final state, you can create a separate TF variable, then, before saving the graph, run an assign op to assign your latest state to that variable, and then save the graph. The only thing you need to remember is to declare that variable BEFORE you declare the Saver; otherwise it won't be included in the graph.
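
A minimal sketch of that ordering (the shape names here are placeholders, not from the question):

    # declare the state variable BEFORE constructing the Saver,
    # so it is part of the variable set the Saver captures
    saved_state = tf.get_variable('saved_state',
                                  shape=[n_layers, 2, batch_size, n_hidden])
    saver = tf.train.Saver(max_to_keep=1)   # now includes saved_state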

This is discussed in full detail here, including working code: TF LSTM: Save State from training session for prediction session later

*** UPDATE: answer to the follow-up question:

It looks like you are using BasicLSTMCell with state_is_tuple=True. The prior discussion I referred you to used GRUCell with state_is_tuple=False. The details between the two are somewhat different, but the overall approach could be similar, so hopefully this works for you:

During training, you first feed zeros as the initial_state into dynamic_rnn and then keep re-feeding its own output back in as the initial_state. So the LAST output state of your dynamic_rnn call is what you want to save for later. Since it results from a sess.run() call, essentially it is a numpy array (not a tensor and not a placeholder). So the question amounts to "how do I save a numpy array as a Tensorflow variable along with the rest of the variables in the graph?" That's why you assign the final state to a variable whose only purpose is exactly that.

So, the code is something like this:

    # GRAPH DEFINITIONS:
    state_in = tf.placeholder(tf.float32, [LAYERS, 2, None, CELL_SIZE], name='state_in')
    l = tf.unstack(state_in, axis=0)
    state_tup = tuple(
        [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
         for idx in range(LAYERS)])
    # multicell = your BasicLSTMCell / MultiRNNCell definitions
    output, state_out = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=state_tup)

    savedState = tf.get_variable('savedState', shape=[LAYERS, 2, BATCHSIZE, CELL_SIZE])
    saver = tf.train.Saver(max_to_keep=1)

    in_state = np.zeros((LAYERS, 2, BATCHSIZE, CELL_SIZE))

    # TRAINING LOOP:
    feed_dict = {X: x, Y_: y_, batchsize: BATCHSIZE, state_in: in_state}
    _, out_state = sess.run([training_step, state_out], feed_dict=feed_dict)
    in_state = out_state

    # ONCE TRAINING IS OVER:
    assignOp = tf.assign(savedState, out_state)
    sess.run(assignOp)
    saver.save(sess, pathModel + '/my_model.ckpt')

    # RECOVERING IN A DIFFERENT PROGRAM:
    gInit = tf.global_variables_initializer().run()
    lInit = tf.local_variables_initializer().run()
    new_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
    new_saver.restore(sess, pathModel + 'my_model.ckpt')

    # retrieve the saved state and keep only its LAST batch (latest observations)
    savedState = sess.run('savedState:0')  # this is the FULL state from training
    state = savedState[:, :, -1, :]  # -1 gets only the LAST batch of the state
    state = np.reshape(state, [state.shape[0], 2, -1, state.shape[2]])  # [LAYERS, 2, 1 (BATCH), CELL_SIZE]
    # x = .... (YOUR INPUTS)
    feed_dict = {'X:0': x, 'state_in:0': state}

    # PREDICTION LOOP:
    preds, state = sess.run(['preds:0', 'state_out:0'], feed_dict=feed_dict)
    # now state will be re-fed into feed_dict with the next loop iteration

As mentioned before, this is a modified approach of what works well for me with GRUCell, where state_is_tuple=False. I adapted it to try BasicLSTMCell with state_is_tuple=True. It works, but not nearly as accurately as the original approach. I don't know yet whether that is just because for me GRU works better than LSTM, or for some other reason. See if this works for you...

Also keep in mind that, as you can see from the recovery and prediction code, your predictions will likely be based on a different batch size than your training loop (I guess a batch of 1?), so you have to think through how to handle the recovered state: just take the last batch, or something else? This code takes only the last batch of the saved state (i.e., the most recent training observations) because that is what was relevant for me...