I want to save the final state of my LSTM so that it is included when I restore the model and can be used for prediction. As explained below, the Saver only has knowledge of the final state when I use tf.assign. However, this throws an error (also explained below).
During training I always feed the final LSTM state back into the network, as explained in this post. Here are the important parts of the code:
When building the graph:
self.init_state = tf.placeholder(tf.float32, [
    self.n_layers, 2, self.batch_size, self.n_hidden
])
state_per_layer_list = tf.unstack(self.init_state, axis=0)
rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])
outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
And during training:
_current_state = np.zeros((self.n_layers, 2, self.batch_size,
                           self.n_hidden))
_train_step, _current_state, _loss, _acc, summary = self.sess.run(
    [
        self.train_step, self.final_state,
        self.loss, self.accuracy,  # assumed fetch names for the _loss / _acc outputs
        self.merged
    ],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels,
               self.init_state: _current_state})
When I later restore my model from a checkpoint, the final state is not restored as well. As outlined in this post, the problem is that the Saver has no knowledge of the new state. That post also suggests a solution based on tf.assign. Regrettably, I cannot use the suggested
assign_op = tf.assign(self.init_state, _current_state)
self.sess.run(assign_op)
because self.init_state is not a variable but a placeholder. I get the error

AttributeError: 'Tensor' object has no attribute 'assign'
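To illustrate what I mean, here is a stripped-down sketch (made-up shapes, not my actual model) of the difference: tf.assign only writes to a tf.Variable, while a placeholder is a plain Tensor that cannot be assigned to.

import tensorflow as tf

# Minimal sketch: a placeholder is a plain Tensor, so tf.assign cannot write
# to it, while a tf.Variable can be assigned and is picked up by the Saver.
state_ph = tf.placeholder(tf.float32, [2, 2, 32, 128], name='state_ph')
state_var = tf.get_variable('state_var', shape=[2, 2, 32, 128])

# tf.assign(state_ph, tf.zeros([2, 2, 32, 128]))  # fails: a placeholder has no assign
update_op = tf.assign(state_var, tf.zeros([2, 2, 32, 128]))  # works

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(update_op)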
I have been trying to solve this for several hours now, but I can't get it to work.

Any help is appreciated!
EDIT
I have changed self.init_state to
self.init_state = tf.get_variable(
    'saved_state', shape=[self.n_layers, 2, self.batch_size, self.n_hidden])
state_per_layer_list = tf.unstack(self.init_state, axis=0)
rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])
outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
During training I no longer feed a value for self.init_state:
_train_step, _current_state, _loss, _acc, summary = self.sess.run(
    [
        self.train_step, self.final_state,
        self.loss, self.accuracy,  # assumed fetch names for the _loss / _acc outputs
        self.merged
    ],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels})
However, I still can't run the assign op. Now I get
TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got (LSTMStateTuple(c=array([[ 0.07291573, -0.06366599, -0.23425588, ...,  0.05307654,
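(The value shown in that error is the tuple of LSTMStateTuple objects that sess.run(self.final_state) returns when the state is a tuple, so it would first have to be stacked into the [n_layers, 2, batch_size, n_hidden] array the variable expects. A sketch only, reusing the names from the code above:)

import numpy as np

# _current_state comes back from sess.run(self.final_state) as a tuple of
# LSTMStateTuple(c, h), one per layer. Stack it into a single array of shape
# [n_layers, 2, batch_size, n_hidden] before assigning it to the variable.
stacked_state = np.stack(
    [np.stack([layer_state.c, layer_state.h], axis=0)
     for layer_state in _current_state], axis=0)

assign_op = tf.assign(self.init_state, stacked_state)
self.sess.run(assign_op)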
Answer 0 (score: 1)
In order to save the final state, you can create a separate TF variable, then, before saving the graph, run an assign
"size": 0,
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 5000000"
},
"buckets_path": {
"amount": "paidAmountTotal"
}
}
}
}
}
}
}
op to assign your latest state to that variable, and then save the graph. The only thing you have to remember is to declare that variable BEFORE you declare the Saver; otherwise it won't be included in the graph.
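A bare-bones sketch of that ordering (the variable names here are only illustrative):

# 1. Declare the state variable BEFORE the Saver so it is part of the saved graph.
saved_state = tf.get_variable(
    'saved_state', shape=[n_layers, 2, batch_size, n_hidden],
    initializer=tf.zeros_initializer())

# 2. Only then create the Saver.
saver = tf.train.Saver(max_to_keep=1)

# ... training loop, which produces final_state_value as a numpy array ...

# 3. Before checkpointing, copy the latest state into the variable, then save.
sess.run(tf.assign(saved_state, final_state_value))
saver.save(sess, checkpoint_path)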
This is discussed at length here, including working code: TF LSTM: Save State from training session for prediction session later
*** UPDATE: answers to the follow-up questions:
It looks like you are using BasicLSTMCell with state_is_tuple=True. The earlier discussion I referred you to used GRUCell with state_is_tuple=False. The details differ somewhat between the two, but the overall approach should be similar, so hopefully this works for you:
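For reference, the two cell setups being compared would be built roughly like this (a sketch; multicell matches the placeholder name used in the code below, and LAYERS / CELL_SIZE are the same illustrative constants):

# Tuple-state LSTM (your case): each layer's state is an LSTMStateTuple(c, h).
lstm_cells = [tf.nn.rnn_cell.BasicLSTMCell(CELL_SIZE, state_is_tuple=True)
              for _ in range(LAYERS)]
multicell = tf.nn.rnn_cell.MultiRNNCell(lstm_cells, state_is_tuple=True)

# Flat-state GRU (what the earlier discussion used): the whole state is one tensor.
gru_cells = [tf.nn.rnn_cell.GRUCell(CELL_SIZE) for _ in range(LAYERS)]
gru_multicell = tf.nn.rnn_cell.MultiRNNCell(gru_cells, state_is_tuple=False)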
During training, you first feed zeros into dynamic_rnn as the initial_state, and then keep feeding its own output state back in as the next initial_state. So the LAST output state of the dynamic_rnn call is what you want to save for later. Since it comes out of a sess.run() call, it is essentially a numpy array (not a tensor and not a placeholder). So the question amounts to "how do I save a numpy array as a TensorFlow variable, along with the rest of the variables in the graph?" That is why you assign the final state to a variable whose sole purpose is exactly that.
So, the code looks something like this:
# GRAPH DEFINITIONS:
state_in = tf.placeholder(tf.float32, [LAYERS, 2, None, CELL_SIZE], name='state_in')
l = tf.unstack(state_in, axis=0)
state_tup = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(LAYERS)])
#multicell = your BasicLSTMCell / MultiRNN definitions
output, state_out = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=state_tup)
savedState = tf.get_variable('savedState', shape=[LAYERS, 2, BATCHSIZE, CELL_SIZE])
saver = tf.train.Saver(max_to_keep=1)
in_state = np.zeros((LAYERS, 2, BATCHSIZE, CELL_SIZE))
# TRAINING LOOP:
feed_dict = {X: x, Y_: y_, batchsize: BATCHSIZE, state_in:in_state}
_, out_state = sess.run([training_step, state_out], feed_dict=feed_dict)
in_state = out_state
# ONCE TRAINING IS OVER:
assignOp = tf.assign(savedState, out_state)
sess.run(assignOp)
saver.save(sess, pathModel + '/my_model.ckpt')
# RECOVERING IN A DIFFERENT PROGRAM:
gInit = tf.global_variables_initializer().run()
lInit = tf.local_variables_initializer().run()
new_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
new_saver.restore(sess, pathModel + 'my_model.ckpt')
# retrieve State and get its LAST batch (latest observations)
savedState = sess.run('savedState:0') # this is FULL state from training
state = savedState[:,:,-1,:] # -1 gets only the LAST batch of the state (latest seen observations)
state = np.reshape(state, [state.shape[0], 2, -1, state.shape[2]]) # [LAYERS, 2, 1 (BATCH), CELL_SIZE]
#x = .... (YOUR INPUTS)
feed_dict = {'X:0': x, 'state_in:0':state}
#PREDICTION LOOP:
preds, state = sess.run(['preds:0', 'state_out:0'], feed_dict = feed_dict)
# so now state will be re-fed into feed_dict with the next loop iteration
As mentioned before, this is a modified version of an approach that works well for me with GRUCell, where state_is_tuple=False. I adapted it to try BasicLSTMCell with state_is_tuple=True. It works, but not as accurately as the original approach. I don't know yet whether that is simply because GRU works better than LSTM for me, or for some other reason. See if this works for you...
Also keep in mind that, as you can see from the recovery and prediction code, your predictions will likely use a different batch size than your training loop (I guess a batch of 1?), so you have to think through how to handle the recovered state: just take the last batch? Or something else? This code takes only the last batch of the saved state (i.e. the most recently seen training observations), because that was what was relevant for me...
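One way to handle this (a sketch only; PRED_BATCHSIZE is an illustrative constant, not from the code above): either keep only the last batch element reshaped to a batch of 1, as the recovery code does, or reuse that state across a larger prediction batch by tiling it.

# Option 1: keep only the last training batch element, reshaped to a batch of 1.
state = savedState[:, :, -1, :]                        # [LAYERS, 2, CELL_SIZE]
state = np.reshape(state, [LAYERS, 2, 1, CELL_SIZE])   # [LAYERS, 2, 1, CELL_SIZE]

# Option 2 (illustrative alternative): tile that last state along the batch
# dimension if the prediction loop uses a larger batch.
state_tiled = np.tile(state, [1, 1, PRED_BATCHSIZE, 1])  # [LAYERS, 2, PRED_BATCHSIZE, CELL_SIZE]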