I have a sequence that is too long to fit in memory, but its initial state is critical, so I would like to train it as a variable. How can I train an initial-state variable that is fed in at the start of a sequence, while continuing to use the output state for the rest of the sequence?
Here is what I have so far:
cell = tf.contrib.rnn.BasicLSTMCell(num_lstm_cells, state_is_tuple=True)
init_vars = cell.zero_state(batch_size, tf.float32)
init_c = tf.Variable(init_vars.c, trainable=True)
init_h = tf.Variable(init_vars.h, trainable=True)
init_state = tf.contrib.rnn.LSTMStateTuple(init_c, init_h)
state_vars = cell.zero_state(batch_size, tf.float32)
state_c = tf.Variable(state_vars.c, trainable=False)
state_h = tf.Variable(state_vars.h, trainable=False)
state = tf.contrib.rnn.LSTMStateTuple(state_c, state_h)
layer = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.7)
val, new_state = tf.nn.dynamic_rnn(layer, lstm_input, initial_state=state, dtype=tf.float32)
with tf.control_dependencies([state[0].assign(new_state[0]), state[1].assign(new_state[1])]):
    output = tf.identity(val)
initialise_c = tf.assign(state[0], init_state[0])
initialise_h = tf.assign(state[1], init_state[1])
initialise_state = tf.group([initialise_c, initialise_h])
The idea was to have a trainable initial-state variable (init_vars) and a non-trainable state variable (state_vars), and to copy the initial state in at the start of each sequence by running the initialise_state op.
I don't think this will work, because init_state never actually takes part in training; it is only used as the source of a copy. How can I do this?
Edit: I have confirmed in testing that the initial state is not being trained and stays all zeros.
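For intuition on why the copy approach fails: a tf.assign op has no gradient path back to its source variable, so the initial state must feed the unrolled computation directly for gradients to reach it. Here is a framework-free NumPy sketch with a toy linear RNN (the model, dimensions, and target are all invented for illustration) showing that the loss does have a usable gradient with respect to the initial state when it is wired in directly:

```python
import numpy as np

# Toy linear RNN: h_t = W @ h_{t-1} + x_t, loss = 0.5 * ||h_T - target||^2.
# If h0 feeds the unrolled computation directly, dL/dh0 = (W^T)^steps @ (h_T - target),
# which is generally nonzero, so a gradient step on h0 reduces the loss. A value
# that is merely copied into a non-trainable buffer gets no such gradient.
np.random.seed(0)
dim, steps = 3, 4
W = np.eye(dim) * 0.5
xs = [np.random.randn(dim) for _ in range(steps)]
target = np.ones(dim)

def loss(h0):
    h = h0
    for x in xs:
        h = W @ h + x
    return 0.5 * np.sum((h - target) ** 2)

# Analytic gradient of the loss with respect to the initial state h0 = 0.
h = np.zeros(dim)
for x in xs:
    h = W @ h + x
grad_h0 = np.linalg.matrix_power(W.T, steps) @ (h - target)

# One gradient-descent step on h0 lowers the loss, i.e. h0 is trainable.
h0 = np.zeros(dim)
print("loss before:", loss(h0))
print("loss after: ", loss(h0 - 0.1 * grad_h0))
```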
Answer 0 (score: 1)
I eventually solved this by creating the initial-state variables inside a separate variable scope. Using the optional var_list argument of Optimizer.minimize(), I could then train the initial state at the start of each sequence. Once the initial state had been trained, I copied it into this separate variable scope and trained the rest of the graph for the remainder of the sequence.
with tf.variable_scope("state"):
    state_c = tf.Variable(tf.random_uniform([batch_size, num_lstm_cells], 0, 1), trainable=True)
    state_h = tf.Variable(tf.random_uniform([batch_size, num_lstm_cells], 0, 1), trainable=True)
    state = tf.contrib.rnn.LSTMStateTuple(state_c, state_h)
with tf.variable_scope("nn"):
    layer = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.7)
    val, new_state = tf.nn.dynamic_rnn(layer, lstm_input, initial_state=state, dtype=tf.float32)
    logits = tf.layers.dense(val, units=5, activation=tf.nn.relu)
    losses = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=targets)
init_c = tf.Variable(tf.zeros([batch_size, num_lstm_cells]), trainable=False)
init_h = tf.Variable(tf.zeros([batch_size, num_lstm_cells]), trainable=False)
init_state = tf.contrib.rnn.LSTMStateTuple(init_c, init_h)
restore_c = tf.assign(state[0], init_state[0])
restore_h = tf.assign(state[1], init_state[1])
restore_state = tf.group([restore_c, restore_h])
save_c = tf.assign(init_state[0], state[0])
save_h = tf.assign(init_state[1], state[1])
save_state = tf.group([save_c, save_h])
propagate_c = tf.assign(state[0], new_state[0])
propagate_h = tf.assign(state[1], new_state[1])
propagate_state = tf.group([propagate_c, propagate_h])
nn_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "nn")
state_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "state")
total_loss = tf.reduce_mean(losses)
train_nn_step = tf.train.AdamOptimizer().minimize(total_loss, var_list=nn_vars)
train_nn_state_step = tf.train.AdamOptimizer().minimize(total_loss, var_list=nn_vars + state_vars)
So you begin a sequence by calling:
sess.run(restore_state)                         # copy the initial state back into the graph
_, er = sess.run([train_nn_state_step, error])  # train the initial state and the nn
sess.run(save_state)                            # save the trained initial state
sess.run(propagate_state)                       # propagate the state into the next train step
and you train the rest of the sequence by calling:
_, er = sess.run([train_nn_step, error])        # train just the nn
sess.run(propagate_state)                       # keep passing the state through
Answer 1 (score: 0)
What about alternating between training the network and training the initial state? Freeze the model, make the initial state trainable, and train it for a while; then swap which part is frozen.
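As a framework-free illustration of this alternating scheme, here is a NumPy sketch on a toy model with one "weight" and one "initial state" (both made up for this sketch); training each in turn while the other is frozen still drives the loss to the optimum:

```python
import numpy as np

# Toy problem: prediction y = w * h0, loss = (y - 1)^2.
# Phase A freezes the "initial state" h0 and trains the "weight" w;
# phase B freezes w and trains h0 -- coordinate descent over the two groups.
w, h0, lr = 0.5, 0.5, 0.1

def loss(w, h0):
    return (w * h0 - 1.0) ** 2

for epoch in range(50):
    # Phase A: train the network weight, initial state frozen.
    for _ in range(10):
        grad_w = 2 * (w * h0 - 1.0) * h0
        w -= lr * grad_w
    # Phase B: train the initial state, weight frozen.
    for _ in range(10):
        grad_h0 = 2 * (w * h0 - 1.0) * w
        h0 -= lr * grad_h0

print(f"final loss: {loss(w, h0):.2e}")
```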
Answer 2 (score: 0)
I'm not sure exactly what you want to do, but why not assign new_state to another state variable, like this:
batch_size = 10
num_lstm_cells = 20
num_times = 5
input_dims = 6
lstm_input = tf.random_normal([batch_size, num_times, input_dims], 0., 1.0)
cell = tf.contrib.rnn.BasicLSTMCell(num_lstm_cells, state_is_tuple=True)
init_vars = cell.zero_state(batch_size, tf.float32)
init_c = tf.Variable(init_vars.c, trainable=True)
init_h = tf.Variable(init_vars.h, trainable=True)
init_state = tf.contrib.rnn.LSTMStateTuple(init_c, init_h)
state_vars = cell.zero_state(batch_size, tf.float32)
state_c = tf.Variable(state_vars.c, trainable=False)
state_h = tf.Variable(state_vars.h, trainable=False)
state = tf.contrib.rnn.LSTMStateTuple(state_c, state_h)
layer = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.7)
val, new_state = tf.nn.dynamic_rnn(layer, lstm_input, initial_state=state, dtype=tf.float32)
trained_state_c = tf.assign(state[0], new_state[0])
trained_state_h = tf.assign(state[1], new_state[1])
trained_state = tf.contrib.rnn.LSTMStateTuple(trained_state_c, trained_state_h)
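The assign-through pattern above (overwrite a state buffer with the segment's final state so the next segment resumes from it) can be sketched without TensorFlow; the toy tanh RNN, dimensions, and segment length here are invented for illustration:

```python
import numpy as np

# A state buffer is overwritten with the RNN's final state after each
# segment, so the next segment resumes where the previous one left off
# (the usual way to run truncated BPTT over a sequence too long to unroll).
np.random.seed(1)
dim = 4
W = np.eye(dim) * 0.9

def run_segment(state, xs):
    """Unroll one segment; return per-step outputs and the final state."""
    outs = []
    for x in xs:
        state = np.tanh(W @ state + x)
        outs.append(state)
    return outs, state

state_buffer = np.zeros(dim)           # analogue of the `state` variables
long_seq = [np.random.randn(dim) for _ in range(12)]
for start in range(0, 12, 4):          # process the sequence in segments
    segment = long_seq[start:start + 4]
    _, final = run_segment(state_buffer, segment)
    state_buffer = final               # analogue of the tf.assign ops
print("state after all segments:", state_buffer)
```

Because the state threads through unchanged, processing the sequence segment by segment ends in exactly the same state as one long unroll.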