For the sake of illustration, assume I have a simple LSTM network and an input sequence X = (X1, ..., XT):
input Xt = (x1,...,xn) --> [LSTM] --> [output_layer] --> output(y1,...,yk)
Is there a way to feed the network one timestep of input at a time and only call the training op at the end of the sequence? Pseudocode of what I want to implement:
# Define computational graph
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [batch_size, num_features])
y = tf.placeholder(tf.float32, [batch_size, output_size])

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# BasicLSTMCell's state is an LSTMStateTuple (c, h), so it needs two placeholders
c_ph = tf.placeholder(tf.float32, [batch_size, lstm_size])
h_ph = tf.placeholder(tf.float32, [batch_size, lstm_size])
state_ph = tf.contrib.rnn.LSTMStateTuple(c_ph, h_ph)

lstm_output, next_state = lstm(x, state_ph)
output = tf.layers.dense(lstm_output, units=output_size)
loss = tf.losses.mean_squared_error(y, output)
train_op = tf.train.AdamOptimizer(lr).minimize(loss)

# Train loop
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for batch in batches:
        # Start every sequence from a zero state
        state = (np.zeros([batch_size, lstm_size]), np.zeros([batch_size, lstm_size]))
        for timestep in batch:
            feed_dict = construct_feed_dict(timestep, state)
            out, state = sess.run([output, next_state], feed_dict)
        # Defer the weight update until the end of the sequence
        sess.run(train_op, feed_dict=???)
My understanding is that the values returned by sess.run are plain numpy arrays, so if I later feed them back into the network as part of its input, the information about how those values were computed is lost.
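Concretely, this is what I mean (only a sketch reusing the names from the pseudocode above; x_t, y_t, c_t, h_t and f are hypothetical stand-ins for one timestep of data and for the feedback transform):

# Values fetched from sess.run are plain numpy arrays
out_np, state_np = sess.run([output, next_state],
                            feed_dict={x: x_t, c_ph: c_t, h_ph: h_t})
# Feeding them back through feed_dict in a later sess.run makes TensorFlow
# treat them as constants, so the loss at step t+1 cannot backpropagate
# through the computation that produced out_np / state_np at step t.
loss_np = sess.run(loss, feed_dict={x: f(out_np),
                                    c_ph: state_np.c, h_ph: state_np.h,
                                    y: y_t})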
I am well aware that I could feed the input in the shape [total_timesteps, batch_size, num_features]. However, I find myself in situations where that approach is not possible:
1) The input for the next timestep is created from the network output, f(y_{t-1}) (see the sketch after this list).
2) The hidden state of the LSTM cell is fed as input to another layer at every timestep.
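For reference, this is the kind of in-graph unrolling that the two constraints above seem to force on me (only a sketch; T, aux_size and the feedback transform f are hypothetical stand-ins, not part of my actual model):

import tensorflow as tf

# Hypothetical sizes, for illustration only
T, batch_size, num_features = 5, 32, 10
lstm_size, output_size, aux_size = 64, 10, 8

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
state = cell.zero_state(batch_size, tf.float32)

x0 = tf.placeholder(tf.float32, [batch_size, num_features])

# Stand-in for whatever transform builds the next input from the previous
# output; a dense layer is used here purely so the sketch runs.
def f(y_prev, reuse):
    return tf.layers.dense(y_prev, num_features, name="feedback", reuse=reuse)

inputs, outputs, aux_outputs = x0, [], []
for t in range(T):
    lstm_out, state = cell(inputs, state)
    y_t = tf.layers.dense(lstm_out, output_size, name="out", reuse=(t > 0))
    # Constraint 1: the next input depends on the previous output
    inputs = f(y_t, reuse=(t > 0))
    # Constraint 2: the hidden state feeds another layer at every timestep
    aux_t = tf.layers.dense(state.h, aux_size, name="aux", reuse=(t > 0))
    outputs.append(y_t)
    aux_outputs.append(aux_t)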