Question

Hey, I want to use a CNN + RNN for a regression task on images, and I'm not sure how to handle the sequence length and the state correctly.

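I thought about doing the following: use the CNN to extract features for one frame. Feed the flattened activation maps into the LSTM and save the state. Reduce the LSTM output to my regression value. For the next frame, restore the LSTM state from the previous iteration. But this feels completely wrong, since I would be building an RNN by hand around my LSTM cell instead of using it the way it is supposed to be used, right?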

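But if I feed a sequence of frames into the LSTM (after applying the CNN to all of them), I get multiple outputs and one state. If I reuse that state, I don't see the point of the frame sequence at all. I'm totally confused.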
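To make the two options above concrete, here is a minimal NumPy sketch, not TensorFlow code. It assumes a hand-written single-layer LSTM with random weights and no dropout, and random vectors standing in for the flattened CNN outputs; every name and size in it is made up for illustration. It compares three forward passes: the whole sequence in one call, one frame per call with the saved state restored, and two chunks with the state carried across.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. x: (input_dim,), h and c: (hidden_dim,)."""
    z = W @ np.concatenate([x, h]) + b      # pre-activations of all four gates
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run_lstm(features, h, c, W, b):
    """Run the cell over features of shape (time, input_dim), starting from (h, c)."""
    outputs = []
    for x in features:
        h, c = lstm_step(x, h, c, W, b)
        outputs.append(h)
    return np.stack(outputs), (h, c)

rng = np.random.default_rng(0)
input_dim, hidden_dim, num_frames = 8, 4, 6   # hypothetical sizes
W = rng.normal(scale=0.5, size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)
features = rng.normal(size=(num_frames, input_dim))  # stand-in for CNN features
zeros = np.zeros(hidden_dim)

# (a) Whole sequence in one call, starting from a zero state.
full_outputs, _ = run_lstm(features, zeros, zeros, W, b)

# (b) One frame per call, restoring the state saved by the previous call.
h, c = zeros, zeros
stepped = []
for frame in features:
    out, (h, c) = run_lstm(frame[None, :], h, c, W, b)
    stepped.append(out[0])
stepped_outputs = np.stack(stepped)

# (c) Two chunks of three frames, carrying the first chunk's final state over.
first, (h_mid, c_mid) = run_lstm(features[:3], zeros, zeros, W, b)
second, _ = run_lstm(features[3:], h_mid, c_mid, W, b)
chunked_outputs = np.concatenate([first, second])

print(np.allclose(full_outputs, stepped_outputs))   # True
print(np.allclose(full_outputs, chunked_outputs))   # True
```

In this sketch all three variants produce the same outputs, so restoring the state between calls is just another way of unrolling the same recurrence. The comparison only covers the forward pass: when the state is saved as a value and fed back in for training, gradients stop at the boundary between calls, so the sequence length still determines how far back the network can learn dependencies.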

Currently I'm doing the following, but it is no better than a CNN applied to each frame on its own:

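import tensorflow as tf
from tensorflow.contrib import layers  # assuming layers refers to tf.contrib.layers
from tensorflow.contrib.rnn import LSTMBlockCell

# CNN, images, sequence_length, normalizer_params, regularizer, keep_prob and
# initial_state are defined elsewhere in my code.
cnn_outputs = []
with tf.variable_scope('CNN'):
    for time_step in range(sequence_length):
        # Share the CNN weights across all time steps.
        if time_step > 0:
            tf.get_variable_scope().reuse_variables()
        cnn_res = CNN(images[time_step], normalizer_params=normalizer_params, regularizer=regularizer)
        cnn_outputs.append(cnn_res)

cnn_outputs = tf.stack(cnn_outputs)  # time-major; tf.pack was renamed to tf.stack
with tf.variable_scope('RNN'):
    def make_cell():
        # One cell object per layer instead of the same instance repeated three times.
        return tf.nn.rnn_cell.DropoutWrapper(LSTMBlockCell(128), output_keep_prob=keep_prob)
    cell = tf.nn.rnn_cell.MultiRNNCell([make_cell() for _ in range(3)])
    rnn_outputs, state = tf.nn.dynamic_rnn(cell, cnn_outputs, initial_state=initial_state, time_major=True, dtype=tf.float32)
    rnn_outputs = rnn_outputs[-1]  # Using only the last output of the sequence; also tried taking every output into account.
    rnn_outputs = layers.flatten(rnn_outputs)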

A few fully connected layers then reduce rnn_outputs to my single value.

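What I actually want to do is this (except that I want the value for the frame I have just received, without using any future frames): How do you pass video features from a CNN to an LSTM? But I'm struggling to implement this in TensorFlow.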
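One way to picture "a value for the current frame without any future frames" is to apply the same regression layer to every time-step output of a forward-only recurrent pass. The NumPy sketch below is an illustration under stated assumptions, not TensorFlow code: a plain tanh recurrent cell stands in for the LSTM, a single linear layer stands in for the fully connected layers, and all names, sizes and weights are made up.

```python
import numpy as np

def rnn_outputs(features, W_x, W_h):
    """Forward-only recurrent pass; returns one hidden vector per frame."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x in features:
        h = np.tanh(W_x @ x + W_h @ h)   # depends on current and past frames only
        outputs.append(h)
    return np.stack(outputs)             # shape (time, hidden_dim)

def regression_head(outputs, w, bias):
    """The same linear layer applied at every time step: one scalar per frame."""
    return outputs @ w + bias            # shape (time,)

rng = np.random.default_rng(1)
input_dim, hidden_dim, num_frames = 8, 4, 6   # hypothetical sizes
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
w = rng.normal(size=hidden_dim)
bias = 0.1

features = rng.normal(size=(num_frames, input_dim))  # stand-in for CNN features
predictions = regression_head(rnn_outputs(features, W_x, W_h), w, bias)

# Swap the last two frames for different data. The predictions for the first
# four frames stay the same, because no future frame feeds into them.
altered = features.copy()
altered[4:] = rng.normal(size=(2, input_dim))
altered_predictions = regression_head(rnn_outputs(altered, W_x, W_h), w, bias)

print(predictions.shape)                                      # (6,)
print(np.allclose(predictions[:4], altered_predictions[:4]))  # True
```

With a unidirectional recurrent network, the output at step t is computed from frames 0 to t only, so taking the output at every step (rather than only the last one) gives a per-frame prediction without look-ahead. A bidirectional network would not have this property.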

Understanding CNN + LSTM RNN in TensorFlow

0 answers: