Getting the last output of dynamic_rnn in TensorFlow breaks the model

Date: 2018-10-27 16:56:09

Tags: python tensorflow lstm rnn

So I'm modifying a TF model, which contains this part:

outputs, _ = tf.nn.dynamic_rnn(
    rnn_cell,
    embeddings,
    dtype=tf.float32,
    swap_memory=True,
    sequence_length=num_tels_triggered)

# flatten (batch_size, max_num_tel, LSTM_SIZE) to (batch_size, max_num_tel * LSTM_SIZE)
outputs = tf.layers.flatten(outputs)
output_dropout = tf.layers.dropout(outputs, rate=dropout_rate,
                                   training=training, name="rnn_output_dropout")

fc1 = tf.layers.dense(inputs=output_dropout, units=1024,
                      kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.004),
                      name="fc1")

Now, this code takes all of the outputs of the LSTM and concatenates them together. I don't want that; I want only the last output of the LSTM. So I modified it like this:

_, final_state = tf.nn.dynamic_rnn(
    rnn_cell,
    embeddings,
    dtype=tf.float32,
    swap_memory=True,
    sequence_length=num_tels_triggered,
    time_major=True)

output_rnn = final_state.h  # last output of the sequence
output_dropout = tf.layers.dropout(output_rnn, rate=dropout_rate,
                                   training=training, name="rnn_output_dropout")

fc1 = tf.layers.dense(inputs=output_dropout, units=1024,
                      kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.004),
                      name="fc1")

And when I train the model, I get this horrible error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4] vs. [16]
 [[Node: gradients/softmax_cross_entropy_loss/Mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/softmax_cross_entropy_loss/Mul_grad/Shape, gradients/softmax_cross_entropy_loss/Mul_grad/Shape_1)]]
 [[Node: ConstantFoldingCtrl/softmax_cross_entropy_loss/assert_broadcastable/AssertGuard/Switch_0/_472 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_375_C...d/Switch_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I'm honestly puzzled by this, because the change I made is very localized: if my understanding is correct, it shouldn't change the shape of the network's output or have any other side effects.

Can anyone offer a plausible hypothesis about the side effects of this change? Am I right that the shapes of the downstream tensors shouldn't have changed? And does my change actually do what I intended (feed only the last output of the LSTM into the dense layer)?
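
For reference, here is an alternative I also considered, which keeps the inputs batch-major and picks each sequence's last valid output explicitly with tf.gather_nd. This is only a sketch; it assumes embeddings has shape (batch_size, max_num_tel, depth) and that num_tels_triggered is an int32 vector of true sequence lengths:

outputs, _ = tf.nn.dynamic_rnn(
    rnn_cell,
    embeddings,
    dtype=tf.float32,
    swap_memory=True,
    sequence_length=num_tels_triggered)  # inputs stay batch-major

batch_size = tf.shape(outputs)[0]
# build one (i, num_tels_triggered[i] - 1) index pair per example
last_step_indices = tf.stack(
    [tf.range(batch_size), num_tels_triggered - 1], axis=1)
# (batch_size, LSTM_SIZE): the output at each sequence's last valid step
last_outputs = tf.gather_nd(outputs, last_step_indices)

If rnn_cell is a single LSTMCell, my understanding is that final_state.h should be equal to last_outputs above, which is why I expected my change to be safe.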

0 Answers:

No answers yet.