So I'm modifying a TF model that contains this part:
outputs, _ = tf.nn.dynamic_rnn(
    rnn_cell,
    embeddings,
    dtype=tf.float32,
    swap_memory=True,
    sequence_length=num_tels_triggered)
# (batch_size, max_num_tel * LSTM_SIZE)
outputs = tf.layers.flatten(outputs)
output_dropout = tf.layers.dropout(outputs, rate=dropout_rate, training=training, name="rnn_output_dropout")
fc1 = tf.layers.dense(inputs=output_dropout, units=1024, kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.004), name="fc1")
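For reference, this is how I understand the shapes along that path; a minimal sketch with made-up sizes (they are not my real hyperparameters, just placeholders to check shapes):

import tensorflow as tf

# Toy sizes picked only to check shapes; not the real hyperparameters
toy_embeddings = tf.zeros([16, 4, 8])   # (batch_size, max_num_tel, embedding_dim)
toy_cell = tf.nn.rnn_cell.LSTMCell(32)  # LSTM_SIZE
toy_outputs, _ = tf.nn.dynamic_rnn(toy_cell, toy_embeddings, dtype=tf.float32)
print(toy_outputs.shape)                     # (16, 4, 32): one output per time step
print(tf.layers.flatten(toy_outputs).shape)  # (16, 128): all time steps concatenated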
As it stands, this code takes all of the LSTM's outputs and concatenates them together. I don't want that; I only want the last output of the LSTM. So I modified it like this:
_, final_state = tf.nn.dynamic_rnn(
    rnn_cell,
    embeddings,
    dtype=tf.float32,
    swap_memory=True,
    sequence_length=num_tels_triggered,
    time_major=True)
output_rnn = final_state.h  # last output of the sequence
output_dropout = tf.layers.dropout(output_rnn, rate=dropout_rate, training=training, name="rnn_output_dropout")
fc1 = tf.layers.dense(inputs=output_dropout, units=1024, kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.004), name="fc1")
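This is the shape I expected final_state.h to end up with; another minimal check with the same made-up sizes as above (note I left out the time_major argument here, so it reflects what I expected rather than necessarily what my modified code does):

import tensorflow as tf

# Same toy sizes as in the sketch above, again just for checking shapes
toy_embeddings = tf.zeros([16, 4, 8])   # (batch_size, max_num_tel, embedding_dim)
toy_cell = tf.nn.rnn_cell.LSTMCell(32)  # LSTM_SIZE
with tf.variable_scope("last_state_check"):
    _, toy_final_state = tf.nn.dynamic_rnn(toy_cell, toy_embeddings, dtype=tf.float32)
print(toy_final_state.h.shape)  # (16, 32): one hidden-state vector per example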
Alas, when I train the model, I get a nasty error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4] vs. [16]
[[Node: gradients/softmax_cross_entropy_loss/Mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/softmax_cross_entropy_loss/Mul_grad/Shape, gradients/softmax_cross_entropy_loss/Mul_grad/Shape_1)]]
[[Node: ConstantFoldingCtrl/softmax_cross_entropy_loss/assert_broadcastable/AssertGuard/Switch_0/_472 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_375_C...d/Switch_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
I'm really confused, because the change I made is very localized: if my understanding is correct, it shouldn't change the shape of the network's output or have any other side effects.
Can someone offer a plausible hypothesis about the side effects of this change? Am I right that the shapes of the downstream tensors should not have changed? And does my change actually accomplish what I want (feeding only the last output of the LSTM to the dense layer that follows)?