
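I am building a model based on the TensorFlow NMT with attention example, and I am wondering whether I have found a mistake in the documentation or have misunderstood something.

My model was not working properly, so I plotted the mean gradients after every batch.

The plots show that the gradient of the decoder's recurrent_kernel is always zero, which I find strange.

Assuming this was a mistake, I changed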

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

to

    # passing the concatenated vector to the GRU
    output, state = self.gru(x, initial_state=hidden)

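Now the plots look much more like what I expected.

However, my model is still training and I am not yet sure whether it will eventually converge. Can anyone confirm my assumption that we have to pass the last hidden state to the decoder's GRU as its initial state?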
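
For what it is worth, a zero recurrent_kernel gradient is what I would expect if the GRU starts from an all-zero state and only sees one timestep per call, which I assume is how the decoder is used in the example's decoding loop. The recurrent kernel only enters the computation through products with the previous hidden state, so if that state is zero the kernel cannot influence the output. Below is a minimal NumPy sketch of that reasoning. It is not TensorFlow's implementation: `gru_step` and `recurrent_kernel_grad` are hypothetical helpers, the step uses the classic GRU formulation rather than Keras's default `reset_after=True` variant, and the shapes and random weights are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, W, U, b):
    """One classic GRU step (illustrative, not the Keras implementation).

    W: input kernel (input_dim, 3 * units)
    U: recurrent kernel (units, 3 * units)
    b: bias (3 * units,)
    """
    xz, xr, xh = np.split(x @ W + b, 3, axis=-1)
    Uz, Ur, Uh = np.split(U, 3, axis=-1)
    z = sigmoid(xz + h_prev @ Uz)              # update gate
    r = sigmoid(xr + h_prev @ Ur)              # reset gate
    h_tilde = np.tanh(xh + (r * h_prev) @ Uh)  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde

def recurrent_kernel_grad(x, h_prev, W, U, b, eps=1e-6):
    """Central finite-difference gradient of sum(h_new) with respect to U."""
    grad = np.zeros_like(U)
    for idx in np.ndindex(*U.shape):
        U_plus, U_minus = U.copy(), U.copy()
        U_plus[idx] += eps
        U_minus[idx] -= eps
        loss_plus = gru_step(x, h_prev, W, U_plus, b).sum()
        loss_minus = gru_step(x, h_prev, W, U_minus, b).sum()
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Made-up sizes and weights, purely for illustration.
rng = np.random.default_rng(0)
batch, input_dim, units = 4, 5, 3
x = rng.normal(size=(batch, input_dim))
W = 0.5 * rng.normal(size=(input_dim, 3 * units))
U = 0.5 * rng.normal(size=(units, 3 * units))
b = 0.5 * rng.normal(size=(3 * units,))

# Single step from a zero state: every product with U is zero.
grad_zero_state = recurrent_kernel_grad(x, np.zeros((batch, units)), W, U, b)
# Single step from a non-zero state, e.g. the encoder's last hidden state.
grad_with_state = recurrent_kernel_grad(x, rng.normal(size=(batch, units)), W, U, b)

print(np.abs(grad_zero_state).max())  # 0.0
print(np.abs(grad_with_state).max())  # non-zero
```

If this reasoning carries over to the Keras layer, then calling `self.gru(x)` without `initial_state` on a length-one sequence would leave the recurrent kernel untrained, which matches the plots I saw before the change.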