Question

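I'm working on a project about sequence models. While reading papers, I found that LSTM has many variants. What confuses me is that when I use an LSTM cell, I don't know which architecture Keras implements behind the scenes. Can anyone tell me which LSTM architecture Keras uses? Is it picture-1, picture-2, or something else?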

The Keras version I'm using is 2.1.5.

Answer 1

From here, in the call method, we can see the answer. (A cell's call method is the computation for "one step".)

It is the first implementation, with some hidden steps.

Showing:

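As inputs to the call method, the picture shows inputs, h(T-1)(L-1), and states, a tuple containing h(T-1)(L) and C(T-1)(L).

Initially, the inputs are preprocessed with a kernel and a bias and called X, split into i, f, c, o, corresponding to the same letters in the image. This step is not in the picture.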

x_i = K.dot(inputs_i, self.kernel_i)
x_f = K.dot(inputs_f, self.kernel_f)
x_c = K.dot(inputs_c, self.kernel_c)
x_o = K.dot(inputs_o, self.kernel_o)
if self.use_bias:
    x_i = K.bias_add(x_i, self.bias_i)
    x_f = K.bias_add(x_f, self.bias_f)
    x_c = K.bias_add(x_c, self.bias_c)
    x_o = K.bias_add(x_o, self.bias_o)

These x_* values are what actually plays the role of h(T-1)(L-1) in the picture.
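
The preprocessing step above can be sketched in plain NumPy. This is only an illustration: the sizes (2 samples, 3 input features, 4 units) and the random weights are made up, and it assumes, as in the Keras source, that the four per-gate kernels are slices of one weight matrix of shape (input_dim, 4 * units) in the order i, f, c, o.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, input_dim, units = 2, 3, 4  # illustrative sizes, not from the question

inputs = rng.normal(size=(batch, input_dim))
kernel = rng.normal(size=(input_dim, 4 * units))  # one matrix holding all four gates
bias = np.zeros(4 * units)

# Slice the combined weights into the per-gate pieces i, f, c, o.
kernel_i, kernel_f, kernel_c, kernel_o = np.split(kernel, 4, axis=1)
bias_i, bias_f, bias_c, bias_o = np.split(bias, 4)

# Same arithmetic as K.dot(...) followed by K.bias_add(...).
x_i = inputs @ kernel_i + bias_i
x_f = inputs @ kernel_f + bias_f
x_c = inputs @ kernel_c + bias_c
x_o = inputs @ kernel_o + bias_o

print(x_i.shape)  # one row per sample, one column per unit
```

Each x_* has shape (batch, units), which is what lets it be added to the recurrent term later on.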

The state h(T-1)(L) is also split into four, but I think that is only for code readability; no preprocessing is applied:

h_tm1_i = h_tm1
h_tm1_f = h_tm1
h_tm1_c = h_tm1
h_tm1_o = h_tm1

Now, the lower part of the picture. Where you see sigmoid in the picture, Keras uses recurrent_activation; where you see tanh, Keras uses activation.

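Note, however, that the Keras recurrent kernel (the recurrent weights) is applied only to H (that is, h(T-1)(L)), not to X (h(T-1)(L-1)). (X has a different dimension, so it has to be preprocessed with a separate kernel, as shown above.)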
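
A small NumPy sketch of why the two kernels cannot be shared, using made-up sizes (3 input features, 4 units) and dummy all-ones weights: the recurrent kernel maps units to units, so it fits H but not X.

```python
import numpy as np

input_dim, units = 3, 4  # illustrative sizes

x = np.ones((1, input_dim))                    # stands in for the raw input
h_tm1 = np.ones((1, units))                    # stands in for h(T-1)(L)
kernel_i = np.ones((input_dim, units))         # input kernel slice
recurrent_kernel_i = np.ones((units, units))   # recurrent kernel slice

ok = h_tm1 @ recurrent_kernel_i  # shapes line up: (1, 4) @ (4, 4)

try:
    x @ recurrent_kernel_i       # (1, 3) @ (4, 4) does not line up
    mismatch = False
except ValueError:
    mismatch = True

print(ok.shape, mismatch)
```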

The arrows in the picture, as computed in the Keras code:
(taking c_tm1 to be C(T-1)(L) in the picture)

#line i in the picture - C does not participate here
i = self.recurrent_activation(x_i + K.dot(h_tm1_i, self.recurrent_kernel_i))

#line f in the picture - C doesn't participate here either
f = self.recurrent_activation(x_f + K.dot(h_tm1_f, self.recurrent_kernel_f))

#upper C line + lower C line
c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c, self.recurrent_kernel_c))

#line o in the picture - C doesn't participate here as well    
o = self.recurrent_activation(x_o + K.dot(h_tm1_o, self.recurrent_kernel_o))

Finally, there is the last multiplication:

h = o * self.activation(c)

The output is return h, [h, c], so h(T)(L) serves both as the output and as a state for the next step. C is also passed on as a state for the next step.
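
Putting the pieces together, here is a minimal NumPy sketch of one step, not the Keras implementation itself. The function name lstm_step, the sizes and the random weights are hypothetical. It fuses the four gates into one matrix product, which is mathematically the same as the per-gate code above, and it uses a plain sigmoid to match the picture (in Keras 2.1.5 the default recurrent_activation is, as far as I know, hard_sigmoid).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(inputs, h_tm1, c_tm1, kernel, recurrent_kernel, bias):
    """One LSTM step; the weights hold the i, f, c, o blocks side by side."""
    z = inputs @ kernel + h_tm1 @ recurrent_kernel + bias
    z_i, z_f, z_c, z_o = np.split(z, 4, axis=1)
    i = sigmoid(z_i)                  # input gate
    f = sigmoid(z_f)                  # forget gate
    c = f * c_tm1 + i * np.tanh(z_c)  # new cell state
    o = sigmoid(z_o)                  # output gate
    h = o * np.tanh(c)                # new hidden state
    return h, [h, c]

# Run the step over a short random sequence, starting from zero states.
rng = np.random.default_rng(0)
batch, timesteps, input_dim, units = 2, 5, 3, 4
kernel = rng.normal(size=(input_dim, 4 * units))
recurrent_kernel = rng.normal(size=(units, 4 * units))
bias = np.zeros(4 * units)
sequence = rng.normal(size=(batch, timesteps, input_dim))

h = np.zeros((batch, units))
c = np.zeros((batch, units))
for t in range(timesteps):
    output, (h, c) = lstm_step(sequence[:, t, :], h, c,
                               kernel, recurrent_kernel, bias)

print(output.shape)  # (batch, units)
```

The loop is what a recurrent layer does around the cell: the states returned by one step are fed into the next.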

Note:

All multiplication symbols in the picture are element-wise multiplications in Keras.
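
To make the distinction concrete, a tiny NumPy sketch with made-up values: `*` multiplies entry by entry (as in f * c_tm1), while a dot product, which is what K.dot computes for 2-D inputs, sums over a dimension.

```python
import numpy as np

f = np.array([[0.5, 1.0]])      # made-up gate values
c_tm1 = np.array([[2.0, 4.0]])  # made-up cell state

elementwise = f * c_tm1  # same shape as the inputs: values 1.0 and 4.0
matrix = f @ c_tm1.T     # a single number: 0.5 * 2.0 + 1.0 * 4.0

print(elementwise.shape, matrix.shape)
```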
