I've been trying to pin down exactly when the hidden state is re-initialized in a Keras LSTM model when stateful=False. The various tutorials I've seen imply that it is reset at the start of each batch, but as far as I can tell it is actually reset between each sample within a batch. Am I wrong?
I wrote the following code to test this:
from keras.models import Sequential
from keras.layers import Dense, LSTM
import keras.backend as K
import numpy as np
import tensorflow as tf

a = [1, 0, 0]
b = [0, 1, 0]
c = [0, 0, 1]
seq = [a, b, c, b, a]

x = seq[:-1]
y = seq[1:]
window_size = 1

x = np.array(x).reshape((len(x), window_size, 3))
y = np.array(y)

def run_with_batch_size(batch_size=1):
    model = Sequential()
    model.add(LSTM(20, input_shape=(1, 3)))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(500):
        model.fit(x, y,
                  batch_size=batch_size,
                  epochs=1,
                  verbose=0,
                  shuffle=False)
    print(model.predict(np.array([[a], [b]]), batch_size=batch_size))
    print()
    print(model.predict(np.array([[b], [c]]), batch_size=batch_size))
    print()
    print(model.predict(np.array([[c], [b]]), batch_size=batch_size))
    print('-' * 30)

run_with_batch_size(1)
print('**')
run_with_batch_size(2)
Running this code produces:
------------------------------
# batch_size 1
[[0.01296294 0.9755857 0.01145133]
[0.48558792 0.02751653 0.4868956 ]]
[[0.48558792 0.02751653 0.4868956 ]
[0.01358072 0.9738273 0.01259203]]
[[0.01358072 0.9738273 0.01259203]
[0.48558792 0.02751653 0.4868956 ]]
**
# batch_size 2
# output of batch (a, b)
[[0.0255649 0.94444686 0.02998832]
[0.47172785 0.05804421 0.47022793]]
# output of batch (b, c)
# notice first output here is the same as the second output from above
[[0.47172785 0.05804421 0.47022793]
[0.03059724 0.93813574 0.03126698]]
[[0.03059724 0.93813574 0.03126698]
[0.47172785 0.05804421 0.47022793]]
------------------------------
When my batch_size is 1, model.predict([a, b]) gives the same result for b as running model.predict([b, c]), because the state is reset to zero between the two batches. When my batch_size is 2, if the state really carried over within a batch, the result for b in model.predict([a, b]) should be influenced by a (since the state produced by a would feed into the computation for b), and should therefore differ from the result for b when I run model.predict([b, c]). But as the output above shows, the two results are identical.

I'm quite new to this area, so I may well be misunderstanding something. Is the initial state reset between each sample within a batch, rather than between batches?
Answer 0 (score: 1)
Great test; you're on the right track. To answer the question directly: with stateful=False, the initial state is set for every sample in the batch on each forward pass. Following the source code:
def get_initial_state(self, inputs):
    # build an all-zero tensor of shape (samples, output_dim)
    initial_state = K.zeros_like(inputs)  # (samples, timesteps, input_dim)
    initial_state = K.sum(initial_state, axis=(1, 2))  # (samples,)
    initial_state = K.expand_dims(initial_state)  # (samples, 1)
    # ...
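As a quick sanity check (my own sketch, not library code), you can run the same three backend ops on a dummy batch and confirm they produce a (samples, 1) tensor of zeros, one row per sample:

import numpy as np
import keras.backend as K

# dummy batch: 2 samples, 1 timestep, 3 features
dummy = K.constant(np.random.rand(2, 1, 3))
state = K.zeros_like(dummy)        # (2, 1, 3), all zeros
state = K.sum(state, axis=(1, 2))  # (2,)
state = K.expand_dims(state)       # (2, 1)
print(K.eval(state))               # [[0.] [0.]]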
This means every sample in the batch gets a clean initial state of zeros. This function is used in the call function:
if initial_state is not None:
    pass
elif self.stateful:
    initial_state = self.states
else:
    initial_state = self.get_initial_state(inputs)
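The first branch is the one you hit if you pass an explicit initial state yourself. A minimal sketch with the functional API (the 20-unit size is only an assumption to mirror the question's model):

from keras.layers import Input, Dense, LSTM
from keras.models import Model

inp = Input(shape=(1, 3))
h0 = Input(shape=(20,))  # explicit initial hidden state
c0 = Input(shape=(20,))  # explicit initial cell state
# passing initial_state skips the zero-initialization path above
out = LSTM(20)(inp, initial_state=[h0, c0])
model = Model(inputs=[inp, h0, c0], outputs=out)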
So when stateful=False and you don't provide an explicit initial_state, the code creates fresh initial states for the RNN; this includes the LSTM, which inherits from the RNN layer. Since call is what computes the forward pass, every forward pass over a batch gets a brand-new initial state.
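If you do want the state to carry over between calls, the usual route is stateful=True, where the state persists until you reset it yourself. A rough sketch under that assumption:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

a, b = [1, 0, 0], [0, 1, 0]

# stateful=True requires a fixed batch size via batch_input_shape
model = Sequential()
model.add(LSTM(20, batch_input_shape=(1, 1, 3), stateful=True))
model.add(Dense(3, activation='softmax'))
model.compile(loss='mean_squared_error', optimizer='adam')

# consecutive predict() calls now share state...
p1 = model.predict(np.array([[a]]), batch_size=1)
p2 = model.predict(np.array([[b]]), batch_size=1)  # influenced by a

# ...until you zero it explicitly
model.reset_states()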