I've been trying to pin down exactly when the hidden state is re-initialized in a Keras LSTM model when stateful=False. The various tutorials I've seen imply that it is reset at the start of each batch, but as far as I can tell it is actually reset between each sample within a batch. Am I wrong?
I wrote the following code to test this:
from keras.models import Sequential
from keras.layers import Dense, LSTM
import keras.backend as K
import numpy as np
import tensorflow as tf

a = [1, 0, 0]
b = [0, 1, 0]
c = [0, 0, 1]
seq = [a, b, c, b, a]

x = seq[:-1]
y = seq[1:]
window_size = 1

x = np.array(x).reshape((len(x), window_size, 3))
y = np.array(y)

def run_with_batch_size(batch_size=1):
    model = Sequential()
    model.add(LSTM(20, input_shape=(1, 3)))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(500):
        model.fit(x, y,
                  batch_size=batch_size,
                  epochs=1,
                  verbose=0,
                  shuffle=False)
    print(model.predict(np.array([[a], [b]]), batch_size=batch_size))
    print()
    print(model.predict(np.array([[b], [c]]), batch_size=batch_size))
    print()
    print(model.predict(np.array([[c], [b]]), batch_size=batch_size))
    print('-' * 30)

run_with_batch_size(1)
print('**')
run_with_batch_size(2)
Running this code produces:
------------------------------
# batch_size 1
[[0.01296294 0.9755857 0.01145133]
[0.48558792 0.02751653 0.4868956 ]]
[[0.48558792 0.02751653 0.4868956 ]
[0.01358072 0.9738273 0.01259203]]
[[0.01358072 0.9738273 0.01259203]
[0.48558792 0.02751653 0.4868956 ]]
**
# batch_size 2
# output of batch (a, b)
[[0.0255649 0.94444686 0.02998832]
[0.47172785 0.05804421 0.47022793]]
# output of batch (b, c)
# notice first output here is the same as the second output from above
[[0.47172785 0.05804421 0.47022793]
[0.03059724 0.93813574 0.03126698]]
[[0.03059724 0.93813574 0.03126698]
[0.47172785 0.05804421 0.47022793]]
------------------------------
When my batch_size is 1, model.predict([a, b]) gives the same result for b as running model.predict([b, c]), because the state is reset to zero between the two batches. When my batch_size is 2, if the state really carried over within a batch, the result for b in model.predict([a, b]) should be influenced by a (since the state produced by a would feed into the computation for b), and should therefore differ from the result for b when I run model.predict([b, c]). But as the output above shows, the two results are identical.

I'm quite new to this area, so I may well be misunderstanding something. Is the initial state reset between each sample within a batch, rather than between batches?
Answer 0 (score: 1)
Great test; you're on the right track. To answer the question directly: with stateful=False, the initial state is set for every sample in the batch on each forward pass. Following the source code:
def get_initial_state(self, inputs):
    # build an all-zero tensor of shape (samples, output_dim)
    initial_state = K.zeros_like(inputs)  # (samples, timesteps, input_dim)
    initial_state = K.sum(initial_state, axis=(1, 2))  # (samples,)
    initial_state = K.expand_dims(initial_state)  # (samples, 1)
    # ...
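As a quick sanity check (my own sketch, not library code), you can run the same three backend ops on a dummy batch and confirm they produce a (samples, 1) tensor of zeros, one row per sample:

import numpy as np
import keras.backend as K

# dummy batch: 2 samples, 1 timestep, 3 features
dummy = K.constant(np.random.rand(2, 1, 3))
state = K.zeros_like(dummy)        # (2, 1, 3), all zeros
state = K.sum(state, axis=(1, 2))  # (2,)
state = K.expand_dims(state)       # (2, 1)
print(K.eval(state))               # [[0.] [0.]]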
This means every sample in the batch gets a clean initial state of zeros. This function is used in the call function:
if initial_state is not None:
    pass
elif self.stateful:
    initial_state = self.states
else:
    initial_state = self.get_initial_state(inputs)
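The first branch is the one you hit if you pass an explicit initial state yourself. A minimal sketch with the functional API (the 20-unit size is only an assumption to mirror the question's model):

from keras.layers import Input, Dense, LSTM
from keras.models import Model

inp = Input(shape=(1, 3))
h0 = Input(shape=(20,))  # explicit initial hidden state
c0 = Input(shape=(20,))  # explicit initial cell state
# passing initial_state skips the zero-initialization path above
out = LSTM(20)(inp, initial_state=[h0, c0])
model = Model(inputs=[inp, h0, c0], outputs=out)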
So when stateful=False and you don't provide an explicit initial_state, the code creates fresh initial states for the RNN; this includes the LSTM, which inherits from the RNN layer. Since call is what computes the forward pass, every forward pass over a batch gets a brand-new initial state.
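If you do want the state to carry over between calls, the usual route is stateful=True, where the state persists until you reset it yourself. A rough sketch under that assumption:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

a, b = [1, 0, 0], [0, 1, 0]

# stateful=True requires a fixed batch size via batch_input_shape
model = Sequential()
model.add(LSTM(20, batch_input_shape=(1, 1, 3), stateful=True))
model.add(Dense(3, activation='softmax'))
model.compile(loss='mean_squared_error', optimizer='adam')

# consecutive predict() calls now share state...
p1 = model.predict(np.array([[a]]), batch_size=1)
p2 = model.predict(np.array([[b]]), batch_size=1)  # influenced by a

# ...until you zero it explicitly
model.reset_states()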