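I am trying to create a stateful autoencoder model. The goal is to make the autoencoder stateful for each time series. The data consists of 10 time series, each of length 567.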
timeseries#1: 451, 318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, ....
timeseries#2: 304, 274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, ....
...
timeseries#10: 208, 138, 201, 342, 280, 282, 280, 140, 124, 261, 193, .....
My lookback window is 28, so I generated the following sequences of 28 timesteps:
[451, 318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, .... ]
[318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, 56, ....]
[404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, 56, 890, ....]
...
[304, 274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, ....]
[274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, 127, ....]
[150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, 127, 798, ....]
...
[208, 138, 201, 342, 280, 282, 280, 140, 124, 261, 193, .....]
[138, 201, 342, 280, 282, 280, 140, 124, 261, 193, 854, .....]
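This gives me 539 sequences per time series. What I need is for the LSTMs to be stateful within each time series, and to reset the state after seeing all the sequences from one time series. Here is my code: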
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, Reshape

batch_size = 35  # total number of samples is 5390, which is divisible by 35
timesteps = 28
n_features = 1
hunits = 14
# RepeatVector factor: timesteps / hunits = 2
epochs = 1000

inputEncoder = Input(batch_shape=(35, 28, 1), name='inputEncoder')
# With return_state=True, LSTM returns [output, state_h, state_c]
outEncoder, h, c = LSTM(14, stateful=True, return_state=True, name='outputEncoder')(inputEncoder)
encoder_model = Model(inputEncoder, outEncoder)
context = RepeatVector(2, name='inputDecoder')(outEncoder)
context_reshaped = Reshape((28, 1), name='ReshapeLayer')(context)
outDecoder = LSTM(1, return_sequences=True, stateful=True, name='decoderLSTM')(context_reshaped)
autoencoder = Model(inputEncoder, outDecoder)
autoencoder.compile(loss='mse', optimizer='rmsprop')

for i in range(epochs):
    history = autoencoder.fit(data, data,
                              validation_split=config['validation_split_ratio'],
                              shuffle=False,
                              batch_size=35,
                              epochs=1,
                              )
    autoencoder.reset_states()
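Two questions:
1- I get this error after the first epoch finishes, and I would like to know how it arises:
ValueError: Cannot feed value of shape (6, 28, 1) for Tensor u'inputEncoder:0', which has shape '(35, 28, 1)'
2- I don't think the model works the way I want. Here it resets the state after all batches (one epoch), which means after all the time series have been processed. How can I change it so that it is stateful within each time series and resets between them?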
Answer 0 (score: 1)
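The problem was the validation_split ratio! It was set to 0.33, so when the split happens, training runs on 3611 samples, which is not divisible by my batch_size=35. Since 3611 = 103 × 35 + 6, the last batch holds only 6 samples, which is exactly the shape (6, 28, 1) in the error. Based on this post, I could find the right number. Copying from that post: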
def quantize_validation_split(validation_split, sample_count, batch_size):
    batch_count = sample_count / batch_size
    return float(int(batch_count * validation_split)) / batch_count
Then you can call
model.fit(..., validation_split=quantize_validation_split(0.05, len(X), batch_size))
But it would be cool if Keras did this for you inside fit().
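To make the arithmetic concrete, here is a minimal sketch. It assumes Keras picks the split point as int(sample_count * (1 - validation_split)), which matches the 3611 figure above. The helper names keras_style_split and batch_aligned_split are mine and are not part of Keras. The second helper is an alternative to the quantized ratio: it works in whole batches with integers only, so floating-point rounding in the ratio does not come into it.

```python
def keras_style_split(sample_count, validation_split):
    """Training samples left after the split (assumed Keras behaviour)."""
    return int(sample_count * (1.0 - validation_split))


def batch_aligned_split(sample_count, validation_split, batch_size):
    """Return (train_count, val_count), both exact multiples of batch_size."""
    batch_count = sample_count // batch_size
    val_batches = int(batch_count * validation_split)
    train_batches = batch_count - val_batches
    return train_batches * batch_size, val_batches * batch_size


train_count = keras_style_split(5390, 0.33)
print(train_count, train_count % 35)         # 3611 6
print(batch_aligned_split(5390, 0.33, 35))   # (3640, 1750)
```

With those counts you could slice the data yourself, for example training on data[:3640] and passing validation_data=(data[3640:], data[3640:]) to fit() instead of validation_split.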
Also, regarding making the autoencoder stateful the way I need: reset_states() should not be called at the end of every epoch!
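One way to reset per time series rather than per epoch is to drive the batches by hand. This is only a sketch under several assumptions, not something I have verified on the model above: the data is ordered series by series, the model exposes train_on_batch() and reset_states() as the Keras model in the question does, and the batch size divides the number of windows per series. That last point matters here, because 539 = 7 × 7 × 11 is not divisible by 35, so the model would have to be rebuilt with a batch_shape of 7, 11, 49 or 77 samples. The function name train_per_series is hypothetical.

```python
def train_per_series(model, data, n_series, windows_per_series,
                     batch_size, epochs=1):
    """Train a stateful model, resetting its state after each time series.

    Assumes `data` holds all windows of series 0, then series 1, and so on,
    and that `model` has train_on_batch(x, y) and reset_states().
    """
    if windows_per_series % batch_size != 0:
        raise ValueError("batch_size must divide windows_per_series")
    losses = []
    for _ in range(epochs):
        for s in range(n_series):
            start = s * windows_per_series
            stop = start + windows_per_series
            for b in range(start, stop, batch_size):
                batch = data[b:b + batch_size]
                # Autoencoder: the input is also the target.
                losses.append(model.train_on_batch(batch, batch))
            # State is cleared only once the whole series has been seen.
            model.reset_states()
    return losses


# Hypothetical usage, with the model rebuilt for batches of 77 windows:
# losses = train_per_series(autoencoder, data, n_series=10,
#                           windows_per_series=539, batch_size=77)
```

Validation is left out of this loop, so it would have to be run separately, with its own state reset beforehand.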