TimeDistributed Dense layer after a GRU(return_sequences=True) layer causes a dimension error

Asked: 2019-05-14 15:26:54

Tags: python tensorflow keras

I am currently taking my first steps with Keras on TensorFlow to classify time-series data. I was able to run a very simple model, but after receiving some feedback it was suggested that I stack several GRU layers and wrap my Dense layers in a TimeDistributed wrapper. This is the model I tried:

from keras.models import Sequential
from keras.layers import GRU, Dense, TimeDistributed

# n_timesteps = 128, n_features = 11, n_outputs = 5 (see model.summary() below)
model = Sequential()
model.add(GRU(100, input_shape=(n_timesteps, n_features), return_sequences=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, go_backwards=True, dropout=0.5))
model.add(TimeDistributed(Dense(units=100, activation='relu')))
model.add(TimeDistributed(Dense(n_outputs, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

When I try to fit the model with input of shape (2357, 128, 11) (2357 samples, 128 timesteps, 11 features), I get the following error message:

ValueError: Error when checking target: expected time_distributed_2 to have 3 dimensions, but got array with shape (2357, 5)

Here is the output of model.summary():

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_1 (GRU)                  (None, 128, 100)          33600     
_________________________________________________________________
gru_2 (GRU)                  (None, 128, 100)          60300     
_________________________________________________________________
gru_3 (GRU)                  (None, 128, 100)          60300     
_________________________________________________________________
gru_4 (GRU)                  (None, 128, 100)          60300     
_________________________________________________________________
gru_5 (GRU)                  (None, 128, 100)          60300     
_________________________________________________________________
gru_6 (GRU)                  (None, 128, 100)          60300     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 128, 100)          10100     
_________________________________________________________________
time_distributed_2 (TimeDist (None, 128, 5)            505       
=================================================================
Total params: 345,705
Trainable params: 345,705
Non-trainable params: 0

So what is the correct way to stack multiple GRU layers and apply the TimeDistributed wrapper to the Dense layers that follow? I would greatly appreciate any helpful input.

1 Answer:

Answer 0 (score: 0):

The code will work if you set return_sequences=False in the last GRU layer.

You only need return_sequences=True when the output of an RNN is fed into another RNN as input, which preserves the time dimension. When you set return_sequences=False, the output is only the last hidden state (rather than the hidden state at every timestep), and the time dimension disappears.
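A minimal sketch illustrating the shape difference, assuming the same 128-timestep, 11-feature input as in the question:

from keras.models import Sequential
from keras.layers import GRU

# return_sequences=True keeps the time axis: one hidden state per timestep
seq = Sequential([GRU(100, input_shape=(128, 11), return_sequences=True)])
print(seq.output_shape)   # (None, 128, 100)

# return_sequences=False returns only the final hidden state
last = Sequential([GRU(100, input_shape=(128, 11), return_sequences=False)])
print(last.output_shape)  # (None, 100)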

That is why the output drops from N dimensions to N-1 dimensions when you set return_sequences=False.
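A minimal sketch of the suggested fix, applied to the model from the question. Note one assumption beyond the answer above: once the last GRU uses return_sequences=False there is no time axis left to distribute over, so the TimeDistributed wrappers are replaced with plain Dense layers, which produce the (None, 5) output that matches the (2357, 5) labels:

from keras.models import Sequential
from keras.layers import GRU, Dense

model = Sequential()
model.add(GRU(100, input_shape=(128, 11), return_sequences=True, dropout=0.5))
model.add(GRU(100, return_sequences=True, dropout=0.5))
# Last recurrent layer: return_sequences=False collapses the time axis
model.add(GRU(100, return_sequences=False, dropout=0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(5, activation='softmax'))  # matches labels of shape (2357, 5)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])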