我已经看到 keras 现在带有 Attention Layer。但是,我在 Seq2Seq 模型中使用它时遇到了一些问题。
这是没有注意的工作 seq2seq 模型:
latent_dim = 300
embedding_dim = 200
clear_session()
# Encoder
encoder_inputs = Input(shape=(max_text_len, ))
# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim,
trainable=True)(encoder_inputs)
# Encoder LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1) = encoder_lstm1(enc_emb)
# Encoder LSTM 2
encoder_lstm2 = Bidirectional(LSTM(latent_dim, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_output2, forward_h2, forward_c2, backward_h2, backward_c2) = encoder_lstm2(encoder_output1)
# Encoder LSTM 3
encoder_lstm3 = Bidirectional(LSTM(latent_dim, return_state=True,
return_sequences=True, dropout=0.4,
recurrent_dropout=0.4))
(encoder_outputs, forward_h, forward_c, backward_h, backward_c) = encoder_lstm3(encoder_output2)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
# Set up the decoder, using encoder_states as the initial state
decoder_inputs = Input(shape=(None, ))
# Embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)
# Decoder LSTM
decoder_lstm = LSTM(latent_dim*2, return_sequences=True,
return_state=True, dropout=0.4,
recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
decoder_lstm(dec_emb, initial_state=[state_h, state_c])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
我已经修改了模型以添加这样的注意力(这是在# Decoder LSTM
之后和# Dense Layer
之前):
attn_out, attn_states = Attention()([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
这会抛出 TypeError: Cannot iterate over a Tensor with unknown first dimension.
如何将注意力机制应用于我的 seq2seq 模型?如果 keras 注意层不起作用和/或其他模型易于使用,我也很乐意使用它们。
这是我运行模型的方式:
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)
history = model.fit(
[x_tr, y_tr[:, :-1]],
y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
epochs=50,
callbacks=[es],
batch_size=128,
verbose=1,
validation_data=([x_val, y_val[:, :-1]],
y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:
, 1:]),
)
x_tr 的形状是 (89674, 300),y_tr[:, :-1] 是 (89674, 14)。同理,x_val 和 y_val[:, :-1] 的形状分别为 (9964, 300) 和 (9964, 14)。
答案 0 :(得分:1)
您使用的是来自 keras 的 Attention
层,它只返回一个 3D 张量而不是两个张量。
所以你的代码必须是:
attn_out = Attention()([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attn_out])