I trained a bidirectional LSTM on the IMDB dataset for sentiment analysis, using Keras with the TensorFlow backend. This is the standard Keras example. During training the accuracy quickly rises to 90% and above on the training set and to about 84% on validation, so that works well.
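For reference, the baseline I mean is essentially the imdb_bidirectional_lstm example that ships with Keras, roughly like the following (a minimal sketch from memory; the exact hyperparameters in the shipped example may differ slightly):

from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=20000)
x_train = sequence.pad_sequences(x_train, maxlen=80)
x_test = sequence.pad_sequences(x_test, maxlen=80)

model = Sequential()
model.add(Embedding(20000, 128, input_length=80))
model.add(Bidirectional(LSTM(64)))        # no return_sequences: one vector per review
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=4,
          validation_data=(x_test, y_test))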
But when I add a custom attention decoder layer and train the network, the training and validation accuracy stay essentially unchanged from epoch 1 to epoch 10.
Below is my training code for the IMDB dataset, which uses the custom attention decoder layer.
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.layers import Input, Embedding, Bidirectional, LSTM, Reshape, Dense
from keras.models import Model
# AttentionDecoder is the custom attention decoder layer mentioned above;
# import it from wherever that layer is defined in your project.

max_features = 20000
maxlen = 80
batch_size = 32
timesteps = 80
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = x_train[:5000]
y_train = y_train[:5000]
x_test = x_test[:5000]
y_test = y_test[:5000]
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print(x_train[0])   # x_train has shape (5000, 80)
print(y_train[0])   # y_train has shape (5000,)
print('Build model...')
encoder_units=64
decoder_units=64
n_labels=1
trainable=True
return_probabilities=False
def modelnmt():
    input_ = Input(shape=(80,), dtype='float32')
    print(input_.get_shape())
    input_embed = Embedding(max_features, 128, input_length=80)(input_)
    print(input_embed.get_shape())
    rnn_encoded = Bidirectional(LSTM(encoder_units, return_sequences=True),
                                name='bidirectional_1',
                                merge_mode='concat')(input_embed)
    print(rnn_encoded.get_shape())
    # Custom attention decoder: one output value per timestep -> (None, 80, 1)
    y_adec = AttentionDecoder(decoder_units,
                              name='attention_decoder_1',
                              output_dim=n_labels,
                              return_probabilities=return_probabilities,
                              trainable=trainable)(rnn_encoded)
    y_adec = Reshape((80,))(y_adec)
    y_hat = Dense(1, activation='sigmoid')(y_adec)
    model = Model(inputs=input_, outputs=y_hat)
    model.summary()
    return model
model = modelnmt()
# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=10,
          validation_data=(x_test, y_test))
Here is the output:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 80) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 80, 128) 2560000
_________________________________________________________________
bidirectional_1 (Bidirection (None, 80, 128) 98816
_________________________________________________________________
attention_decoder_1 (Attenti (None, 80, 1) 58050
_________________________________________________________________
reshape_1 (Reshape) (None, 80) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 81
=================================================================
Total params: 2,716,947
Trainable params: 2,716,947
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
5000/5000 [==============================] - 289s - loss: 0.6955 - acc: 0.5056 - val_loss: 0.6935 - val_acc: 0.4956
Epoch 2/10
5000/5000 [==============================] - 348s - loss: 0.6944 - acc: 0.4956 - val_loss: 0.6936 - val_acc: 0.4956
Let me briefly explain the model. The words are first embedded, then passed through the bidirectional LSTM and on to the attention decoder. The attention decoder outputs one number per timestep, i.e. (None, 80, 1). That output is then reshaped and passed to a Dense layer to compute the overall sentiment of the sentence (a probability). The output of the attention decoder can later be used to visualize the contribution of each word in the sentence, for example along the lines of the sketch below.
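The visualization step is not implemented yet; it would look roughly like this (a minimal sketch that only taps the output of the attention_decoder_1 layer named in modelnmt() above, assuming `model` is the trained model; nothing here depends on the custom layer's internals):

from keras.models import Model

# Per-timestep output of the attention decoder for one padded review.
viz_model = Model(inputs=model.input,
                  outputs=model.get_layer('attention_decoder_1').output)
word_scores = viz_model.predict(x_test[:1])  # shape (1, 80, 1): one score per word position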
What are possible reasons for such output?