Siamese LSTM for semantic sentence similarity does not improve validation accuracy

Time: 2019-11-18 01:24:43

Tags: machine-learning keras nlp lstm word2vec

I want to compute the semantic similarity between pairs of sentences in my dataset. As the title says, validation accuracy is not improving: it stays between 0.25 and 0.30.

First, I created a word2vec embedding matrix from an English Wikipedia dump. Then I converted my sentences into integer arrays with the text_to_sequence function.
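For reference, the embedding-matrix step typically looks like the sketch below. The `w2v` and `word_index` dicts here are hypothetical stand-ins: in the question they would come from the word2vec model trained on the Wikipedia dump and from the tokenizer's vocabulary, respectively.

```python
import numpy as np

# Hypothetical stand-ins for illustration only.
w2v = {"cat": np.ones(300), "dog": np.full(300, 0.5)}   # word -> 300-d vector
word_index = {"cat": 1, "dog": 2, "fish": 3}            # word -> integer id

vocab_size = len(word_index) + 1  # +1 because index 0 is reserved for padding
embedding_matrix = np.zeros((vocab_size, 300))
for word, i in word_index.items():
    vec = w2v.get(word)
    if vec is not None:            # out-of-vocabulary rows stay all-zero
        embedding_matrix[i] = vec
```

Rows for out-of-vocabulary words stay zero, which interacts with `mask_zero=True` only for index 0; OOV words at other indices are still fed to the LSTM as zero vectors.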

Here is my model code:

from keras.models import Model
from keras.layers import (Input, Embedding, LSTM, Bidirectional, Dense,
                          Lambda, Multiply, Subtract, concatenate)
from keras import backend as K
from keras import regularizers

# Shared embedding layer with frozen word2vec weights; index 0 is masked padding.
embedding_layer = Embedding(vocab_size, 300,
                            weights=[embedding_matrix],
                            input_length=50,
                            trainable=False,
                            mask_zero=True,
                            name='VectorLookup')

# One shared BiLSTM encoder applied to both sentences (Siamese weights).
lstm = Bidirectional(LSTM(150, return_sequences=False,
                          kernel_regularizer=regularizers.l2(1e-4), name='RNN'))

sent1_seq_in = Input(shape=(50,), dtype='int32', name='Sentence1')
embedded_sent1 = embedding_layer(sent1_seq_in)
encoded_sent1 = lstm(embedded_sent1)

sent2_seq_in = Input(shape=(50,), dtype='int32', name='Sentence2')
embedded_sent2 = embedding_layer(sent2_seq_in)
encoded_sent2 = lstm(embedded_sent2)

# Combine the two encodings: element-wise product and absolute difference.
mul = Multiply(name='S1.S2')([encoded_sent1, encoded_sent2])
sub = Subtract(name='S1-S2')([encoded_sent1, encoded_sent2])
dif = Lambda(lambda x: K.abs(x), name='Abs')(sub)

concatenated = concatenate([mul, dif], name='Concat')
x = Dense(50, activation='sigmoid', name='Sigmoid', kernel_regularizer=regularizers.l2(1e-4))(concatenated)
preds = Dense(6, activation='softmax', kernel_regularizer=regularizers.l2(1e-4), name='Softmax')(x)

model = Model([sent1_seq_in, sent2_seq_in], preds)
model.summary()

model.compile(optimizer='adam', loss='kld', metrics=['accuracy'])
history = model.fit([sent1_train_seq, sent2_train_seq], train_score_to_probs,
                epochs=30,
                batch_size=32,
                shuffle=True,
                validation_data=([sent1_dev_seq, sent2_dev_seq], dev_score_to_probs))

0 Answers:

There are no answers yet.