I am trying to train an LSTM model for fake news detection, using the title and text features of my dataset. Here is the code for my model:
vocab_size = len(tokenizer.word_index) + 1  # gives me a value of 12
embedding_dim = 50
maxlen = 50

model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.LSTM(128, activation="relu"))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(1, activation="softmax"))

model.compile(optimizer=opt,
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()

model_train = model.fit(X_train_cnn, y_train2,
                        epochs=15,
                        verbose=True,
                        validation_data=(X_test_cnn, y_test2),
                        batch_size=32)
Both the training and validation accuracy stay at around 47%:
Epoch 15/15
30081/30081 [==============================] - 60s 2ms/step - loss: 7.9900 - accuracy: 0.4789 - val_loss: 8.0780 - val_accuracy: 0.4732
Confusion matrix:
array([[   0, 7806],
       [   0, 7011]], dtype=int64)
Classification report:

              precision    recall  f1-score   support

           0       0.00      0.00      0.00      7806
           1       0.47      1.00      0.64      7011

    accuracy                           0.47     14817
I have tried different numbers of epochs, different batch sizes, and combinations of two LSTM layers with various unit counts, but with no luck. Please help me.
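One detail worth noting about the symptom: the confusion matrix shows the model predicting class 1 for every sample, which is exactly what a `Dense(1, activation='softmax')` output layer produces, since softmax normalizes over the output units and over a single unit it is identically 1. A quick plain-NumPy check (no Keras needed, logits values are arbitrary examples) of this behavior:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# a batch of single-unit logits, as produced by Dense(1, ...)
logits = np.array([[-3.2], [0.0], [5.7]])

# softmax over one unit is exp(z)/exp(z) == 1 for any z,
# so every sample is predicted as class 1 with probability 1.0
print(softmax(logits).ravel())  # -> [1. 1. 1.]
```

This matches the reported 47% accuracy, which is simply the proportion of class-1 samples (7011 / 14817).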