I am building a binary classification model in Keras based on some values. In my train/test sets I keep the positive and negative examples separate, and for training I mix them together. For the test set I expected I would not need to reshuffle the data, since the order should make no difference. However, with the unshuffled test data the model's accuracy does drop slightly, and recall and precision stay low. On the other hand, when I shuffle the test data, accuracy stays about the same but recall and precision come out much higher. I have put the precision and recall plots below, so please take a look.
So my first question is: why is there a difference between the precision and recall values on shuffled versus unshuffled data?
Second question: which scores should I trust, or should I be measuring recall and precision in a different way?
The code:
top = 34000
toptop = 35000

x_neg = full_x_neg[:43000]
y_neg = np.zeros(len(x_neg))
x_pos = full_x_pos[:34000]
y_pos = np.ones(len(x_pos))

x_test = np.asarray(full_x_neg[43000:45000] + full_x_pos[top:toptop])
y_test = np.concatenate((np.zeros(len(full_x_neg[43000:45000])), np.ones(len(full_x_pos[top:toptop]))))
x_test = x_test.reshape((len(x_test), 10, 12))
x_test, y_test = unison_shuffled_copies(x_test, y_test)  # shuffling test

# x and y were not defined in the original snippet; assuming they combine
# the positive and negative training halves built above
x = np.asarray(x_neg + x_pos)
y = np.concatenate((y_neg, y_pos))
x, y = unison_shuffled_copies(x, y)
x = x.reshape((len(x), 10, 12))

batch_size = 64

print('Build model...')
model = Sequential()
model.add(LSTM(128, dropout=0.2, input_shape=(10, 12)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[precision, recall, fscore])

print('Training...')
history = model.fit(x, y,
                    validation_data=(x_test, y_test),
                    batch_size=batch_size,
                    epochs=15)

score, prec, rec, fscore = model.evaluate(x_test, y_test, batch_size=batch_size)
The recall function:
def recall(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall
The precision function:
def precision(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision
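Since Keras evaluates custom metrics like these on each batch and reports the average over batches, a batch-independent sanity check is to score all predictions at once with scikit-learn. A minimal sketch of that check (the dummy arrays and the 0.5 threshold are my assumptions for illustration; in the real code `y_prob` would come from `model.predict(x_test)` and `y_true` would be `y_test`):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Dummy stand-ins for the real labels and model output
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.6, 0.8, 0.3, 0.9, 0.7, 0.2])

# Threshold the sigmoid outputs at 0.5 (an assumption) to get hard labels
y_pred = (y_prob > 0.5).astype(int)

# Computed once over the whole set, so batch order cannot affect the result
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
```

Because these scores are computed over the full test set in one pass, they come out identical whether or not `x_test` is shuffled first.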
Recall with shuffled test data
Precision with shuffled test data
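One plausible cause of the gap is exactly that per-batch averaging: with the test set unshuffled (all negatives first, then all positives), many batches contain only one class, and a batch with no positives contributes a recall of 0 to the average. A toy sketch of the effect (`batched_recall` is my simplified imitation of that averaging, not Keras internals):

```python
import numpy as np

def batched_recall(y_true, y_pred, batch_size):
    """Average the recall computed on each batch separately,
    mimicking per-batch metric aggregation (a simplification)."""
    scores = []
    for i in range(0, len(y_true), batch_size):
        yt = y_true[i:i + batch_size]
        yp = y_pred[i:i + batch_size]
        tp = np.sum(yt * yp)
        pos = np.sum(yt)
        scores.append(tp / (pos + 1e-7))  # epsilon avoids division by zero
    return float(np.mean(scores))

# Sorted labels: the first batch is all-negative, so its recall is 0,
# dragging the batch average down even though predictions are identical.
y_true_sorted = np.array([0] * 4 + [1] * 4)
y_pred = np.ones(8, dtype=int)  # classifier that predicts everything positive

global_recall = np.sum(y_true_sorted * y_pred) / np.sum(y_true_sorted)  # 1.0
sorted_avg = batched_recall(y_true_sorted, y_pred, batch_size=4)        # ~0.5
```

Shuffling the test set mixes both classes into every batch, which moves the batch-averaged value back toward the global one; computing the score once over all predictions avoids the issue entirely.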