I am building a binary classification model in Keras based on some values. In my train/test sets I keep the positive and negative examples separate, and for training I mix them together. For the test set I expected I would not need to reshuffle the data, since the order should make no difference. However, with the unshuffled test data the model's accuracy does drop slightly, and recall and precision stay low. On the other hand, when I shuffle the test data, accuracy stays about the same but recall and precision come out much higher. I have put the precision and recall plots below, so please take a look.
So my first question is: why is there a difference between the precision and recall values on shuffled versus unshuffled data?
Second question: which scores should I trust, or should I be measuring recall and precision in a different way?
The code:
top = 34000
toptop = 35000

x_neg = full_x_neg[:43000]
y_neg = np.zeros(len(x_neg))
x_pos = full_x_pos[:34000]
y_pos = np.ones(len(x_pos))

x_test = np.asarray(full_x_neg[43000:45000] + full_x_pos[top:toptop])
y_test = np.concatenate((np.zeros(len(full_x_neg[43000:45000])), np.ones(len(full_x_pos[top:toptop]))))
x_test = x_test.reshape((len(x_test), 10, 12))
x_test, y_test = unison_shuffled_copies(x_test, y_test)  # shuffling test

# x and y were not defined in the original snippet; assuming they combine
# the positive and negative training halves built above
x = np.asarray(x_neg + x_pos)
y = np.concatenate((y_neg, y_pos))
x, y = unison_shuffled_copies(x, y)
x = x.reshape((len(x), 10, 12))

batch_size = 64

print('Build model...')
model = Sequential()
model.add(LSTM(128, dropout=0.2, input_shape=(10, 12)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[precision, recall, fscore])

print('Training...')
history = model.fit(x, y,
                    validation_data=(x_test, y_test),
                    batch_size=batch_size,
                    epochs=15)

score, prec, rec, fscore = model.evaluate(x_test, y_test, batch_size=batch_size)
The recall function:
def recall(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall
The precision function:
def precision(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision
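Since Keras evaluates custom metrics like these on each batch and reports the average over batches, a batch-independent sanity check is to score all predictions at once with scikit-learn. A minimal sketch of that check (the dummy arrays and the 0.5 threshold are my assumptions for illustration; in the real code `y_prob` would come from `model.predict(x_test)` and `y_true` would be `y_test`):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Dummy stand-ins for the real labels and model output
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.6, 0.8, 0.3, 0.9, 0.7, 0.2])

# Threshold the sigmoid outputs at 0.5 (an assumption) to get hard labels
y_pred = (y_prob > 0.5).astype(int)

# Computed once over the whole set, so batch order cannot affect the result
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
```

Because these scores are computed over the full test set in one pass, they come out identical whether or not `x_test` is shuffled first.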
Recall with shuffled test data
Precision with shuffled test data
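One plausible cause of the gap is exactly that per-batch averaging: with the test set unshuffled (all negatives first, then all positives), many batches contain only one class, and a batch with no positives contributes a recall of 0 to the average. A toy sketch of the effect (`batched_recall` is my simplified imitation of that averaging, not Keras internals):

```python
import numpy as np

def batched_recall(y_true, y_pred, batch_size):
    """Average the recall computed on each batch separately,
    mimicking per-batch metric aggregation (a simplification)."""
    scores = []
    for i in range(0, len(y_true), batch_size):
        yt = y_true[i:i + batch_size]
        yp = y_pred[i:i + batch_size]
        tp = np.sum(yt * yp)
        pos = np.sum(yt)
        scores.append(tp / (pos + 1e-7))  # epsilon avoids division by zero
    return float(np.mean(scores))

# Sorted labels: the first batch is all-negative, so its recall is 0,
# dragging the batch average down even though predictions are identical.
y_true_sorted = np.array([0] * 4 + [1] * 4)
y_pred = np.ones(8, dtype=int)  # classifier that predicts everything positive

global_recall = np.sum(y_true_sorted * y_pred) / np.sum(y_true_sorted)  # 1.0
sorted_avg = batched_recall(y_true_sorted, y_pred, batch_size=4)        # ~0.5
```

Shuffling the test set mixes both classes into every batch, which moves the batch-averaged value back toward the global one; computing the score once over all predictions avoids the issue entirely.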