Hi everyone,
I'm working on my diploma thesis and am facing a binary classification problem with a strong class imbalance: my negatives ("0") outnumber my positives ("1") by a factor of about 10. I therefore want to look not only at accuracy and ROC-AUC, but also at weighted/balanced accuracy and the Precision-Recall AUC.
I already asked this question on GitHub (https://github.com/keras-team/keras/issues/12991), but it hasn't been resolved there, so I figured this might be the better place for it!
While doing some calculations on the validation set in a custom callback, I more or less stumbled on the fact that Keras's weighted accuracy always differs from the result I get with sklearn.metrics.accuracy_score().
With Keras, the weighted accuracy has to be declared in model.compile(). It then becomes a key in the logs{} dictionary after every epoch (the CSVLogger callback also writes it to the log file, and it appears in the history object), and it is returned as a value in the list from model.evaluate():
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'],
              weighted_metrics=['accuracy'])
I compute the val_sample_weights vector with the sklearn functions class_weight.compute_class_weight() and class_weight.compute_sample_weight(), based on the class distribution of the training set:
cls_weights = class_weight.compute_class_weight('balanced', np.unique(y_train.values),
                                                y_train.values)
cls_weight_dict = {0: cls_weights[0], 1: cls_weights[1]}
val_sample_weights = class_weight.compute_sample_weight(cls_weight_dict, y_test.values)
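To make concrete what these two helpers return, here is a minimal toy sketch with made-up labels (a 3:1 imbalance, not the real data):

```python
import numpy as np
from sklearn.utils import class_weight

# Made-up labels: six negatives and two positives in train, balanced test set
y_train = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_test = np.array([0, 0, 1, 1])

# 'balanced' heuristic: n_samples / (n_classes * count_of_class)
cls_weights = class_weight.compute_class_weight(class_weight='balanced',
                                                classes=np.unique(y_train),
                                                y=y_train)
print(cls_weights)  # [8/(2*6), 8/(2*2)] -> [0.667, 2.0]

# Map the per-class weights onto each individual test label
cls_weight_dict = {0: cls_weights[0], 1: cls_weights[1]}
val_sample_weights = class_weight.compute_sample_weight(cls_weight_dict, y_test)
print(val_sample_weights)  # [0.667, 0.667, 2.0, 2.0]
```

Each test sample ends up carrying the weight of its class, so a minority-class mistake costs proportionally more in any sample-weighted metric.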
In model.fit() I pass this vector together with the validation data, and to sklearn.metrics.accuracy_score() I pass it via the parameter sample_weight, so that the results can be compared on the same basis:
model_output = model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=1,
validation_data=(x_test, y_test, val_sample_weights))
Furthermore, I worked out from a few simple examples how scikit-learn computes the weighted accuracy. It seems to be given by the following equation (which looks reasonable to me):
weighted_accuracy = (w_p * TP + w_n * TN) / (w_p * (TP + FN) + w_n * (TN + FP))
TP, TN, FP and FN are the values reported in the confusion matrix, and w_p and w_n are the class weights of the positive and the negative class, respectively. A simple example to test this can be found here:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html
For completeness: sklearn.metrics.accuracy_score(..., sample_weight=...) returns the same result as sklearn.metrics.balanced_accuracy_score().
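That equivalence is easy to check numerically when the sample weights are the 'balanced' weights of the evaluated labels. A small sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix
from sklearn.utils import class_weight

# Made-up ground truth (6 negatives, 3 positives) and predictions
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0])

# 'balanced' per-sample weights derived directly from y_true
w = class_weight.compute_sample_weight('balanced', y_true)

weighted_acc = accuracy_score(y_true, y_pred, sample_weight=w)
balanced_acc = balanced_accuracy_score(y_true, y_pred)

# Same number from the confusion-matrix formula above
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
w_n, w_p = 9 / (2 * 6), 9 / (2 * 3)  # the 'balanced' class weights for this toy data
formula_acc = (w_p * tp + w_n * tn) / (w_p * (tp + fn) + w_n * (tn + fp))

print(weighted_acc, balanced_acc, formula_acc)  # all three agree (2/3 here)
```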
I looked for a simple example to make the problem easier to reproduce, even though the class imbalance there is weaker (1:2 instead of 1:10). It is based on a Keras getting-started tutorial that can be found here:
https://towardsdatascience.com/k-as-in-keras-simple-classification-model-a9d2d23d5b5a
As described in the link above, the Pima Indians onset of diabetes dataset is downloaded from the repository of Jason Brownlee, author of the Machine Learning Mastery site. But I guess it can also be downloaded from various other places.
So, finally, the code:
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.regularizers import l2
import pandas as pd
import numpy as np
from sklearn.utils import class_weight
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
file = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/' \
'pima-indians-diabetes.data.csv'
# Load csv data from file to data using pandas
data = pd.read_csv(file, names=['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin',
'bmi', 'dpf', 'age', 'diabetes'])
# Process data
data.head()
x = data.drop(columns=['diabetes'])
y = data['diabetes']
# Split into train and test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)
# define a sequential model
model = Sequential()
# 1st hidden layer
model.add(Dense(100, activation='relu', input_dim=8, kernel_regularizer=l2(0.01)))
model.add(Dropout(0.3))
# 2nd hidden layer
model.add(Dense(100, activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dropout(0.3))
# Output layer
model.add(Dense(1, activation='sigmoid'))
# Compilation with weighted metrics
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'],
weighted_metrics=['accuracy'])
# Calculate validation _sample_weights_ based on the class distribution of train labels and
# apply it to test labels using Sklearn
cls_weights = class_weight.compute_class_weight('balanced', np.unique(y_train.values),
                                                y_train.values)
cls_weight_dict = {0: cls_weights[0], 1: cls_weights[1]}
val_sample_weights = class_weight.compute_sample_weight(cls_weight_dict, y_test.values)
# Train model
model_output = model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=1,
validation_data=(x_test, y_test, val_sample_weights))
# Predict model
y_pred = model.predict(x_test, batch_size=32, verbose=1)
# Classify predictions based on threshold at 0.5
y_pred_binary = (y_pred > 0.5) * 1
# Sklearn metrics
sklearn_accuracy = accuracy_score(y_test, y_pred_binary)
sklearn_weighted_accuracy = accuracy_score(y_test, y_pred_binary,
sample_weight=val_sample_weights)
# metric_list has 3 entries: [0] val_loss weighted by val_sample_weights, [1] val_accuracy
# [2] val_weighted_accuracy
metric_list = model.evaluate(x_test, y_test, batch_size=32, verbose=1,
sample_weight=val_sample_weights)
print('sklearn_accuracy=%.3f' %sklearn_accuracy)
print('sklearn_weighted_accuracy=%.3f' %sklearn_weighted_accuracy)
print('keras_evaluate_accuracy=%.3f' %metric_list[1])
print('keras_evaluate_weighted_accuracy=%.3f' %metric_list[2])
For example, I get:
sklearn_accuracy=0.792
sklearn_weighted_accuracy=0.718
keras_evaluate_accuracy=0.792
keras_evaluate_weighted_accuracy=0.712
The "unweighted" accuracy values are the same for sklearn and Keras. The difference isn't huge here, but it grows as the dataset becomes more imbalanced. For example, in my actual task the two values always differ by about 5%!
Maybe I'm missing something and it is supposed to be this way, but in any case it is confusing that Keras and sklearn give different values, especially since the whole topic of class_weights and sample_weights is hard to get a grip on. Unfortunately I don't know Keras well enough to search its source code myself.
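One possible source of such a gap (my guess, not verified against the Keras source): if a framework computes a weighted metric per batch and then takes a plain mean over batches, the result differs from the globally weighted value whenever the batches have different weight sums. A minimal numeric sketch of that effect, with made-up data:

```python
import numpy as np

# Made-up labels, predictions and 'balanced'-style weights (class 0 -> 0.5, class 1 -> 2.0)
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 0, 0, 1, 1])
w = np.array([0.5, 0.5, 0.5, 2.0, 0.5, 2.0, 2.0, 2.0])

correct = (y_true == y_pred).astype(float)

# Global weighted accuracy: one ratio over all samples
global_acc = np.sum(w * correct) / np.sum(w)

# Per-batch weighted accuracy, then a plain mean over the batch results
batch_size = 4
batch_accs = []
for i in range(0, len(y_true), batch_size):
    cb, wb = correct[i:i + batch_size], w[i:i + batch_size]
    batch_accs.append(np.sum(wb * cb) / np.sum(wb))
batch_mean = np.mean(batch_accs)

print(global_acc)  # 0.75
print(batch_mean)  # ~0.775 -- differs because the two batches have different weight sums
```

The two numbers coincide only when every batch carries the same total weight, which is unlikely with shuffled imbalanced data, so a small epoch-level discrepancy of this kind would be expected.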
I'd be very grateful for any answers!
Answer 0 (score: 0):
I repeated your exact toy example and actually found that sklearn and keras do give the same results. I repeated the experiment 5 times to make sure it wasn't by chance, and the results were identical every time. For one of the runs, for example:
sklearn_accuracy=0.831
sklearn_weighted_accuracy=0.800
keras_evaluate_accuracy=0.831
keras_evaluate_weighted_accuracy=0.800
FYI, I'm using sklearn and keras versions 0.20.3 and 2.3.1, respectively. See the following Google Colab example: https://colab.research.google.com/drive/1b5pqbp9TXfKiY0ucEIngvz6_Tc4mo_QX