Difference between the weighted accuracy metrics of Keras and Scikit-learn

Asked: 2019-06-24 10:13:21

Tags: python tensorflow keras deep-learning

Introduction

Hi everyone,

I am working on my diploma thesis and face a binary classification problem with imbalanced class contributions: I have 10 times more negative ("0") samples than positive ("1") ones. For that reason I want to consider not only accuracy and ROC-AUC, but also weighted/balanced accuracy and Precision-Recall-AUC.

I have already asked this question on GitHub (https://github.com/keras-team/keras/issues/12991), but it has not been resolved yet, so I figured this platform might be the better place for it!

Problem description

While doing some computations on the validation set inside a custom callback, I noticed more or less by accident that the weighted accuracy always differs from the result I get with sklearn.metrics.accuracy_score().

With Keras, the weighted accuracy has to be declared in model.compile() and is afterwards a key in the logs{} dictionary after every epoch (the CSVLogger callback also writes it to the log file, and it appears in the History object). Alternatively, it is returned as a value in the list produced by model.evaluate():
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'], 
              weighted_metrics=['accuracy'])

I use the Sklearn functions class_weight.compute_class_weight() and class_weight.compute_sample_weight() to compute the val_sample_weights vector based on the class contributions of the training set.

cls_weights = class_weight.compute_class_weight('balanced', np.unique(y_train.values),
                                                y_train.values)
cls_weight_dict = {0: cls_weights[0], 1: cls_weights[1]}
val_sample_weights = class_weight.compute_sample_weight(cls_weight_dict, y_test.values)
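As a quick sanity check of what these two helpers produce, here is a toy example (the labels below are made up for illustration, not the thesis data). The 'balanced' heuristic assigns each class the weight n_samples / (n_classes * class_count), and compute_sample_weight() then maps each label to its class weight:

```python
import numpy as np
from sklearn.utils import class_weight

# Toy labels: three negatives, one positive (illustrative only)
y_train_toy = np.array([0, 0, 0, 1])
y_test_toy = np.array([0, 1, 1])

# 'balanced' weight per class: n_samples / (n_classes * class_count)
# -> class 0: 4 / (2 * 3) ≈ 0.667, class 1: 4 / (2 * 1) = 2.0
cls_w = class_weight.compute_class_weight(class_weight='balanced',
                                          classes=np.unique(y_train_toy),
                                          y=y_train_toy)

# Map every test label to the weight of its class
sample_w = class_weight.compute_sample_weight({0: cls_w[0], 1: cls_w[1]},
                                              y_test_toy)
```

Positive samples end up with three times the weight of negative ones here, which is exactly the 3:1 imbalance of the toy training labels.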

In model.fit() I pass this vector along with the validation data, and I also pass it to sklearn.metrics.accuracy_score() via the sample_weight parameter, so that the results can be compared on the same basis.

model_output = model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=1,
                         validation_data=(x_test, y_test, val_sample_weights))

Furthermore, I derived from a few simple examples the equation Scikit-learn uses to compute the weighted accuracy. It seems to be computed by the following equation (which looks reasonable to me):

weighted_accuracy = (w_p * TP + w_n * TN) / (w_p * (TP + FN) + w_n * (TN + FP))

where TP, TN, FP and FN are the values reported in the confusion matrix, and w_p and w_n are the class weights of the positive and the negative class, respectively.

A simple example to test this can be found here:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html

For completeness: sklearn.metrics.accuracy_score(..., sample_weight=...) returns the same result as sklearn.metrics.balanced_accuracy_score() when the sample weights are derived from the 'balanced' class weights.
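This equivalence, and the equation above, can be checked numerically on a small hand-made example (the labels below are invented for illustration; balanced_accuracy_score requires sklearn >= 0.20):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix
from sklearn.utils import class_weight

# Hand-made labels and predictions: TN=3, FP=1, FN=1, TP=1
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])

# Balanced sample weights derived from y_true
w = class_weight.compute_sample_weight('balanced', y_true)

weighted_acc = accuracy_score(y_true, y_pred, sample_weight=w)
balanced_acc = balanced_accuracy_score(y_true, y_pred)

# The equation from above, spelled out with the confusion-matrix entries
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
w_n = 6 / (2 * 4)   # 'balanced' weight of the negative class (4 negatives)
w_p = 6 / (2 * 2)   # 'balanced' weight of the positive class (2 positives)
formula = (w_p * tp + w_n * tn) / (w_p * (tp + fn) + w_n * (tn + fp))

print(weighted_acc, balanced_acc, formula)  # all three give 0.625
```

All three routes agree: the balanced-weighted accuracy, balanced_accuracy_score (the mean of the per-class recalls), and the closed-form equation.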

System information

  • GeForce RTX 2080 Ti
  • Keras 2.2.4
  • Tensorflow-gpu 1.13.1
  • Sklearn 0.19.2
  • Python 3.6.8
  • CUDA Version 10.0.130

Code example

I looked for a simple example to make the problem easier to reproduce, even though the class imbalance here is weaker (1:2 instead of 1:10). It is based on a Keras getting-started tutorial, which can be found here:

https://towardsdatascience.com/k-as-in-keras-simple-classification-model-a9d2d23d5b5a

As described in the link above, the Pima Indians Diabetes dataset is downloaded from the repository of Jason Brownlee, the creator of the Machine Learning Mastery site. I suppose it can also be downloaded from various other places.

So here, finally, is the code:

from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.regularizers import l2
import pandas as pd
import numpy as np
from sklearn.utils import class_weight
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

file = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/' \
       'pima-indians-diabetes.data.csv'

# Load csv data from file to data using pandas
data = pd.read_csv(file, names=['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin',
                                'bmi', 'dpf', 'age', 'diabetes'])

# Process data
data.head()
x = data.drop(columns=['diabetes'])
y = data['diabetes']

# Split into train and test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)

# define a sequential model
model = Sequential()
# 1st hidden layer
model.add(Dense(100, activation='relu', input_dim=8, kernel_regularizer=l2(0.01)))
model.add(Dropout(0.3))
# 2nd hidden layer
model.add(Dense(100, activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dropout(0.3))
# Output layer
model.add(Dense(1, activation='sigmoid'))
# Compilation with weighted metrics
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'],
              weighted_metrics=['accuracy'])

# Calculate validation _sample_weights_ based on the class distribution of train labels and 
# apply it to test labels using Sklearn
cls_weights = class_weight.compute_class_weight('balanced', np.unique(y_train.values),
                                                y_train.values)
cls_weight_dict = {0: cls_weights[0], 1: cls_weights[1]}
val_sample_weights = class_weight.compute_sample_weight(cls_weight_dict, y_test.values)

# Train model
model_output = model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=1,
                         validation_data=(x_test, y_test, val_sample_weights))

# Predict model
y_pred = model.predict(x_test, batch_size=32, verbose=1)

# Classify predictions based on threshold at 0.5
y_pred_binary = (y_pred > 0.5) * 1

# Sklearn metrics
sklearn_accuracy = accuracy_score(y_test, y_pred_binary)
sklearn_weighted_accuracy = accuracy_score(y_test, y_pred_binary, 
                                           sample_weight=val_sample_weights)

# metric_list has 3 entries: [0] val_loss weighted by val_sample_weights, [1] val_accuracy 
# [2] val_weighted_accuracy
metric_list = model.evaluate(x_test, y_test, batch_size=32, verbose=1, 
                             sample_weight=val_sample_weights)

print('sklearn_accuracy=%.3f' %sklearn_accuracy)
print('sklearn_weighted_accuracy=%.3f' %sklearn_weighted_accuracy)
print('keras_evaluate_accuracy=%.3f' %metric_list[1])
print('keras_evaluate_weighted_accuracy=%.3f' %metric_list[2])

Results and summary

For example, I get:

sklearn_accuracy=0.792

sklearn_weighted_accuracy=0.718

keras_evaluate_accuracy=0.792

keras_evaluate_weighted_accuracy=0.712

The "unweighted" accuracy values are the same for Sklearn and Keras. The difference is not huge here, but it grows as the dataset becomes more imbalanced; for my actual task, for example, the two values always differ by about 5%!

Maybe I am missing something and it is supposed to be this way, but in any case it is confusing that Keras and Sklearn provide different values, especially since the whole class_weights/sample_weights topic is hard to see through. Unfortunately I am not deep enough into Keras to search through the Keras code myself.

I would greatly appreciate any answers!

1 answer:

Answer 0 (score: 0)

I repeated your exact toy example and found that sklearn and keras do in fact give the same results. I repeated the experiment 5 times to make sure it was not by chance, and the results were indeed identical each time. For one of the runs, for example:

sklearn_accuracy=0.831
sklearn_weighted_accuracy=0.800
keras_evaluate_accuracy=0.831
keras_evaluate_weighted_accuracy=0.800

FYI, I am using the following sklearn and keras versions:

0.20.3
2.3.1

respectively. See the following Google Colab example: https://colab.research.google.com/drive/1b5pqbp9TXfKiY0ucEIngvz6_Tc4mo_QX