I want to compute the AUC, precision, and accuracy of my classifier. I am doing supervised learning:

Here is my working code. It works for a binary class, but not for multi-class. Please assume you have a dataframe with binary classes:
from sklearn.cross_validation import StratifiedKFold  # pre-0.18 cross-validation API, as used below
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, accuracy_score, recall_score, precision_score

sample_features_dataframe = self._get_sample_features_dataframe()
labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
labeled_sample_features_dataframe, binary_class_series, multi_class_series = self._prepare_dataframe_for_learning(labeled_sample_features_dataframe)

k = 10
k_folds = StratifiedKFold(binary_class_series, k)
roc = accuracy = recall = precision = 0.0  # metric accumulators, summed over folds

for train_indexes, test_indexes in k_folds:
    train_set_dataframe = labeled_sample_features_dataframe.loc[train_indexes.tolist()]
    test_set_dataframe = labeled_sample_features_dataframe.loc[test_indexes.tolist()]
    train_class = binary_class_series[train_indexes]
    test_class = binary_class_series[test_indexes]
    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)
    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    roc += roc_auc_score(test_class, predictions_proba[:, 1])
    accuracy += accuracy_score(test_class, predictions)
    recall += recall_score(test_class, predictions)
    precision += precision_score(test_class, predictions)
At the end I divide the accumulated results by k, of course, to get the average AUC, precision, and so on. This code works fine. However, for the multi-class case I could not compute the same metrics:
    train_class = multi_class_series[train_indexes]
    test_class = multi_class_series[test_indexes]
    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)
    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
I found that for multi-class I have to add the parameter average="weighted":
roc += roc_auc_score(test_class, predictions_proba[:,1], average="weighted")
I get this error:

raise ValueError("{0} format is not supported".format(y_type))

ValueError: multiclass format is not supported
Answer 0 (score: 10)
The average option of roc_auc_score is only defined for multilabel problems.
You can take a look at the following example from the scikit-learn documentation, to define your own micro- or macro-averaged scores for multiclass problems:

http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings

Edit: there is an open issue on the scikit-learn tracker about implementing ROC AUC for multiclass problems: https://github.com/scikit-learn/scikit-learn/issues/3298
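For reference, below is a minimal sketch of the micro/macro averaging that the linked example implements; y_test and y_score are placeholder names for the true labels and the predict_proba output of some fitted multiclass classifier.

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Assumes more than two classes, so label_binarize yields one column per class
classes = np.unique(y_test)
y_test_bin = label_binarize(y_test, classes=classes)

# Micro average: every (sample, class) cell counts as one binary decision
micro_auc = roc_auc_score(y_test_bin.ravel(), y_score.ravel())

# Macro average: unweighted mean of the per-class AUCs
macro_auc = np.mean([roc_auc_score(y_test_bin[:, i], y_score[:, i])
                     for i in range(len(classes))])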
Answer 1 (score: 6)
You can't use roc_auc as a single summary metric for multiclass models. If you want, you can calculate a per-class roc_auc, as in:
roc = {label: [] for label in multi_class_series.unique()}
for label in multi_class_series.unique():
    # One-vs-rest: refit the classifier on a binarized target for this label
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # Score against the binarized truth; append keeps one AUC per fold
    roc[label].append(roc_auc_score(test_class == label, predictions_proba[:, 1]))
However, it is more common to use sklearn.metrics.confusion_matrix to evaluate the performance of a multiclass model.
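For instance, a minimal sketch reusing the variables from the loop above (test_class and predictions are assumed to exist):

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
print(confusion_matrix(test_class, predictions))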
Answer 2 (score: 4)
As mentioned here, to my knowledge there is not yet a built-in way to easily compute roc auc for a multiclass setting in sklearn.
However, if you are familiar with classification_report, you may like this simple implementation that returns the same output as classification_report, but as a pandas.DataFrame, which I personally find very handy!:
import pandas as pd
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import LabelBinarizer

def class_report(y_true, y_pred, y_score=None, average='micro'):
    if y_true.shape != y_pred.shape:
        print("Error! y_true %s is not the same shape as y_pred %s" % (
            y_true.shape,
            y_pred.shape)
        )
        return

    lb = LabelBinarizer()

    if len(y_true.shape) == 1:
        lb.fit(y_true)

    # Value counts of predictions
    labels, cnt = np.unique(
        y_pred,
        return_counts=True)
    n_classes = len(labels)
    pred_cnt = pd.Series(cnt, index=labels)

    metrics_summary = precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred,
        labels=labels)

    avg = list(precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred,
        average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index,
        columns=labels)

    support = class_report_df.loc['support']
    total = support.sum()
    class_report_df['avg / total'] = avg[:-1] + [total]

    class_report_df = class_report_df.T
    class_report_df['pred'] = pred_cnt
    class_report_df['pred'].iloc[-1] = total

    if y_score is not None:
        fpr = dict()
        tpr = dict()
        roc_auc = dict()
        for label_it, label in enumerate(labels):
            fpr[label], tpr[label], _ = roc_curve(
                (y_true == label).astype(int),
                y_score[:, label_it])
            roc_auc[label] = auc(fpr[label], tpr[label])

        if average == 'micro':
            if n_classes <= 2:
                fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
                    lb.transform(y_true).ravel(),
                    y_score[:, 1].ravel())
            else:
                fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
                    lb.transform(y_true).ravel(),
                    y_score.ravel())

            roc_auc["avg / total"] = auc(
                fpr["avg / total"],
                tpr["avg / total"])

        elif average == 'macro':
            # First aggregate all false positive rates
            all_fpr = np.unique(np.concatenate([
                fpr[i] for i in labels]))

            # Then interpolate all ROC curves at these points
            mean_tpr = np.zeros_like(all_fpr)
            for i in labels:
                mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

            # Finally average it and compute AUC
            mean_tpr /= n_classes

            fpr["macro"] = all_fpr
            tpr["macro"] = mean_tpr

            roc_auc["avg / total"] = auc(fpr["macro"], tpr["macro"])

        class_report_df['AUC'] = pd.Series(roc_auc)

    return class_report_df
Here is an example:
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=10,
                           n_informative=5, n_redundant=0,
                           n_classes=10, random_state=0,
                           shuffle=False)

X_train, X_test, y_train, y_test = train_test_split(X, y)

model = RandomForestClassifier(max_depth=2, random_state=0)
model.fit(X_train, y_train)
Regular classification_report:
sk_report = classification_report(
    digits=6,
    y_true=y_test,
    y_pred=model.predict(X_test))
print(sk_report)
Output:
             precision    recall  f1-score   support

           0  0.262774  0.553846  0.356436       130
           1  0.405405  0.333333  0.365854       135
           2  0.367347  0.150000  0.213018       120
           3  0.350993  0.424000  0.384058       125
           4  0.379310  0.447154  0.410448       123
           5  0.525000  0.182609  0.270968       115
           6  0.362573  0.488189  0.416107       127
           7  0.330189  0.299145  0.313901       117
           8  0.328571  0.407080  0.363636       113
           9  0.571429  0.248276  0.346154       145
 avg / total  0.390833  0.354400  0.345438      1250
Custom classification_report:
report_with_auc = class_report(
    y_true=y_test,
    y_pred=model.predict(X_test),
    y_score=model.predict_proba(X_test))
print(report_with_auc)
Output:
             precision    recall  f1-score  support    pred       AUC
0             0.262774  0.553846  0.356436    130.0   274.0  0.766477
1             0.405405  0.333333  0.365854    135.0   111.0  0.773974
2             0.367347  0.150000  0.213018    120.0    49.0  0.817341
3             0.350993  0.424000  0.384058    125.0   151.0  0.803364
4             0.379310  0.447154  0.410448    123.0   145.0  0.802436
5             0.525000  0.182609  0.270968    115.0    40.0  0.680870
6             0.362573  0.488189  0.416107    127.0   171.0  0.855768
7             0.330189  0.299145  0.313901    117.0   106.0  0.766526
8             0.328571  0.407080  0.363636    113.0   140.0  0.754812
9             0.571429  0.248276  0.346154    145.0    63.0  0.769100
avg / total   0.390833  0.354400  0.345438   1250.0  1250.0  0.776071
Answer 3 (score: 3)
I needed to do the same thing (roc_auc_score for the multiclass case). Following the last sentence of the first answer, I searched and found that sklearn does provide roc_auc_score for multiclass as of version 0.22.1. (I had an older version; after updating to 0.22.1 I got the multiclass roc_auc_score functionality, as mentioned in the sklearn docs.)
An MWE (with a batch of 16 samples):
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

# preds: raw network outputs for the batch; y: the integer class labels
final_preds = torch.softmax(preds, dim=1).squeeze(1)
num_classes = final_preds.shape[1]
print("y_true={}".format(y))
print("y_score={}".format(final_preds))
labels1 = np.arange(num_classes)
print("roc_auc_score={}".format(roc_auc_score(
    y.detach().cpu().numpy(), final_preds.detach().cpu().numpy(),
    average='macro', multi_class='ovo', labels=labels1)))
This produces:
y_true=tensor([5, 5, 4, 0, 6, 0, 4, 1, 0, 5, 0, 0, 5, 0, 1, 0])
y_score=tensor([[0.0578, 0.0697, 0.1135, 0.1264, 0.0956, 0.1534, 0.1391, 0.0828, 0.0725,
0.0891],
[0.0736, 0.0892, 0.1096, 0.1277, 0.0888, 0.1372, 0.1227, 0.0895, 0.0914,
0.0702],
[0.0568, 0.1571, 0.0339, 0.1200, 0.1069, 0.1800, 0.1285, 0.0486, 0.0961,
0.0720],
[0.1649, 0.0876, 0.1051, 0.0768, 0.0498, 0.0838, 0.0676, 0.0601, 0.1900,
0.1143],
[0.1096, 0.0797, 0.0580, 0.1190, 0.2201, 0.1036, 0.0550, 0.0514, 0.1018,
0.1018],
[0.1522, 0.1033, 0.1139, 0.0789, 0.0496, 0.0553, 0.0730, 0.1428, 0.1447,
0.0863],
[0.1416, 0.1304, 0.1184, 0.0775, 0.0683, 0.0657, 0.1521, 0.0426, 0.1342,
0.0693],
[0.0944, 0.0806, 0.0622, 0.0629, 0.0652, 0.0936, 0.0607, 0.1270, 0.2392,
0.1142],
[0.0848, 0.0966, 0.0923, 0.1301, 0.0932, 0.0910, 0.1066, 0.0877, 0.1297,
0.0880],
[0.1040, 0.1341, 0.0906, 0.0934, 0.0586, 0.0949, 0.0869, 0.1605, 0.0819,
0.0952],
[0.2882, 0.0716, 0.1136, 0.0235, 0.0022, 0.0170, 0.0891, 0.2371, 0.0533,
0.1044],
[0.2274, 0.1077, 0.1183, 0.0937, 0.0140, 0.0705, 0.1168, 0.0913, 0.1120,
0.0483],
[0.0846, 0.1281, 0.0772, 0.1088, 0.1333, 0.0831, 0.0444, 0.1553, 0.1285,
0.0568],
[0.0756, 0.0822, 0.1468, 0.1286, 0.0749, 0.0978, 0.0565, 0.1513, 0.0840,
0.1023],
[0.0521, 0.0555, 0.1031, 0.0816, 0.1145, 0.1090, 0.1095, 0.0846, 0.0919,
0.1982],
[0.0491, 0.1814, 0.0331, 0.0052, 0.0166, 0.0051, 0.0812, 0.0045, 0.5111,
0.1127]])
roc_auc_score=0.40178571428571425
To make it work, I had to softmax the prediction scores, to make sure each sample's scores form a probability distribution that sums to 1 (sum(y_score[:, i]) == 1 over all classes i, for every sample in the batch). The second point is to pass the labels1 argument, so that the multi_class version of roc_auc knows the total number of classes (otherwise y_true would have to contain every available class, which is usually not the case).
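For reference, the same call can be reproduced without torch; this is a minimal numpy-only sketch with made-up scores, where the row normalisation stands in for the softmax:

import numpy as np
from sklearn.metrics import roc_auc_score

num_classes = 10
# Same labels as the batch above: only 5 of the 10 classes are present
y_true = np.array([5, 5, 4, 0, 6, 0, 4, 1, 0, 5, 0, 0, 5, 0, 1, 0])

rng = np.random.default_rng(0)
scores = rng.random((len(y_true), num_classes))
scores /= scores.sum(axis=1, keepdims=True)  # each row now sums to 1

# labels declares all 10 classes even though this batch contains only 5 of them
print(roc_auc_score(y_true, scores, average='macro',
                    multi_class='ovo', labels=np.arange(num_classes)))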
Answer 4 (score: 1)
If you are looking for something relatively simple that takes the actual and predicted lists and returns a dictionary with all the classes as keys and their roc_auc_score as values, you can use the following method:
from sklearn.metrics import roc_auc_score

def roc_auc_score_multiclass(actual_class, pred_class, average="macro"):
    # Creating a set of all the unique classes using the actual class list
    unique_class = set(actual_class)
    roc_auc_dict = {}
    for per_class in unique_class:
        # Creating a list of all the classes except the current class
        other_class = [x for x in unique_class if x != per_class]

        # Marking the current class as 1 and all other classes as 0
        new_actual_class = [0 if x in other_class else 1 for x in actual_class]
        new_pred_class = [0 if x in other_class else 1 for x in pred_class]

        # Using the sklearn metrics method to calculate the roc_auc_score
        roc_auc = roc_auc_score(new_actual_class, new_pred_class, average=average)
        roc_auc_dict[per_class] = roc_auc

    return roc_auc_dict
print("\nLogistic Regression")
# assuming your already have a list of actual_class and predicted_class from the logistic regression classifier
lr_roc_auc_multiclass = roc_auc_score_multiclass(actual_class, predicted_class)
print(lr_roc_auc_multiclass)
# Sample output
# Logistic Regression
# {0: 0.5087457159427196, 1: 0.5, 2: 0.5, 3: 0.5114706737345112, 4: 0.5192307692307693}
# 0.5078894317816
Answer 5 (score: 1)
There are a number of metrics for quantifying the quality of a multiclass classifier, including roc_auc_score. Read more via the link below: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
Actually, roc_auc is computed for a binary classifier, but the roc_auc_score function implements a 'onevsrest' or 'onevsone' strategy to convert a multiclass classification problem into N or N*(N-1)/2 binary problems, respectively. To compute just the area under the curve (AUC), set the multi_class parameter to either 'ovr' or 'ovo':
<块引用>roc_auc_score(y_true, y_score, multi_class='ovr')
Here y_score can be the output of the clf.decision_function() or clf.predict_proba() function.
However, to plot the ROC curves of the binary classifiers, first implement OneVsRestClassifier() or OneVsOneClassifier(), and then plot roc_curve or precision_recall_curve from the output of clf.decision_function() or clf.predict_proba(), depending on your data. See ogrisel's example:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings
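For illustration, here is a minimal per-class ROC plotting sketch along those lines; X_train, X_test, y_train, y_test are assumed to come from a multiclass problem, and LogisticRegression is just a stand-in base estimator:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

classes = np.unique(y_train)
y_test_bin = label_binarize(y_test, classes=classes)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
y_score = clf.fit(X_train, y_train).predict_proba(X_test)

# One ROC curve per class against the binarized ground truth
for i, label in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    plt.plot(fpr, tpr, label="class %s (AUC = %.3f)" % (label, auc(fpr, tpr)))
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()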
Answer 6 (score: 0)
Updating maxymoo's answer: when the classifier is fitted on the full multiclass target, pick the predict_proba column that matches the label:

roc[label].append(roc_auc_score(test_class == label, predictions_proba[:, label]))

Or refer to the classifier.classes_ attribute to determine the right column for the label of interest.
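A sketch of that classes_ lookup, reusing the names from maxymoo's answer (the fit here is the plain multiclass one, not the binarized refit):

import numpy as np
from sklearn.metrics import roc_auc_score

selected_classifier.fit(train_set_dataframe, train_class)
predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

for label in selected_classifier.classes_:
    # Column of predict_proba that corresponds to this label
    col = np.where(selected_classifier.classes_ == label)[0][0]
    roc[label].append(roc_auc_score(test_class == label, predictions_proba[:, col]))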
Answer 7 (score: 0)
@Raul Your function looks good, but there is a problem when it computes the micro-average roc_score with n_classes <= 2. I had dimension issues, so I changed the following:

From this:
if average == 'micro':
    if n_classes <= 2:
        fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
            lb.transform(y_true).ravel(),
            y_score[:, 1].ravel())
To this:
if average == 'micro':
    if n_classes <= 2:
        fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
            lb.transform(y_true).ravel(),
            y_score.ravel())
I hope this change does not cause any problem in the computation of the roc_score.