Why do I get high AUC but low accuracy with an SVM on a balanced dataset?

Asked: 2018-01-04 11:48:20

Tags: svm roc auc

I am using LIBSVM to classify 256 classes, and my dataset has roughly 5000-10000 samples. For the SVM I use a one-vs-rest strategy to train the model. I am now getting low accuracy (15%~30%) but high AUC (>90%). I had assumed that a model with such low accuracy (13%-30%) could not reach a high AUC (0.9 and above) - is that assumption wrong? I followed the open-source scikit-learn example for computing AUC on multiclass problems (http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py). This is the code I use to compute the AUC:

import numpy as np
from sklearn import metrics
from sklearn.preprocessing import label_binarize

# compute ROC curve and ROC area for each class
fpr     = dict()
tpr     = dict()
roc_auc = dict()

# test_label_kernel: the true labels of the test instances
# LensOfLabel      : the number of classes
y = label_binarize(test_label_kernel, classes=list(range(LensOfLabel)))

# sort_pval: the per-class prediction probabilities from the SVM
for i in range(LensOfLabel):
    fpr[i], tpr[i], _ = metrics.roc_curve(y[:, i], sort_pval[:, i])
    roc_auc[i]        = metrics.auc(fpr[i], tpr[i])

# First aggregate all false positive rates
n_classes = LensOfLabel
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

# Finally average it and compute AUC
mean_tpr /= n_classes

fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = metrics.auc(fpr["macro"], tpr["macro"])
print("macro AUC: %.4f" % roc_auc["macro"])

# compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = metrics.roc_curve(y.ravel(), sort_pval.ravel())
roc_auc["micro"]              = metrics.auc(fpr["micro"], tpr["micro"])
print("micro AUC: %.4f" % roc_auc["micro"])

The ROC curves are:

https://i.stack.imgur.com/GEUqr.png

https://i.stack.imgur.com/ucbE6.png
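For what it's worth, high AUC with low top-1 accuracy is not contradictory: one-vs-rest AUC measures how well each class's score ranks positives above negatives, while accuracy only rewards the argmax. Here is a minimal synthetic sketch (made-up data, not from the question; `n_classes` and the score values are arbitrary) in which the true class is always ranked second, so accuracy is zero while the macro AUC stays above 0.9:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
n_classes = 16
n_samples = 400
y_true = rng.integers(0, n_classes, n_samples)

# Scores where the true class is reliably ranked near the top,
# but a fixed "distractor" class always scores slightly higher.
scores = rng.uniform(0.0, 0.2, size=(n_samples, n_classes))
distractor = (y_true + 1) % n_classes
scores[np.arange(n_samples), y_true] = 0.8      # true class: high score...
scores[np.arange(n_samples), distractor] = 0.9  # ...but never the highest

y_pred = scores.argmax(axis=1)
acc = accuracy_score(y_true, y_pred)  # argmax always picks the distractor: 0.0

Y = label_binarize(y_true, classes=list(range(n_classes)))
macro_auc = roc_auc_score(Y, scores, average="macro")  # well above 0.9
print("accuracy = %.2f, macro AUC = %.4f" % (acc, macro_auc))
```

So if the SVM assigns the true class a consistently high score that is merely beaten by a few competitors, the per-class ROC curves (and therefore the macro/micro AUC) can look excellent even though top-1 accuracy is poor, which matches the numbers in the question.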

0 Answers:

There are no answers.