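I am working on a multi-label, multi-class classification framework, and I want to add metrics for computing multi-label and multi-class accuracy.
Here is the demo data: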
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]
true_labels = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]
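The most popular accuracy metrics for multi-label, multi-class classification are the Hamming score, subset accuracy, and Hamming loss.
The code for these three is: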
import numpy as np
import sklearn.metrics

def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    '''
    Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
    '''
    # y_true and y_pred must be 2D NumPy arrays of 0/1 indicators (one row per sample).
    acc_list = []
    for i in range(y_true.shape[0]):
        # Indices of the labels that are switched on for this sample.
        set_true = set( np.where(y_true[i])[0] )
        set_pred = set( np.where(y_pred[i])[0] )
        #print('\nset_true: {0}'.format(set_true))
        #print('set_pred: {0}'.format(set_pred))
        tmp_a = None
        if len(set_true) == 0 and len(set_pred) == 0:
            # Both label sets are empty: count as a perfect match.
            tmp_a = 1
        else:
            # Intersection over union of the true and predicted label sets.
            tmp_a = len(set_true.intersection(set_pred))/\
                    float( len(set_true.union(set_pred)) )
        #print('tmp_a: {0}'.format(tmp_a))
        acc_list.append(tmp_a)
    return { 'hamming_score' : np.mean(acc_list) ,
             'subset_accuracy' : sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None),
             'hamming_loss' : sklearn.metrics.hamming_loss(y_true, y_pred)}
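For reference, here is a minimal sketch of the same three quantities computed directly with NumPy on the demo data. The helper name `multilabel_summary` is hypothetical and this is a vectorized re-implementation under the same definitions, not the function above.

```python
import numpy as np

# Demo data from the question, as 0/1 indicator matrices (rows = samples).
y_true = np.array([[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]])
y_pred = np.array([[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]])

def multilabel_summary(y_true, y_pred):
    """Hypothetical helper: vectorized versions of the three metrics."""
    t = y_true.astype(bool)
    p = y_pred.astype(bool)
    inter = (t & p).sum(axis=1)   # labels on in both, per sample
    union = (t | p).sum(axis=1)   # labels on in either, per sample
    # Rows where both label sets are empty count as a perfect match.
    per_sample = np.where(union == 0, 1.0, inter / np.maximum(union, 1))
    return {
        'hamming_score': float(per_sample.mean()),
        # Fraction of samples whose whole label row is predicted exactly.
        'subset_accuracy': float((t == p).all(axis=1).mean()),
        # Fraction of individual (sample, label) cells that are wrong.
        'hamming_loss': float((t != p).mean()),
    }

summary = multilabel_summary(y_true, y_pred)
print(summary)
```

On this data the Hamming score should come out at 0.75, the subset accuracy at 2/6, and the Hamming loss at 5/30.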
But I was looking for an F1 score for multi-label classification, so I tried scikit-learn's f1_score:
print(f1_score(demo, true, average='micro'))
But it gave me an error:
> ValueError: multiclass-multioutput is not supported
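As background on that message, scikit-learn labels each target with a type before scoring it, and `sklearn.utils.multiclass.type_of_target` exposes that label. The sketch below assumes scikit-learn is installed. The `multi_valued` array is hypothetical data, not taken from this question, and this does not by itself establish what triggered the error here.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# A 2D array containing only 0/1 values is a multi-label indicator matrix.
indicator = np.array([[1, 0, 0, 0, 1], [1, 0, 1, 0, 1]])

# Hypothetical 2D array with more than two distinct values per column set:
# several multiclass outputs at once.
multi_valued = np.array([[1, 0, 2], [2, 1, 0]])

indicator_type = type_of_target(indicator)
multi_valued_type = type_of_target(multi_valued)
print(indicator_type)
print(multi_valued_type)
```

The second type is the one named in the error message, so printing `type_of_target` for both inputs is one way to check how the data is being interpreted.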
I converted the data to NumPy arrays and called f1_score again:
print(f1_score(np.array(true_labels),np.array(predicted_labels), average='micro'))
Then I got a score:
0.8275862068965517
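That value can be checked by hand. The sketch below uses plain Python (no scikit-learn) and assumes the usual definition of micro-averaged F1: pool every (sample, label) cell, count true positives, false positives and false negatives once, then compute a single F1.

```python
true_labels = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]

# Pool all cells across samples and labels, counting outcomes for "label on".
tp = fp = fn = 0
for true_row, pred_row in zip(true_labels, predicted_labels):
    for t, p in zip(true_row, pred_row):
        tp += (t == 1 and p == 1)   # label present and predicted
        fp += (t == 0 and p == 1)   # label predicted but absent
        fn += (t == 1 and p == 0)   # label present but missed

# F1 = 2*TP / (2*TP + FP + FN)
micro_f1 = 2 * tp / (2 * tp + fp + fn)
print(tp, fp, fn, micro_f1)
```

The counts are TP = 12, FP = 1, FN = 4, so micro-F1 = 24/29, which agrees with the 0.8275862068965517 above.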
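I tried one more experiment: I took one example at a time from the true and predicted labels, computed the F1 score on that pair, and then took the mean over all examples: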
f1_scores = []  # renamed from accuracy_score to avoid shadowing sklearn.metrics.accuracy_score
for tru, pred in zip(true_labels, predicted_labels):
    f1_scores.append(f1_score(tru, pred, average='micro'))
print(np.mean(f1_scores))
Output:
0.8333333333333335
The two scores are different.
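One observation that may explain the gap: a single row passed on its own is a 1D vector of 0s and 1s, so it looks like an ordinary single-label problem with classes 0 and 1, and micro-averaged F1 over all classes of a single-label problem equals plain accuracy. Under that assumption the loop averages per-row accuracy rather than a multi-label F1. A plain-Python sketch of that per-row accuracy:

```python
true_labels = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]

# Fraction of positions in each row where prediction and truth agree
# (agreeing 0s count as correct here, unlike in a multi-label F1).
row_accuracy = [
    sum(t == p for t, p in zip(true_row, pred_row)) / len(true_row)
    for true_row, pred_row in zip(true_labels, predicted_labels)
]
mean_row_accuracy = sum(row_accuracy) / len(row_accuracy)
print(row_accuracy)
print(mean_row_accuracy)
```

The mean is 25/30, which matches the 0.8333 from the loop up to floating-point rounding, and it is also 1 minus the Hamming loss (5/30) because every row has the same length.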
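Why does it fail on a list of lists but work on NumPy arrays? And which approach is correct: computing the score one example at a time and taking the mean, or passing NumPy arrays with all the samples at once?
What other metrics are available for evaluating multi-label classification?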
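Two other averaging schemes that come up for multi-label F1 are sample-averaged F1 (one F1 per row, then the mean) and macro-averaged F1 (one F1 per label column, then the mean). In scikit-learn these correspond to `average='samples'` and `average='macro'` in `f1_score`. The plain-Python sketch below computes both by hand. The helper `f1_binary` is hypothetical, and returning 0.0 when a row or column has no positives at all is a convention assumed here.

```python
true_labels = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]

def f1_binary(true_vec, pred_vec):
    """Hypothetical helper: F1 for the 'label on' class of two 0/1 vectors."""
    tp = sum(1 for t, p in zip(true_vec, pred_vec) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true_vec, pred_vec) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true_vec, pred_vec) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumed convention: score 0.0 when there are no positives to compare.
    return 2 * tp / denom if denom else 0.0

# Sample-averaged F1: one score per row (sample).
per_sample = [f1_binary(t, p) for t, p in zip(true_labels, predicted_labels)]
samples_f1 = sum(per_sample) / len(per_sample)

# Macro-averaged F1: one score per column (label); zip(*rows) transposes.
per_label = [f1_binary(t, p) for t, p in zip(zip(*true_labels), zip(*predicted_labels))]
macro_f1 = sum(per_label) / len(per_label)

print(samples_f1)
print(per_label, macro_f1)
```

On the demo data the macro average drops to 0.4 because labels 1, 2 and 3 are never predicted correctly, while the sample average stays near 0.84. The choice of averaging changes the result considerably.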