I want to compute f1_score. My code is as follows:
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

if __name__ == '__main__':
    # Predictions: tab-separated, no header row, every value read as a string
    y_pred_df = pd.read_csv('file1.csv', skipinitialspace=True, sep='\t', header=None, dtype=str)
    y_pred = y_pred_df.values
    # Ground truth: no header row, every value read as a string
    y_true_df = pd.read_csv('file2.csv', header=None, dtype=str)
    y_true = y_true_df.values
    test_score = accuracy_score(y_true[:,0], y_pred[:,0])
    print("\n Accuracy score (Random Forest with 100 estimators) : {}%".format(round(test_score*100,2)))
    print (y_true[:,0])
    print (y_pred[:,0])
    # This is the call that fails: pos_label receives a list of labels
    score_test = f1_score(y_true[:,0], y_pred[:,0],pos_label=list(set(y_true[:,0])),average = 'weighted')
    print (score_test)
Running the code above produces the following output, with an error raised while computing f1_score:
Accuracy score (Random Forest with 100 estimators) : 61.62%
['4' '4' '4' '4' '4' '12' '12' '12' '12' '12' '12' '12' '12' '4' '4' '4'
'4' '4' '4' '4' '4' '4' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4' '4'
'4' '4' '4' '4' '4' '12' '12' '4' '4' '4' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '4' '4'
'4' '4' '4' '4']
['4' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '4' '12' '4' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12' '12'
'12' '12' '12' '12' '12' '12' '12' '12' '12']
Traceback (most recent call last):
  File "<ipython-input-25-f80f0ca3aea2>", line 1, in <module>
    runfile('C:/Anaconda3/envs/python27/Scripts/spade/examples/project/Fmeasure.py', wdir='C:/Anaconda3/envs/python27/Scripts/spade/examples/project')
  File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)
  File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Anaconda3/envs/python27/Scripts/spade/examples/project/Fmeasure.py", line 47, in <module>
    score_test = f1_score(y_true[:,0], y_pred[:,0],pos_label=list(set(y_true[:,0])),average = 'binary')
  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py", line 639, in f1_score
    sample_weight=sample_weight)
  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py", line 756, in fbeta_score
    sample_weight=sample_weight)
  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py", line 992, in precision_recall_fscore_support
    assume_unique=True)])
  File "C:\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 280, in hstack
    return _nx.concatenate(arrs, 1)
ValueError: all the input arrays must have same number of dimensions
Could you tell me where the problem comes from?
Answer 0 (score: 0)
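pos_label must be a single label, but you are passing a list of labels. pos_label is meant for computing the F1 score of one label at a time, so the call crashes when it receives a list. If you want the F1 score of each label, loop over the set of labels, like this: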
for label in set(yt_):
    # pos_label takes one label at a time, not a list
    score_test = f1_score(yt_, yp_, pos_label=label)
    print( 'f1', label, score_test )
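As an alternative to the loop, f1_score can return one score per label in a single call when average=None is passed and labels fixes the order of the result. A minimal sketch, assuming a small made-up sample (y_true and y_pred below are illustrative toy lists, not the data from the question):

```python
from sklearn.metrics import f1_score

# Illustrative toy data (an assumption, not the files from the question)
y_true = ['4', '4', '12', '12', '12']
y_pred = ['4', '12', '12', '12', '12']

# Sorting gives a fixed, reproducible label order; for strings this is ['12', '4']
labels = sorted(set(y_true))

# average=None returns one F1 score per entry of `labels`, in the same order
scores = f1_score(y_true, y_pred, labels=labels, average=None)

per_label_f1 = dict(zip(labels, scores))
for label in labels:
    print('f1', label, per_label_f1[label])
```

Passing labels explicitly avoids depending on the iteration order of a set, which is handy when the scores are compared across runs.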
If what you want is the weighted average of the F1 scores, then you should not use pos_label at all:
score_test = f1_score(yt_, yp_, average='weighted')
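To make "weighted average" concrete, here is a hand-rolled sketch of the usual definition: compute the F1 score of each label, then average those scores weighted by how often each label occurs in the true labels (its support). The helper names per_label_f1 and weighted_f1 and the toy data are assumptions made for illustration; this is not sklearn's implementation.

```python
from collections import Counter

def per_label_f1(y_true, y_pred, label):
    """F1 of one label, treating that label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
    return 2.0 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Mean of per-label F1 scores, weighted by each label's support in y_true."""
    support = Counter(y_true)
    total = float(len(y_true))
    return sum(per_label_f1(y_true, y_pred, label) * count / total
               for label, count in support.items())

# Illustrative toy data (an assumption, not the files from the question)
y_true = ['4', '4', '12', '12', '12']
y_pred = ['4', '12', '12', '12', '12']
print(weighted_f1(y_true, y_pred))
```

On this toy sample the per-label scores are 2/3 for '4' and 6/7 for '12', with supports 2 and 3, so the weighted average is (2 * 2/3 + 3 * 6/7) / 5 = 82/105.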
That said, on sklearn 0.20 the following runs, but it emits a warning:
from sklearn.metrics import f1_score
yt_ = ['4', '4', '4', '4', '4', '12', '12', '12', '12', '12', '12', '12', '12', '4', '4', '4', '4', '4', '4', '4', '4', '4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '12', '12', '4', '4', '4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '4', '4', '4', '4', '4', '4']
yp_ = ['4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '4', '12', '4', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12']
score_test = f1_score(yt_, yp_, pos_label=list(set(yt_)),average = 'weighted')
print (score_test)
Warning:
UserWarning: Note that pos_label (set to ['12', '4']) is ignored when average != 'binary' (got 'weighted'). You may use labels=[pos_label] to specify a single positive class.
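The suggestion in the warning can be sketched as follows. With a non-binary average, restricting labels to a single class should yield the score of that class alone, which is what average='binary' with a scalar pos_label returns. The toy data is an assumption made for illustration, and the behaviour may differ between sklearn versions.

```python
from sklearn.metrics import f1_score

# Illustrative toy data (an assumption, not the files from the question)
y_true = ['4', '4', '12', '12', '12']
y_pred = ['4', '12', '12', '12', '12']

# Default average='binary': pos_label is one label, passed as a scalar
binary_f1 = f1_score(y_true, y_pred, pos_label='4')

# Non-binary average: pos_label is not used, so select the class via labels
restricted_f1 = f1_score(y_true, y_pred, labels=['4'], average='weighted')

print(binary_f1, restricted_f1)
```

Both calls score only the label '4' here, so the two printed values are expected to agree.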