Iterating over a CategoricalIndex (pandas, sklearn)

Date: 2020-01-29 18:13:36

Tags: python scikit-learn text-classification multiclass-classification

cm = np.array(confusion_matrix(y_test, pred, labels=[0,1]))

b=str(df["topic"].factorize()[1][0])
print(b)
print(df["topic"].factorize()[1].categories.to_numpy())

categories = df["topic"].factorize()[1].categories.to_numpy()



from IPython.display import display

# print(df["topic"].factorize().CategoricalIndex)
# df["topic"].factorize()
for predicted in df["topic"].factorize()[1].categories:
    for actual in df["topic"].factorize()[1].categories:
        if predicted != actual and int(conf_mat[actual, predicted]) >= 10:
            display(df.loc[indices_test[(y_test == actual) & (pred == predicted)]][['topic', 'body_wakati']])
            print('')

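I want to use the confusion matrix to check which classes are being misclassified. To do that I want to iterate over the pandas CategoricalIndex, but I am not sure whether it behaves like an array.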


This is the categorical index I get from print(df["topic"].factorize()[1]):

CategoricalIndex(['computer_graphics', 'operating_systems',
                  'computer_security', 'application_service',
                  'computer_software', 'artificial_intelligence',
                  'search_engine', 'information_society'],
                 categories=['application_service', 'artificial_intelligence', 'computer_graphics', 'computer_security', 'computer_software', 'information_society', 'operating_systems', 'search_engine'], ordered=False, dtype='category')
When I try to iterate over it, I get the following error:

---------------------------------------------------------------------------

--> 385       if predicted != actual and int(conf_mat[actual, predicted]) >= 10:
    386 
    387        display(df.loc[indices_test[(y_test == actual) & (pred == predicted)]][['topic', 'body_wakati']])

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
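
A likely cause, sketched under some assumptions: conf_mat is a NumPy array, so it can only be indexed by integer positions, while the loop variables actual and predicted are the category strings. One way around this is to iterate with enumerate and index the matrix with the integer position. The toy DataFrame, the pred array, and the threshold below are hypothetical stand-ins for the data in the question, and the sketch assumes y_test and pred hold the integer codes returned by factorize(). Note that it iterates over the uniques returned by factorize() rather than over .categories: the codes refer to positions in uniques (order of first appearance), whereas .categories is sorted and may be in a different order.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical toy data; column names follow the question.
df = pd.DataFrame({
    "topic": pd.Categorical(["b", "a", "c", "a", "b", "c"]),
    "body_wakati": ["t0", "t1", "t2", "t3", "t4", "t5"],
})

# codes[i] is the position of row i's topic inside `uniques`.
codes, uniques = df["topic"].factorize()

# Assumption: every row is a test row, and `pred` holds predicted codes.
y_test = codes
pred = np.array([0, 1, 2, 0, 0, 2])  # row 3 is wrongly predicted

# Fix the label order so that row/column k corresponds to uniques[k].
conf_mat = confusion_matrix(y_test, pred, labels=list(range(len(uniques))))

threshold = 1  # the question uses 10; lowered for the toy data
confused = []
for actual_idx, actual in enumerate(uniques):
    for predicted_idx, predicted in enumerate(uniques):
        count = int(conf_mat[actual_idx, predicted_idx])  # integer indexing
        if actual_idx != predicted_idx and count >= threshold:
            confused.append((actual, predicted, count))
            mask = (y_test == actual_idx) & (pred == predicted_idx)
            print(f"'{actual}' predicted as '{predicted}': {count} example(s)")
            print(df.loc[mask, ["topic", "body_wakati"]])
            print("")
```

With a real train/test split, the boolean mask would be applied to the array of test row labels first (indices_test in the question) before passing the result to df.loc.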

0 answers:

No answers yet