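How can I find the overall accuracy of the output I get from running a decision tree algorithm? I can get the top five class labels for the active user's input, but the only accuracy I have is the one I computed on the X_train and Y_train datasets with accuracy_score(). Suppose I get the five best recommendations: I want the accuracy of each class label and, based on those, the overall accuracy of the output. Please suggest some ideas.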
My Python script is below; `event` holds the different class labels:
from itertools import chain
from sklearn.tree import DecisionTreeClassifier

DTC = DecisionTreeClassifier()
DTC.fit(X_train_one_hot, y_train)
print("output from DTC:")
# Class probabilities for the test input (one row per sample)
res = DTC.predict_proba(X_test_one_hot)
new = list(chain.from_iterable(res))
# Indices of the five highest probabilities
index = sorted(range(len(new)), key=lambda i: new[i], reverse=True)[:5]
for i in index:
    print(event[i])
Here is the sample code I tried in order to get the accuracy of the predicted class labels. `index` holds the indices of the top five class-label probabilities and `event` holds the different class labels:
for i in index:
    DTC.fit(X_train_one_hot, y_train)
    y_pred = event[i]
    AC = accuracy_score((event, y_pred) * 100)
    print(AC)
Answer
Since you have a multiclass classification problem, you can compute the classifier's accuracy with scikit-learn's confusion_matrix function.
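To get the overall accuracy, add up the values on the diagonal (the correctly classified samples) and divide that sum by the total number of samples.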
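To make the arithmetic concrete, here is a small sketch on a made-up 3-class confusion matrix (the numbers are hypothetical, not from the question). It follows scikit-learn's convention that rows are true classes and columns are predicted classes. Because the question also asks for a per-class figure, it computes one common per-class measure as well: the diagonal divided by the row sums, i.e. the fraction of each true class that was predicted correctly (per-class recall).

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
conf_mat = np.array([
    [5, 1, 0],
    [0, 4, 2],
    [1, 0, 7],
])

correct = np.trace(conf_mat)   # 5 + 4 + 7 = 16 correctly classified samples
total = conf_mat.sum()         # 20 samples in total
overall_acc = correct / total  # 16 / 20 = 0.8

# Per-class accuracy (recall): correct predictions of a class / true samples of that class
per_class_acc = conf_mat.diagonal() / conf_mat.sum(axis=1)  # [5/6, 4/6, 7/8]

print("Overall accuracy:", overall_acc)
print("Per-class accuracy:", per_class_acc)
```

Note that averaging the per-class values does not in general give the overall accuracy; the overall figure weights each class by how many samples it has.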
Consider the following simple multiclass classification example using the Iris dataset:
import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names
# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01)
y_pred = classifier.fit(X_train, y_train).predict(X_test)
Now compute the overall accuracy from the confusion matrix:
# confusion_matrix expects (y_true, y_pred): rows are true labels, columns are predicted labels
conf_mat = confusion_matrix(y_test, y_pred)
acc = np.sum(conf_mat.diagonal()) / np.sum(conf_mat)
print('Overall accuracy: {} %'.format(acc*100))
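The question is really about scoring the top five recommendations, and one common measure for that is top-k accuracy: a sample counts as correct if its true label is among the k most probable classes. The sketch below is an illustration under assumptions, not the asker's data: it uses synthetic 8-class data from `make_classification` in place of the one-hot encoded features, a `DecisionTreeClassifier` with `max_depth=5` (chosen so the probabilities are not all 0/1), and a hypothetical helper `top_k_accuracy`. It ranks classes per test row and maps columns to labels with `clf.classes_`. Flattening all rows with `chain.from_iterable`, as in the question, only gives a meaningful ranking for a single test sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def top_k_accuracy(clf, X, y_true, k=5):
    """Hypothetical helper: fraction of samples whose true label is in the top k classes."""
    proba = clf.predict_proba(X)  # shape (n_samples, n_classes)
    # Sort each row by descending probability; a stable sort keeps lower
    # class indices first on ties, which matches np.argmax / clf.predict
    top_idx = np.argsort(-proba, axis=1, kind="stable")[:, :k]
    # Map column indices back to the actual class labels
    top_labels = clf.classes_[top_idx]
    hits = (top_labels == np.asarray(y_true)[:, None]).any(axis=1)
    return hits.mean()


# Synthetic stand-in data (assumption): 8 classes so that top-5 is meaningful
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

top1 = top_k_accuracy(clf, X_test, y_test, k=1)  # same as ordinary accuracy
top5 = top_k_accuracy(clf, X_test, y_test, k=5)
print("Top-1 accuracy:", top1)
print("Top-5 accuracy:", top5)
```

With k=1 this reduces to the ordinary accuracy from the confusion matrix. Recent scikit-learn versions (0.24 and later) also provide `sklearn.metrics.top_k_accuracy_score`, though its tie-breaking may differ from this sketch when several classes share the same probability.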