Question

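Is it possible to estimate the confusion matrix for each split in LOOCV? I am very new to sklearn. I have been reading the documentation for LeaveOneOut in sklearn.model_selection, and I understand what LOOCV means and how the data is split. But I would like to know whether there is a way to get the confusion matrix for each split that LOOCV performs. What I am trying is based on the KFold example in the documentation, but the output looks very strange to me. This is what I tested: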

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict, cross_val_score

iris = load_iris()
X = iris.data
y = iris.target
######################### LOOCV ##############################
clf = LinearDiscriminantAnalysis()
loo = LeaveOneOut()
print(loo.get_n_splits(X))
# 150 scores (all either 1 or 0, why?)
print(cross_val_score(clf, X, y, cv=loo, n_jobs=-1))
y_pred = cross_val_predict(clf, X, y, cv=loo)
# I am not sure about this line. As far as I understand, it should be the confusion matrix from LOOCV, but without splitting the data?
print('Confusion matrix after LOOCV without splitting the data: \n{}'.format(confusion_matrix(y, y_pred)))
for train, test in loo.split(X):
    y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
    y_pred_class = clf.predict(X[test])
    #confusion matrix for each split carried out by LOOCV
    conf_mat = confusion_matrix(y[test], y_pred_class)
    print('Confusion matrix: \n{}'.format(conf_mat))

Doing this, I get about 150 confusion matrices like [[1]]. Shouldn't each one be a 3x3 matrix? Any help or suggestion is welcome. Thanks in advance!

Answer 1

You can do this by passing the classes in your data to confusion_matrix through its labels parameter.

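Since you are using LeaveOneOut, only one sample is used for testing in each split, so there is only one prediction per split. That is also why each score is either 1 or 0. A confusion matrix makes little sense in this case, but if you still want one, you can use the following code: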

for train, test in loo.split(X):
    y_pred_prob = clf.fit(X[train], y[train]).predict_proba(X[test])
    y_pred_class = clf.predict(X[test])
    #confusion matrix for each split carried out by LOOCV
    conf_mat = confusion_matrix(y[test], y_pred_class, labels=[0,1,2])
    print('Confusion matrix: \n{}'.format(conf_mat))

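labels=[0,1,2] lists the classes present in the data, so every matrix is 3x3 even when the test split contains only one of them.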
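As a rough, self-contained sketch of the idea above (not the only way to do it): each per-split matrix has a single nonzero entry, so they are mostly useful when added together. The names labels, total, split_mat and pooled are just illustrative, and the check at the end assumes the classifier fits deterministically, which should hold for LinearDiscriminantAnalysis with its default settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = load_iris(return_X_y=True)
labels = np.unique(y)  # the classes present in the data (0, 1, 2 for iris)
clf = LinearDiscriminantAnalysis()
loo = LeaveOneOut()

# Sum the per-split 3x3 matrices; each split contributes exactly one count.
total = np.zeros((len(labels), len(labels)), dtype=np.int64)
for train, test in loo.split(X):
    y_pred = clf.fit(X[train], y[train]).predict(X[test])
    split_mat = confusion_matrix(y[test], y_pred, labels=labels)
    total += split_mat

# The same matrix built from all held-out predictions at once.
pooled = confusion_matrix(y, cross_val_predict(clf, X, y, cv=loo), labels=labels)

print(total)
print(np.array_equal(total, pooled))
```

If the two matrices agree, then the confusion_matrix(y, y_pred) line in the question is already the overall LOOCV confusion matrix, and the loop is only needed if you want to inspect individual splits.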

Estimating the confusion matrix for each split of LeaveOneOut in sklearn
