Predicted values for each fold in K-fold cross-validation in sklearn

Time: 2018-08-07 04:15:09

Tags: python scikit-learn regression cross-validation

I have performed 10-fold cross-validation on a dataset using Python's sklearn:

from sklearn.model_selection import cross_val_score

result = cross_val_score(best_svr, X, y, cv=10, scoring='r2')
print(result.mean())

This gives me the mean of the r2 scores as the final result. Is there a way to print the predicted values for each fold (10 sets of values in this case)?

3 answers:

Answer 0: (score: 1)

I believe you are looking for the cross_val_predict function.
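
A minimal sketch of how this might look for the regression case in the question. The synthetic data from make_regression and the SVR settings are assumptions for illustration, standing in for the asker's X, y and best_svr:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

# Assumed stand-ins for the asker's data and tuned model
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
best_svr = SVR(kernel="linear", C=1.0)

# One out-of-fold prediction per sample, returned in the original row order
y_pred = cross_val_predict(best_svr, X, y, cv=10)
print(y_pred.shape)
```

Each value in y_pred comes from the model trained on the nine folds that did not contain that sample, so the array covers all ten folds at once.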

Answer 1: (score: 1)

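A late answer, just adding to @jh314: cross_val_predict does return all of the predictions, but it does not tell us which fold each prediction belongs to. To recover that, we need to pass the folds themselves as cv rather than an integer: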

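import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, StratifiedKFold

# Binary target: is the flower a versicolor?
iris = sns.load_dataset('iris')
X = iris.iloc[:, :4]
y = (iris['species'] == "versicolor").astype('int')

rfc = RandomForestClassifier()
# Fixed random_state so that every call to skf.split yields the same folds
skf = StratifiedKFold(n_splits=10, random_state=111, shuffle=True)

# Out-of-fold predictions for all samples, in the original row order
pred = cross_val_predict(rfc, X, y, cv=skf)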

Now we iterate over the StratifiedKFold object and extract the predictions corresponding to each fold:

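# skf.split yields (train_idx, test_idx) pairs; keep the predictions for each test fold
fold_pred = [pred[test_idx] for _, test_idx in skf.split(X, y)]
fold_pred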

[array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0]),
 array([0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]),
 array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1]),
 array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]),
 array([0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]),
 array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]),
 array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]),
 array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]),
 array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]),
 array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0])]
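
The list comprehension above calls skf.split a second time, which reproduces the same folds only because random_state is fixed. An alternative sketch that materialises the splits once and records a fold label for every sample might look like this. The synthetic data, the SVR settings, and the names fold_id and per_fold are illustrative assumptions, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVR

# Assumed stand-ins for real data and a tuned model
X, y = make_regression(n_samples=50, n_features=4, noise=5.0, random_state=0)
model = SVR(kernel="linear")

# Materialise the splits once so that the same folds are used everywhere
kf = KFold(n_splits=10, shuffle=True, random_state=111)
splits = list(kf.split(X, y))

# cv also accepts an iterable of (train_idx, test_idx) pairs
pred = cross_val_predict(model, X, y, cv=splits)

# fold_id[i] is the index of the fold in which sample i was held out
fold_id = np.empty(len(y), dtype=int)
for k, (_, test_idx) in enumerate(splits):
    fold_id[test_idx] = k

# Group the out-of-fold predictions by fold
per_fold = {k: pred[fold_id == k] for k in range(len(splits))}
for k, values in per_fold.items():
    print(k, values)
```

Keeping fold_id alongside pred also makes it easy to attach both as columns of a DataFrame if the predictions need to be inspected next to the original rows.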

Answer 2: (score: 0)

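To print the predictions for each choice of k (note that this loop varies the number of folds from 2 to 9; it does not separate the predictions by fold):

from sklearn.model_selection import cross_val_score, cross_val_predict

for k in range(2, 10):
    # Mean r2 across the k folds
    result = cross_val_score(best_svr, X, y, cv=k, scoring='r2')
    print(k, result.mean())
    # Out-of-fold predictions for every sample, in the original row order
    y_pred = cross_val_predict(best_svr, X, y, cv=k)
    print(y_pred)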