Improving Python SKLearn CrossValidation output

Time: 2016-05-19 21:49:27

Tags: python scikit-learn

I am using the code below to get the output of my estimator.

Is there a faster way to run cross-validation with several scores?

for clfx, label in zip([clf0], ['Random Forest']):
    scores = cross_validation.cross_val_score(clfx, X, y, cv=5, scoring='accuracy')
    print("Accuracy : %0.3f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))
    scores = cross_validation.cross_val_score(clfx, X, y, cv=5, scoring='precision')
    print("Precision: %0.3f (+/- %0.2f) [%s] " % (scores.mean(), scores.std(), label))
    scores = cross_validation.cross_val_score(clfx, X, y, cv=5, scoring='recall')
    print("Recall   : %0.3f (+/- %0.2f) [%s] \n" % (scores.mean(), scores.std(), label))

Output:

Accuracy : 0.82 (+/- 0.00) [Random Forest]
Precision: 0.50 (+/- 0.02) [Random Forest] 
Recall   : 0.13 (+/- 0.01) [Random Forest] 

Is this overkill? Should I use a confusion matrix from a single run instead?

1 answer:

Answer 0: (score: 1)

Unfortunately, if you want to combine metrics, I think you have to run the cross-validation iterations "manually":

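import numpy as np
from sklearn.metrics import precision_score, accuracy_score, recall_score
# sklearn.cross_validation was removed in scikit-learn 0.20; KFold now lives in model_selection
from sklearn.model_selection import KFold

# clfx, X, y and label are the names from the question; X and y are assumed to be NumPy arrays.
# cross_val_score with cv=5 uses StratifiedKFold for classifiers; swap it in for the same kind of splits.
all_scores = {'precision': [], 'recall': [], 'accuracy': []}
for train, test in KFold(n_splits=5).split(X):
    clfx.fit(X[train], y[train])
    y_pred = clfx.predict(X[test])
    # Metrics take (y_true, y_pred), in that order; append one value per fold
    all_scores['precision'].append(precision_score(y[test], y_pred))
    all_scores['accuracy'].append(accuracy_score(y[test], y_pred))
    all_scores['recall'].append(recall_score(y[test], y_pred))

scores = all_scores['accuracy']
print("Accuracy : %0.3f (+/- %0.2f) [%s]" % (np.mean(scores), np.std(scores), label))
scores = all_scores['precision']
print("Precision: %0.3f (+/- %0.2f) [%s] " % (np.mean(scores), np.std(scores), label))
scores = all_scores['recall']
print("Recall   : %0.3f (+/- %0.2f) [%s] \n" % (np.mean(scores), np.std(scores), label))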
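
On the confusion-matrix idea from the question: since the manual loop already has each fold's predictions, all three numbers can be derived from one confusion matrix per fold. The following is only a sketch under assumptions: binary 0/1 labels, synthetic data from make_classification standing in for X and y, and a hypothetical helper named metrics_from_confusion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold


def metrics_from_confusion(y_true, y_pred):
    """Accuracy, precision and recall for binary 0/1 labels from one confusion matrix."""
    # With labels=[0, 1], ravel() yields the cells in the order tn, fp, fn, tp
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against folds with no predicted or no actual positives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {'accuracy': accuracy, 'precision': precision, 'recall': recall}


# Synthetic stand-in data and model (assumptions, not the asker's X, y or clf0)
X_demo, y_demo = make_classification(n_samples=200, random_state=0)
clf_demo = RandomForestClassifier(n_estimators=20, random_state=0)

fold_metrics = []
for train, test in StratifiedKFold(n_splits=5).split(X_demo, y_demo):
    clf_demo.fit(X_demo[train], y_demo[train])
    y_pred = clf_demo.predict(X_demo[test])
    fold_metrics.append(metrics_from_confusion(y_demo[test], y_pred))

for name in ('accuracy', 'precision', 'recall'):
    values = [m[name] for m in fold_metrics]
    print("%-9s: %0.3f (+/- %0.2f)" % (name.capitalize(), np.mean(values), np.std(values)))
```

Each fold is still fitted once, so this does not reduce the number of fits compared with the loop above; it only replaces three metric calls with one matrix.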

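If you like, you can also parallelize the manual loop with multiprocessing (built-in parallelism is one of the main advantages of using scikit-learn's cross-validation functions):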

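from multiprocessing import Pool
from sklearn.metrics import precision_score, accuracy_score, recall_score
from sklearn.model_selection import KFold

def score(cv_split, clfx=clfx, X=X, y=y):
    # Fit on one (train, test) index pair and return that fold's metrics
    train, test = cv_split
    clfx.fit(X[train], y[train])
    y_pred = clfx.predict(X[test])
    # Metrics take (y_true, y_pred), in that order
    return {
        'precision': precision_score(y[test], y_pred),
        'accuracy': accuracy_score(y[test], y_pred),
        'recall': recall_score(y[test], y_pred),
    }

if __name__ == '__main__':  # guard needed on platforms that spawn worker processes
    with Pool(6) as p:
        scores_by_run = p.map(score, list(KFold(n_splits=5).split(X)))
    # Turn the list of per-fold dicts into one dict of per-metric lists
    all_scores = {k: [d[k] for d in scores_by_run] for k in scores_by_run[0]}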
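
As a side note, newer scikit-learn releases (0.19 and later, to my knowledge) include cross_validate, which accepts several scorers at once and has an n_jobs parameter, so the manual loop and the Pool may not be needed there. A minimal sketch, again on synthetic stand-in data rather than the asker's X and y:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data and model (assumptions, not the asker's X, y or clf0)
X_demo, y_demo = make_classification(n_samples=200, random_state=0)
clf_demo = RandomForestClassifier(n_estimators=20, random_state=0)

metric_names = ['accuracy', 'precision', 'recall']

# One call fits each fold once and evaluates every scorer on it;
# n_jobs=2 runs folds in two worker processes (use -1 for all cores)
results = cross_validate(clf_demo, X_demo, y_demo, cv=5,
                         scoring=metric_names, n_jobs=2)

for name in metric_names:
    scores = results['test_' + name]  # array with one value per fold
    print("%-9s: %0.3f (+/- %0.2f)" % (name.capitalize(), scores.mean(), scores.std()))
```

The returned dict also carries fit_time and score_time arrays; the per-metric arrays are stored under test_<name>.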