I use the code below to get my estimator's scores.
Is there a faster way to cross-validate with several scoring metrics?
for clfx, label in zip([clf0], ['Random Forest']):
    scores = cross_validation.cross_val_score(clfx, X, y, cv=5, scoring='accuracy')
    print "Accuracy : %0.3f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label)
    scores = cross_validation.cross_val_score(clfx, X, y, cv=5, scoring='precision')
    print "Precision: %0.3f (+/- %0.2f) [%s] " % (scores.mean(), scores.std(), label)
    scores = cross_validation.cross_val_score(clfx, X, y, cv=5, scoring='recall')
    print "Recall : %0.3f (+/- %0.2f) [%s] \n" % (scores.mean(), scores.std(), label)
Output:
Accuracy : 0.82 (+/- 0.00) [Random Forest]
Precision: 0.50 (+/- 0.02) [Random Forest]
Recall : 0.13 (+/- 0.01) [Random Forest]
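Is this overkill? Should I just use a confusion matrix from a single run instead?
Answer 0 (score: 1)
Unfortunately, if you want to combine metrics, I think you have to run the cross-validation iterations "manually":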
import numpy as np
from sklearn.metrics import precision_score, accuracy_score, recall_score
from sklearn.model_selection import KFold

all_scores = {'precision': [], 'recall': [], 'accuracy': []}
# 5 folds, matching cv=5 in the question.
for train, test in KFold(n_splits=5).split(X):
    clfx.fit(X[train, :], y[train])
    y_pred = clfx.predict(X[test])
    # Metrics take (y_true, y_pred); append one score per fold.
    all_scores['precision'].append(precision_score(y[test], y_pred))
    all_scores['accuracy'].append(accuracy_score(y[test], y_pred))
    all_scores['recall'].append(recall_score(y[test], y_pred))

scores = all_scores['accuracy']
print("Accuracy : %0.3f (+/- %0.2f) [%s]" % (np.mean(scores), np.std(scores), label))
scores = all_scores['precision']
print("Precision: %0.3f (+/- %0.2f) [%s] " % (np.mean(scores), np.std(scores), label))
scores = all_scores['recall']
print("Recall : %0.3f (+/- %0.2f) [%s] \n" % (np.mean(scores), np.std(scores), label))
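On the question's confusion-matrix idea: one possible approach is to collect out-of-fold predictions and derive all three metrics from a single matrix. The sketch below assumes binary labels and uses synthetic data from `make_classification` as a stand-in, since the real `X` and `y` are not shown. The names `clf` and `y_pred` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in data (assumption: the post does not show the real X, y).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0)

# One out-of-fold prediction per sample, gathered over 5 folds.
y_pred = cross_val_predict(clf, X, y, cv=5)

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()

accuracy = (tp + tn) / float(tp + tn + fp + fn)
precision = tp / float(tp + fp) if (tp + fp) else 0.0
recall = tp / float(tp + fn) if (tp + fn) else 0.0

print("Accuracy : %0.3f" % accuracy)
print("Precision: %0.3f" % precision)
print("Recall   : %0.3f" % recall)
```

Metrics computed this way come from the pooled predictions rather than from per-fold averages, so they can differ slightly from the mean of per-fold scores and do not give a standard deviation.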
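If you like, you can also parallelize this with multiprocessing
(parallelization is one of the main advantages of using scikit-learn's cross-validation functions):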
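from multiprocessing import Pool
from sklearn.metrics import precision_score, accuracy_score, recall_score
from sklearn.model_selection import KFold

def score(cv_split, clfx=clfx, X=X, y=y):
    # Fit on one train/test split and return that fold's metrics.
    train, test = cv_split
    clfx.fit(X[train, :], y[train])
    y_pred = clfx.predict(X[test])
    all_scores = {}
    # scikit-learn metrics take (y_true, y_pred), in that order.
    all_scores['precision'] = precision_score(y[test], y_pred)
    all_scores['accuracy'] = accuracy_score(y[test], y_pred)
    all_scores['recall'] = recall_score(y[test], y_pred)
    return all_scores

p = Pool(6)  # six worker processes
# 5 folds, matching cv=5 in the question.
scores_by_run = p.map(score, KFold(n_splits=5).split(X))
p.close()
p.join()
# Turn the list of per-fold dicts into one dict of per-metric lists.
all_scores = {k: [d[k] for d in scores_by_run] for k in scores_by_run[0].keys()}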
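In newer scikit-learn releases (0.19 and later), `sklearn.model_selection.cross_validate` accepts a list of scorers and an `n_jobs` argument, so it may replace both the manual loop and the `Pool`. This is a minimal sketch, not part of the original answer. It uses synthetic data from `make_classification` as a stand-in for the real `X` and `y`, and the names `metrics` and `results` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption: the post does not show the real X, y).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0)

metrics = ['accuracy', 'precision', 'recall']

# Each fold is fitted once and scored with every metric;
# n_jobs=2 runs the folds in two worker processes.
results = cross_validate(clf, X, y, cv=5, scoring=metrics, n_jobs=2)

for name in metrics:
    scores = results['test_' + name]  # array with one score per fold
    print("%-9s: %0.3f (+/- %0.2f) [%s]"
          % (name.capitalize(), scores.mean(), scores.std(), 'Random Forest'))
```

Unlike three separate `cross_val_score` calls, this fits each fold only once, which is where most of the time goes for a random forest.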