Question

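I am trying to cross-validate my scores with scikit-learn, and I have run into a strange problem: creating a Stratified Shuffle Split loop "manually" gives much better scores than using the built-in cross_val_score.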

import pandas as pd
import numpy as np
import cPickle

import helper_functions

from sklearn.ensemble import RandomForestRegressor
from sklearn.cross_validation import StratifiedShuffleSplit

from sklearn.cross_validation import cross_val_score
from sklearn.metrics import make_scorer

rf_clf = RandomForestRegressor(n_estimators=5)

with open("../../stashed_dims.pkl", 'rb') as fout:
    [TRAIN_X, TRAIN_Y, TEST_X, test_index] = cPickle.load(fout)


N_CV = 1
sss = StratifiedShuffleSplit(TRAIN_Y, N_CV, test_size=0.25, random_state=0)

for iterations, [local_train_index, local_test_index] in enumerate(sss):
    X_train, X_test = TRAIN_X[local_train_index], TRAIN_X[local_test_index]
    y_train, y_test = TRAIN_Y[local_train_index], TRAIN_Y[local_test_index]

    rf_clf.fit(X_train, y_train)
    pred = rf_clf.predict(X_test)

    print("Stratified Shuffle Split method 1")
    print(helper_functions.get_score(pred, y_test))

scorer = make_scorer(helper_functions.get_score)
scores = cross_val_score(rf_clf, TRAIN_X, TRAIN_Y, cv = sss, scoring = scorer, verbose = 10)
print("Stratified Shuffle Split method 2")
print(scores)

I don't understand what the difference between these two methods is. Any ideas?

StratifiedShuffleSplit documentation
cross_val_score documentation
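The `sklearn.cross_validation` module used above was deprecated in scikit-learn 0.18 and removed in 0.20 in favor of `sklearn.model_selection`. In the newer module, `StratifiedShuffleSplit` takes `n_splits` in the constructor and receives the labels through `split(X, y)`. Below is a minimal sketch of the same two-way comparison under that API. It assumes synthetic integer-valued targets in place of the pickled `TRAIN_X`/`TRAIN_Y` and uses the symmetric `mean_squared_error` in place of `helper_functions.get_score`. It also fixes `random_state` on the forest, which the original `rf_clf` does not do, so that both fits are identical. Under those assumptions the two routes should report the same score, which suggests the split itself is not the source of the gap:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

# Synthetic stand-in for the pickled TRAIN_X / TRAIN_Y (assumption):
# integer-valued targets so that stratification sees discrete classes.
rng = np.random.RandomState(0)
TRAIN_X = rng.rand(200, 5)
TRAIN_Y = rng.randint(1, 5, size=200)

# Fixed random_state so the forest fit is reproducible in both routes.
rf = RandomForestRegressor(n_estimators=5, random_state=0)
# An integer random_state makes every call to sss.split() yield the same split.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)

# Method 1: manual loop over the splitter.
manual_scores = []
for train_idx, test_idx in sss.split(TRAIN_X, TRAIN_Y):
    rf.fit(TRAIN_X[train_idx], TRAIN_Y[train_idx])
    pred = rf.predict(TRAIN_X[test_idx])
    manual_scores.append(mean_squared_error(TRAIN_Y[test_idx], pred))

# Method 2: cross_val_score with the same splitter object and metric.
# cross_val_score fits a fresh clone of rf on each training fold.
cv_scores = cross_val_score(rf, TRAIN_X, TRAIN_Y, cv=sss,
                            scoring=make_scorer(mean_squared_error))

print(manual_scores, list(cv_scores))
```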

Answer 1

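It's hard to say without the full code (which isn't given), but at least from this snippet it looks like you are not using the same scoring function in both places.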

Explicit:

print(helper_functions.get_score(pred, y_test))

Implicit:

scores = cross_val_score(... scoring = scorer ...)
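To make the implicit side concrete: a scorer returned by `make_scorer` is called as `scorer(estimator, X, y)`. It evaluates `score_func(y, estimator.predict(X))`, with the true targets first, and multiplies the result by -1 when `greater_is_better=False`. The following is a minimal sketch, assuming scikit-learn 0.18 or later. It uses a hypothetical `record_args` metric that logs which array arrives in which position:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer

calls = []


def record_args(first, second):
    """Hypothetical metric: logs its arguments and returns their mean gap."""
    calls.append((np.array(first), np.array(second)))
    return float(np.mean(first - second))


X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0
model = LinearRegression().fit(X, y)
y_eval = y + 100.0  # shift so true targets and predictions are distinguishable

# The implicit call: scorer(estimator, X, y) -> score_func(y, estimator.predict(X))
value = make_scorer(record_args)(model, X, y_eval)
first, second = calls[0]
print(np.allclose(first, y_eval), np.allclose(second, model.predict(X)))  # True True
print(value)  # mean(y_eval - pred), about 100.0

# With greater_is_better=False the scorer returns the NEGATED metric.
neg_value = make_scorer(record_args, greater_is_better=False)(model, X, y_eval)
print(neg_value)  # about -100.0
```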

Answer 2

Found my answer: the order of the arguments matters for my scoring function.

For this score function, foo(y_true, y_pred) != foo(y_pred, y_true).
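Below is a minimal sketch of the pitfall. The real metric is not shown, so `r2_score` stands in for `helper_functions.get_score`; it is used only because R² is not symmetric in its arguments. The sketch also assumes a hypothetical `get_score(pred, actual)` wrapper that takes predictions first, as the manual loop in the question does. Because `make_scorer` always passes `(y_true, y_pred)`, one way to make both call sites agree is to swap the arguments before handing the metric to `make_scorer`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, r2_score

# R^2 divides by the spread of its FIRST argument, so order matters:
a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
b = np.array([2.5, 1.5, 3.0, 2.0, 4.0])
print(r2_score(a, b), r2_score(b, a))  # ~0.727 vs ~0.054


def get_score(pred, actual):
    """Hypothetical stand-in for helper_functions.get_score that,
    like the manual call in the question, takes predictions first."""
    return r2_score(actual, pred)


# make_scorer always calls score_func(y_true, y_pred); swap the
# arguments back so the scorer matches the manual call.
scorer = make_scorer(lambda y_true, y_pred: get_score(y_pred, y_true))

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.randn(50)
model = LinearRegression().fit(X[:40], y[:40])
pred = model.predict(X[40:])

manual = get_score(pred, y[40:])                            # as in "method 1"
unwrapped = make_scorer(get_score)(model, X[40:], y[40:])   # arguments swapped
wrapped = scorer(model, X[40:], y[40:])                     # arguments fixed
print(manual, unwrapped, wrapped)
```

If the real metric instead expects `(y_true, y_pred)`, the fix goes the other way: keep `make_scorer(get_score)` and change the manual loop to call `get_score(y_test, pred)`.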

Why does cross_val_score differ so much from Stratified Shuffle Split?
