What is the difference between the default CV in GridSearchCV and using KFold?

Date: 2018-03-24 15:32:21

Tags: python scikit-learn cross-validation

I would like to know what the difference is between the default cross-validation used by sklearn's GridSearchCV and passing it an explicit KFold splitter, as in the code below:

Without KFold:

clf = GridSearchCV(estimator=model, param_grid=parameters, cv=10, scoring='f1_macro')
clf = clf.fit(xOri, yOri)

With KFold:

NUM_TRIALS = 5
for i in range(NUM_TRIALS):
    cv = KFold(n_splits=10, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=model, param_grid=parameters, cv=cv, scoring='f1_macro')
    clf = clf.fit(xOri, yOri)

As I understand from the manual, both split the data into 10 folds, 9 used for training and 1 for validation. But in the example with KFold, the whole procedure is repeated 5 times (NUM_TRIALS), shuffling the data before splitting it into 10 folds each time. Am I right?
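For completeness, here is a self-contained version of both setups, with a dummy dataset, a LogisticRegression estimator and a small grid standing in as placeholders for my real model and data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

# Dummy data, estimator and parameter grid, just for illustration.
xOri, yOri = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)
parameters = {'C': [0.1, 1.0, 10.0]}

# Default CV: an integer cv lets GridSearchCV build its own 10-fold splitter.
clf = GridSearchCV(estimator=model, param_grid=parameters, cv=10, scoring='f1_macro')
clf = clf.fit(xOri, yOri)

# Explicit KFold: the same search repeated 5 times, reshuffling before each split.
NUM_TRIALS = 5
for i in range(NUM_TRIALS):
    cv = KFold(n_splits=10, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=model, param_grid=parameters, cv=cv, scoring='f1_macro')
    clf = clf.fit(xOri, yOri)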

1 answer:

Answer 0 (score: 1)

Looks like you're right, ish.

When you pass an integer to cv, GridSearchCV uses either KFold or StratifiedKFold by default: KFold if your estimator is a regressor, StratifiedKFold if it is a classifier.
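You can see which splitter an integer cv resolves to with the check_cv helper; here is a quick sketch, assuming a binary classification target:

import numpy as np
from sklearn.model_selection import check_cv

y = np.array([0, 1] * 50)  # assumed binary target, just for illustration

# classifier=True mirrors what GridSearchCV does for classifiers,
# classifier=False mirrors what it does for regressors.
print(check_cv(10, y, classifier=True))   # StratifiedKFold(n_splits=10, ...)
print(check_cv(10, y, classifier=False))  # KFold(n_splits=10, ...)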

Since I don't know what your data is like, I can't be sure which one is being used in your situation.

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

But the code you have above will repeat the KFold validation 5 times with different random seeds.

Whether that will produce meaningfully different splits of the data? Not sure.
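One way to check for yourself is to look at the folds directly; a small sketch on dummy data, printing the first test fold for each seed:

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(-1, 1)  # 20 dummy samples, just to have indices to split

for i in range(5):
    cv = KFold(n_splits=10, shuffle=True, random_state=i)
    train_idx, test_idx = next(iter(cv.split(X)))
    # With shuffle=True, the composition of the first test fold depends on the seed.
    print(f"random_state={i}: first test fold = {test_idx}")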