I'd like to know the difference between the default cross-validation performed by sklearn's GridSearchCV and explicitly passing it a KFold splitter, as in the code below:
Without KFold:
clf = GridSearchCV(estimator=model, param_grid=parameters, cv=10, scoring='f1_macro')
clf = clf.fit(xOri, yOri)
With KFold:
NUM_TRIALS = 5
for i in range(NUM_TRIALS):
    cv = KFold(n_splits=10, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=model, param_grid=parameters, cv=cv, scoring='f1_macro')
    clf = clf.fit(xOri, yOri)
As I understand from the manual, both split the data into 10 parts, 9 used for training and 1 for validation. But in the KFold example, the whole process is repeated 5 times, shuffling the data before splitting it into 10 parts each time. Am I right?
Answer 0 (score: 1)
Looks like you're right, ish.
GridSearchCV uses either KFold or StratifiedKFold, depending on whether your estimator is a regressor (KFold) or a classifier (StratifiedKFold).
Since I don't know what your data and model are like, I can't be sure which is being used in this situation.
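One way to see which splitter an integer `cv` resolves to is sklearn's `check_cv` helper, which GridSearchCV relies on internally. A small sketch (the toy `y` labels here are made up):

```python
from sklearn.model_selection import check_cv, KFold, StratifiedKFold

# A toy binary classification target (hypothetical data).
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# classifier=True mimics passing a classifier to GridSearchCV:
cv_clf = check_cv(cv=5, y=y, classifier=True)
# classifier=False mimics passing a regressor:
cv_reg = check_cv(cv=5, y=None, classifier=False)

print(type(cv_clf).__name__)  # StratifiedKFold
print(type(cv_reg).__name__)  # KFold
```

Note that the splitters GridSearchCV builds from an integer `cv` do not shuffle (`shuffle=False` by default), unlike the `KFold(shuffle=True, random_state=i)` in the loop above.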
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
But the code you have above will repeat the KFold validation 5 times with different random seeds.
Whether that will produce meaningfully different splits of the data? Not sure.
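To check whether different seeds actually change the folds, you can compare the test-index assignments directly. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(40).reshape(20, 2)  # toy data, 20 samples

# Record the test folds produced under two different seeds.
folds = {}
for seed in (0, 1):
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)
    folds[seed] = [tuple(test) for _, test in kf.split(X)]

# With shuffle=True, different random_state values permute the rows
# differently, so the fold memberships (almost always) differ:
print(folds[0] != folds[1])
```

Each of the 5 trials therefore grid-searches over a different 10-fold partition of the data, which is why the loop can yield slightly different cross-validation results from trial to trial.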