Printing the selected parameters in nested cross-validation

Time: 2019-04-14 16:02:03

Tags: python scikit-learn

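Below is an example that uses scikit-learn to obtain cross-validated predictions from k-nearest neighbors, with k itself chosen by cross-validation. The code seems to work, but how can I print the k selected in each outer fold?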

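import numpy as np
# Import the submodules explicitly; a bare "import sklearn" does not load them.
import sklearn.model_selection
import sklearn.neighbors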

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis = 1) + np.random.randn(n) > 0, "blue", "red")

preds = sklearn.model_selection.cross_val_predict(
    X = X,
    y = y,
    estimator = sklearn.model_selection.GridSearchCV(
       estimator = sklearn.neighbors.KNeighborsClassifier(),
       param_grid = {'n_neighbors': range(1, 7)},
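       # shuffle = True is needed for random_state to take effect;
       # recent scikit-learn versions raise an error without it.
       cv = sklearn.model_selection.KFold(10, shuffle = True, random_state = 133),
       scoring = 'accuracy'),
    cv = sklearn.model_selection.KFold(10, shuffle = True, random_state = 144))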

1 answer:

Answer 0: (score: 1)

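You cannot get this information directly from that function, so you need to replace cross_val_predict with cross_validate and set the return_estimator flag to True. The returned dictionary then holds the fitted estimators, one per outer fold, under the key estimator. The parameters selected by each estimator are stored in its best_params_ attribute. So: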

import numpy as np
import sklearn
# sklearn 0.20.3 doesn't seem to import submodules in __init__
# So importing them directly is required.
import sklearn.model_selection
import sklearn.neighbors

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis = 1) + np.random.randn(n) > 0, "blue", "red")

scores = sklearn.model_selection.cross_validate(
    X = X,
    y = y,
    estimator = sklearn.model_selection.GridSearchCV(
       estimator = sklearn.neighbors.KNeighborsClassifier(),
       param_grid = {'n_neighbors': range(1, 7)},
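       # shuffle = True is needed for random_state to take effect;
       # recent scikit-learn versions raise an error without it.
       cv = sklearn.model_selection.KFold(10, shuffle = True, random_state = 133),
       scoring = 'accuracy'),
    cv = sklearn.model_selection.KFold(10, shuffle = True, random_state = 144),
    return_estimator = True)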

# Selected hyper-parameters for the estimator from the first fold
print(scores['estimator'][0].best_params_)
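
Since the question asks for the k chosen in every outer fold, not only the first, the same idea can be extended with a loop. The following is a minimal sketch rather than a definitive implementation: the fixed seed (RandomState(0)), the shuffle=True setting, and the variable names inner and selected_k are assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data as in the question; the fixed seed is an assumption
# added here so that repeated runs give the same result.
rng = np.random.RandomState(0)
n = 100
X = rng.randn(n, 2)
y = np.where(X.sum(axis=1) + rng.randn(n) > 0, "blue", "red")

# Inner loop: grid search over k
inner = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 7))},
    cv=KFold(10, shuffle=True, random_state=133),
    scoring="accuracy",
)

# Outer loop: keep the fitted GridSearchCV of every outer fold
scores = cross_validate(
    inner, X, y,
    cv=KFold(10, shuffle=True, random_state=144),
    return_estimator=True,
)

# scores["estimator"] holds one fitted GridSearchCV per outer fold, in fold order
selected_k = [est.best_params_["n_neighbors"] for est in scores["estimator"]]
for fold, (k, acc) in enumerate(zip(selected_k, scores["test_score"])):
    print(f"outer fold {fold}: k={k}, test accuracy={acc:.2f}")
```

Each entry of scores["test_score"] is the accuracy on the matching outer test fold, so the chosen k and the score it achieved can be read side by side.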

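Unfortunately, you cannot get both the actual predictions and the selected hyperparameters from the same function. If you need both, you will have to perform the nested cross-validation manually: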

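# shuffle = True is needed for random_state to take effect;
# recent scikit-learn versions raise an error without it.
cv = sklearn.model_selection.KFold(10, shuffle = True, random_state = 144)
estimator = sklearn.model_selection.GridSearchCV(
       estimator = sklearn.neighbors.KNeighborsClassifier(),
       param_grid = {'n_neighbors': range(1, 7)},
       cv = sklearn.model_selection.KFold(10, shuffle = True, random_state = 133),
       scoring = 'accuracy')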
for train, test in cv.split(X,y):
    X_train, y_train = X[train], y[train]
    X_test, y_test = X[test], y[test]
    m = estimator.fit(X_train, y_train)
    print(m.best_params_)
    y_pred = m.predict(X_test)
    print(y_pred)
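
The manual loop above prints the predictions fold by fold. To get a single prediction array aligned with y, as cross_val_predict returns, the per-fold predictions can be written back at the test indices. This is a sketch under assumptions: the fixed seed, shuffle=True, the use of clone to get a fresh search object per fold, and the names outer_cv, search, preds, and selected_k are all introduced here for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data as in the question; the fixed seed is an assumption.
rng = np.random.RandomState(0)
n = 100
X = rng.randn(n, 2)
y = np.where(X.sum(axis=1) + rng.randn(n) > 0, "blue", "red")

outer_cv = KFold(10, shuffle=True, random_state=144)
search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 7))},
    cv=KFold(10, shuffle=True, random_state=133),
    scoring="accuracy",
)

preds = np.empty(n, dtype=y.dtype)  # out-of-fold predictions, aligned with y
selected_k = []                     # n_neighbors chosen in each outer fold

for train, test in outer_cv.split(X, y):
    # Fit a fresh copy of the grid search on the outer training part only
    m = clone(search).fit(X[train], y[train])
    selected_k.append(m.best_params_["n_neighbors"])
    # Store predictions at the original positions of the test samples
    preds[test] = m.predict(X[test])

print(selected_k)
print("out-of-fold accuracy:", np.mean(preds == y))
```

Because every sample lands in exactly one outer test fold, preds is fully populated after the loop and can be compared with y directly.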