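Below is an example that uses scikit-learn to obtain cross-validated predictions from k-nearest neighbors, with k itself chosen by an inner cross-validation. The code seems to work, but how can I print the k that was selected in each outer fold?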
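import numpy as np
# Import the submodules explicitly; `import sklearn` alone does not necessarily load them.
import sklearn.model_selection
import sklearn.neighbors

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis=1) + np.random.randn(n) > 0, "blue", "red")

# random_state only has an effect when shuffle=True; recent scikit-learn
# versions raise an error if random_state is set while shuffle=False.
preds = sklearn.model_selection.cross_val_predict(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=144))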
Answer:
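You can't get this information directly from cross_val_predict, so replace it with cross_validate and set the return_estimator flag to True. The returned dictionary then holds the fitted estimators, one per outer fold, under the key 'estimator'. Each of them is a fitted GridSearchCV, and its selected parameters are stored in the attribute best_params_. So: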
import numpy as np
import sklearn
# sklearn 0.20.3 doesn't seem to import submodules in __init__,
# so importing them directly is required.
import sklearn.model_selection
import sklearn.neighbors

n = 100
X = np.random.randn(n, 2)
y = np.where(np.sum(X, axis=1) + np.random.randn(n) > 0, "blue", "red")

scores = sklearn.model_selection.cross_validate(
    X=X,
    y=y,
    estimator=sklearn.model_selection.GridSearchCV(
        estimator=sklearn.neighbors.KNeighborsClassifier(),
        param_grid={'n_neighbors': range(1, 7)},
        cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
        scoring='accuracy'),
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=144),
    return_estimator=True)

# Selected hyper-parameters for the estimator of each outer fold
for fold, est in enumerate(scores['estimator']):
    print(fold, est.best_params_)
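cross_validate does not return predictions itself, but the fitted per-fold estimators can be used to rebuild them. The sketch below is an illustration, not part of the original answer: it replays the same outer splits, which is valid because a KFold with shuffle=True and an integer random_state produces the same partition on every call to split. The fixed RandomState(0) data and the names inner_cv, outer_cv, search, chosen_k and oof_pred are assumptions made for the example.

```python
import numpy as np
import sklearn.model_selection
import sklearn.neighbors

rng = np.random.RandomState(0)  # fixed seed so the example is reproducible
n = 100
X = rng.randn(n, 2)
y = np.where(np.sum(X, axis=1) + rng.randn(n) > 0, "blue", "red")

inner_cv = sklearn.model_selection.KFold(10, shuffle=True, random_state=133)
outer_cv = sklearn.model_selection.KFold(10, shuffle=True, random_state=144)

search = sklearn.model_selection.GridSearchCV(
    estimator=sklearn.neighbors.KNeighborsClassifier(),
    param_grid={'n_neighbors': list(range(1, 7))},
    cv=inner_cv,
    scoring='accuracy')

scores = sklearn.model_selection.cross_validate(
    search, X, y, cv=outer_cv, return_estimator=True)

# One fitted GridSearchCV per outer fold; best_params_ holds the chosen k.
chosen_k = [est.best_params_['n_neighbors'] for est in scores['estimator']]

# Replay the identical outer splits and put each fold's predictions back
# at the positions of its test samples, so oof_pred lines up with y.
oof_pred = np.empty(n, dtype=y.dtype)
for est, (_, test) in zip(scores['estimator'], outer_cv.split(X, y)):
    oof_pred[test] = est.predict(X[test])

print(chosen_k)
```

This relies on the outer splitter being deterministic; with shuffle=True and random_state=None the replayed splits would differ from the ones cross_validate used.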
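Unfortunately, you cannot get both the actual predictions and the selected hyperparameters from the same function. If you need both, you have to run the nested cross-validation manually: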
cv = sklearn.model_selection.KFold(10, shuffle=True, random_state=144)
estimator = sklearn.model_selection.GridSearchCV(
    estimator=sklearn.neighbors.KNeighborsClassifier(),
    param_grid={'n_neighbors': range(1, 7)},
    cv=sklearn.model_selection.KFold(10, shuffle=True, random_state=133),
    scoring='accuracy')

for train, test in cv.split(X, y):
    X_train, y_train = X[train], y[train]
    X_test, y_test = X[test], y[test]
    # The inner CV selects k on this outer training fold, then refits on it
    m = estimator.fit(X_train, y_train)
    print(m.best_params_)   # k selected in this outer fold
    y_pred = m.predict(X_test)
    print(y_pred)           # predictions for this fold's test samples
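A variant of the manual loop, assuming you want the results collected rather than printed: each fold's predictions are written into a preallocated array at the positions given by test, so the array lines up with y, and sklearn.base.clone gives every outer fold its own unfitted copy of the search. The data seed and the names outer_cv, search, chosen_k and inner_acc are illustrative. Since the splits are the same, the collected predictions should agree with cross_val_predict, which the last line checks.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)  # fixed seed for reproducibility
n = 100
X = rng.randn(n, 2)
y = np.where(np.sum(X, axis=1) + rng.randn(n) > 0, "blue", "red")

outer_cv = KFold(10, shuffle=True, random_state=144)
search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors': list(range(1, 7))},
    cv=KFold(10, shuffle=True, random_state=133),
    scoring='accuracy')

preds = np.empty(n, dtype=y.dtype)  # out-of-fold predictions, original order
chosen_k = []                       # selected k for each outer fold
inner_acc = []                      # mean inner-CV accuracy of that k
for train, test in outer_cv.split(X, y):
    # clone() returns a fresh, unfitted copy, so a model kept from an
    # earlier fold is not overwritten by the next fit.
    m = clone(search).fit(X[train], y[train])
    chosen_k.append(m.best_params_['n_neighbors'])
    inner_acc.append(m.best_score_)
    preds[test] = m.predict(X[test])

print(chosen_k)
# Sanity check against cross_val_predict on the same outer splits
print(np.array_equal(preds, cross_val_predict(search, X, y, cv=outer_cv)))
```

Note that best_score_ is the mean inner-CV accuracy of the selected k on the outer training fold, not an estimate of performance on the outer test fold.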