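I am applying cross-validation (CV) to several prediction tasks and want to use the same folds for every parameter set, and if possible across different Python scripts as well, because performance really depends on the folds. I am using sklearn's KFold: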
kf = KFold(n_splits=folds, shuffle=False, random_state=1986)
I build my folds with
for idx_split, (train_index, test_index) in enumerate(kf.split(X, Y)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
and loop over them like this:
for idx_alpha, alpha in enumerate([0, 0.2, 0.4, 0.6, 0.8, 1]):
    # [...]
    for idx_split, (train_index, test_index) in enumerate(kf.split(X, Y)):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = Y[train_index], Y[test_index]
Although I chose a random_state and set a numpy seed, the folds are not always the same. What can I do to achieve this, and possibly share my folds across several Python scripts?
Answer 0 (score: 2)
It looks like you are reinventing GridSearchCV ;-)
Try this approach:
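from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# fix random_state so the train/test split is identical on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1986)
# Lasso is passed directly (no Pipeline), so the parameter key is plain "alpha"
param_grid = dict(alpha=[0, 0.2, 0.4, 0.6, 0.8, 1])
model = Lasso()  # put the algorithm you want to use here
folds = 3
# alternatively you can prepare the folds yourself;
# random_state only takes effect when shuffle=True
# folds = KFold(n_splits=folds, shuffle=True, random_state=1986)
grid_search = GridSearchCV(model, param_grid=param_grid, cv=folds, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)
y_pred = grid_search.best_estimator_.predict(X_test)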
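
To address the part of the question about sharing folds between scripts, here is a minimal sketch rather than a definitive recipe. It relies on two things: KFold only uses random_state when shuffle=True, and GridSearchCV accepts a list of (train, test) index pairs as its cv argument. The toy data, the file name folds.npz and its location in the temp directory are assumptions made purely for illustration.

```python
import os
import tempfile

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

# Toy regression data (illustrative only; substitute your own X and y).
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=60)

# shuffle=True is what makes random_state matter; the seed fixes the shuffle.
kf = KFold(n_splits=3, shuffle=True, random_state=1986)

# Materialise the folds once as a list of (train_idx, test_idx) pairs.
splits = list(kf.split(X))

# Persist only the test indices; the train indices are their complement.
# Hypothetical file location, chosen just for this example.
folds_path = os.path.join(tempfile.gettempdir(), "folds.npz")
np.savez(folds_path, *[test_idx for _, test_idx in splits])

# --- in another script: reload the test indices and rebuild the pairs ---
with np.load(folds_path) as data:
    test_sets = [data[f"arr_{i}"] for i in range(len(data.files))]
all_idx = np.arange(len(X))
loaded_splits = [(np.setdiff1d(all_idx, test_idx), test_idx) for test_idx in test_sets]

# Pass the explicit splits as cv so every alpha is scored on the same folds.
param_grid = {"alpha": [0.2, 0.4, 0.6, 0.8, 1.0]}
grid_search = GridSearchCV(Lasso(), param_grid=param_grid, cv=loaded_splits)
grid_search.fit(X, y)
```

The saved indices are only valid for a dataset with the same number of rows in the same order, so any script that loads them has to load the data identically. If sharing a file is not needed, constructing KFold with shuffle=True and the same random_state in each script should also yield the same folds for data of the same length.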