I'm using xgboost and sklearn for a Kaggle competition (https://www.kaggle.com/c/house-prices-advanced-regression-techniques#evaluation). Specifically, I'm using GridSearchCV to try different hyperparameters for my XGBRegressor model. Here's what I'm doing:
import pandas as pd
import numpy as np
import sklearn as sk
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

pd.options.display.max_rows = 100

xgb_model = xgb.XGBRegressor()
params = {"max_depth": [3, 4], "learning_rate": [0.05],
          "n_estimators": [1000, 2000], "n_jobs": [8], "subsample": [0.8], "random_state": [42]}
grid_search_cv = GridSearchCV(xgb_model, params, scoring="neg_mean_absolute_error",
                              n_jobs=8, cv=KFold(n_splits=10, shuffle=True, random_state=42), verbose=2)
grid_search_cv.fit(X, y)  # X, y: training features/target prepared from the competition data
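For reference, after the search finishes I read out the winning configuration like this. This is a minimal self-contained sketch: it uses synthetic data and sklearn's GradientBoostingRegressor as a stand-in for XGBRegressor (and a smaller grid/CV), since the real X and y come from the Kaggle data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for xgb.XGBRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression data in place of the Kaggle training set
X, y = make_regression(n_samples=100, n_features=5, random_state=42)

params = {"max_depth": [2, 3], "learning_rate": [0.05]}
gs = GridSearchCV(GradientBoostingRegressor(random_state=42), params,
                  scoring="neg_mean_absolute_error",
                  cv=KFold(n_splits=3, shuffle=True, random_state=42))
gs.fit(X, y)

print(gs.best_params_)  # the best combination found in the grid
print(gs.best_score_)   # mean cross-validated score of that combination
```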
So I'm trying a few values for max_depth, learning_rate, n_estimators, etc. Strangely, calling .fit() above produces this output:
GridSearchCV(cv=KFold(n_splits=10, random_state=42, shuffle=True),
error_score='raise',
estimator=XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=True, subsample=1),
fit_params=None, iid=True, n_jobs=8,
param_grid={'max_depth': [3, 4], 'learning_rate': [0.05], 'n_estimators': [1000, 2000], 'n_jobs': [8], 'subsample': [0.8], 'random_state': [42]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring='neg_mean_absolute_error', verbose=2)
So GridSearchCV prints itself, but with an XGBRegressor instance whose hyperparameter values I never specified? These look like the defaults, but I can't find anything explaining this in sklearn's .fit() documentation (only .fit_params)...
Any guidance would be great!
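In case it helps narrow things down, here's a minimal reproduction of the same behavior with a plain sklearn estimator (Ridge on synthetic data, grid and names chosen just for illustration). It suggests the printed object is simply the return value of .fit():

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
gs = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0]}, cv=3)

# fit() returns the GridSearchCV object itself, so an interactive session
# echoes its repr -- which displays the *estimator template* with its
# constructor defaults, not the tuned hyperparameter values.
ret = gs.fit(X, y)
print(ret is gs)  # True
```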