XGBRegressor much slower than GradientBoostingRegressor

Date: 2016-12-16 04:26:27

Tags: scikit-learn xgboost

I'm new to xgboost and I'm trying to learn how to use it by comparing it against the traditional gbm (GradientBoostingRegressor). However, I noticed that xgboost is much slower than gbm. Here is the example:

This is the code I ran on a Macbook Pro with 8 cores:

from sklearn.model_selection import KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor
from sklearn.datasets import load_boston
import time

boston = load_boston()
X = boston.data
y = boston.target

kf = KFold(n_splits = 5)
cv_params = {'cv': kf, 'scoring': 'r2', 'n_jobs': 4, 'verbose': 1}

gbm = GradientBoostingRegressor()
xgb = XGBRegressor()

grid = {'n_estimators': [100, 300, 500], 'max_depth': [3, 5]}

timer = time.time()
gbm_cv = GridSearchCV(gbm, param_grid = grid, **cv_params).fit(X, y)
print('GBM time: ', time.time() - timer)

timer = time.time()
xgb_cv = GridSearchCV(xgb, param_grid = grid, **cv_params).fit(X, y)
print('XGB time: ', time.time() - timer)

I thought xgboost was supposed to be much faster, so I must be doing something wrong. Can someone help point out what I'm doing wrong?

1 answer:

Answer 0 (score: 1):

Here is the output when I run your code on my machine, without setting the n_jobs parameter in cv_params:
Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    4.1s finished
('GBM time: ', 4.248916864395142)
Fitting 5 folds for each of 6 candidates, totalling 30 fits
('XGB time: ', 2.934467077255249)
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    2.9s finished

With n_jobs set to 4, GBM finishes in about 2.5s, but XGB takes a very long time.

So this is probably an n_jobs problem! Perhaps the XGBoost library is not well configured to run with GridSearchCV's n_jobs (process-based) parallelism.
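
If that is indeed the cause, a sketch of a workaround (assuming the conflict is between GridSearchCV's process-based n_jobs and XGBoost's own multithreading, which I have not verified on the asker's machine) is to enable only one layer of parallelism at a time:

from sklearn.model_selection import KFold, GridSearchCV
from sklearn.datasets import load_boston
from xgboost import XGBRegressor

boston = load_boston()
X, y = boston.data, boston.target

kf = KFold(n_splits=5)
grid = {'n_estimators': [100, 300, 500], 'max_depth': [3, 5]}

# Option 1: parallelize the grid search, keep each XGBoost fit single-threaded
xgb_cv = GridSearchCV(XGBRegressor(nthread=1), param_grid=grid,
                      cv=kf, scoring='r2', n_jobs=4, verbose=1).fit(X, y)

# Option 2: run the grid search serially and let XGBoost use all available cores
xgb_cv = GridSearchCV(XGBRegressor(nthread=-1), param_grid=grid,
                      cv=kf, scoring='r2', n_jobs=1, verbose=1).fit(X, y)

Either way you avoid having 4 worker processes that each try to spawn their own XGBoost threads, which is the usual explanation for XGBoost appearing to hang inside GridSearchCV.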