I used the Scikit-learn API for XGBoost (in Python) and got an accuracy of ~68%. Using the same set of parameters with XGBoost's Learning API, I got an accuracy of ~60%. My understanding is that the Scikit-learn API is a wrapper around the Learning API, so they should give the same results. I don't understand why I'm getting different results from the two APIs.
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

cores = 16
random_state = 0
params = {
'n_estimators': 100,
'learning_rate': 0.1,
'max_depth': 3,
'min_child_weight': 1.0,
'subsample': 1.0,
'gamma': 0.0,
'tree_method':'gpu_exact',
'colsample_bytree': 1.0,
'alpha' : 0.0,
'lambda': 1.0,
'nthread': cores,
'objective': 'binary:logistic',
'booster': 'gbtree',
'seed': random_state,
'eta':0.1,
'silent': 1
}
model = XGBClassifier(**params)
r = model.fit(X_train,y_train)
print(model)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
Result:
XGBClassifier(alpha=0.0, base_score=0.5, booster='gbtree',
colsample_bylevel=1, colsample_bytree=1.0, eta=0.1, gamma=0.0,
lambda=1.0, learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1.0, missing=None, n_estimators=100, n_jobs=1,
nthread=16, objective='binary:logistic', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=1,
subsample=1.0, tree_method='gpu_exact')
Accuracy: 68.32%
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_test, label=y_test)
# fit model on training data
model = xgb.train(params=params,dtrain=dtrain)
# make predictions for test data
y_pred = model.predict(dvalid)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
Result:
Accuracy: 60.25%
Answer (score: 2):
I believe the difference is because you did not specify the number of boosting rounds in the native xgboost API (xgb.train()), so it falls back to the default of 10.

'n_estimators' is sklearn-specific terminology; xgb.train() silently ignores it.

Also, contrary to the comment given above, this particular algorithm is deterministic when run multiple times on the same system.