Question

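I have developed a model in R using XGBoost. My data is highly imbalanced with respect to the class variable: 5% versus 95%. Even so, in R I got good AUC-ROC and AUC_Precision results on the test data, 90% and 61% respectively. I am now converting the R code to Python so that it can be deployed in the production environment, where R is not available. I tried to keep the same hyperparameters that I set in R, but the results are very different: AUC is in the range of 86%, but the precision on the test data is only about 28%. I am using the XGBoost package directly rather than sklearn's XGBClassifier. The reason is that I constrain many of the features to be monotone, and I could not find an option in XGBClassifier for setting "monotone_constraint". Here is the code snippet from R: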

xgb.model <- xgb.train(data = xgb.data.train,
                       params = list(objective = 'binary:logistic',
                                     eta = 0.3, max.depth = 6, subsample = 0.8,
                                     gamma = 5,
                                     monotone_constraints = c(0,-1,-1,-1,-1,0,1,1,1),
                                     eval_metric = 'auc', eval_metric = 'aucpr',
                                     verbose = 1),
                       watchlist = list(test = xgb.data.test), nrounds = 400, max.delta.step = 10)

And the code from Python:

import xgboost as xgb

# dtrain and dtest are xgb.DMatrix objects built elsewhere (not shown)
feature_monotones = [0, -1, -1, -1, -1, 0, 1, 1, 1]
eval_metric = ['auc', 'aucpr']
params = {'objective': 'binary:logistic',
          'max_depth': 6,
          'eta': 0.3,
          #'silent': 1,
          #'nthread': 2,
          'subsample': 0.8,
          'gamma': 5,
          'seed': 1234,
          'eval_metric': eval_metric,
          'max_delta_step': 10,
          # one entry per feature, formatted as "(0,-1,...)"
          'monotone_constraints': '(' + ','.join(str(m) for m in feature_monotones) + ')'
         }

evallist = [(dtrain, 'train'), (dtest, 'eval')]
evals_result = {}
bst = xgb.train(params, dtrain, num_boost_round=1000, evals=evallist,
                evals_result=evals_result, verbose_eval=True)
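When porting the constraints, it may be worth checking that the string has exactly one entry per feature column, in the same column order as the R vector. The following is a minimal sketch of such a check. The helper name `build_monotone_constraints` is hypothetical, and the feature count of 9 is simply the length of the list in the question, not something confirmed about the actual data.

```python
def build_monotone_constraints(constraints, n_features):
    """Format per-feature constraints as the "(0,-1,...)" string used in params.

    Hypothetical helper: raises if the list length does not match the number
    of feature columns or if an entry is not one of -1, 0, 1.
    """
    if len(constraints) != n_features:
        raise ValueError(
            f"expected {n_features} constraints, got {len(constraints)}"
        )
    invalid = [c for c in constraints if c not in (-1, 0, 1)]
    if invalid:
        raise ValueError(f"constraints must be -1, 0 or 1, got {invalid}")
    return "(" + ",".join(str(c) for c in constraints) + ")"


# Same values as in the question; n_features=9 is assumed from the list length.
feature_monotones = [0, -1, -1, -1, -1, 0, 1, 1, 1]
constraint_str = build_monotone_constraints(feature_monotones, n_features=9)
print(constraint_str)
```

If the columns of the Python training matrix are in a different order than in R, the string still passes this check but the constraints apply to the wrong features, so the column order needs to be compared separately.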

Apart from nrounds = 400 in R and num_boost_round = 1000 in Python, I have kept the hyperparameters the same. Python has no parameter named "nrounds" as in R, but I assume that nrounds corresponds to num_boost_round.
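Because the two runs use different numbers of rounds, one way to compare like with like is to read the Python metrics at round 400 instead of at the last round. The sketch below assumes `evals_result` has the nested layout that `xgb.train` fills in: evaluation-set name, then metric name, then one value per boosting round. The helpers `metric_at_round` and `best_round` are hypothetical, and the numbers are made-up placeholders, not real results.

```python
def metric_at_round(evals_result, eval_name, metric, round_number):
    """Return the metric logged after `round_number` boosting rounds (1-based)."""
    history = evals_result[eval_name][metric]
    if not 1 <= round_number <= len(history):
        raise IndexError(
            f"round {round_number} is outside 1..{len(history)}"
        )
    return history[round_number - 1]


def best_round(evals_result, eval_name, metric):
    """Return (round_number, value) for the highest logged metric value."""
    history = evals_result[eval_name][metric]
    best_index = max(range(len(history)), key=history.__getitem__)
    return best_index + 1, history[best_index]


# Placeholder history with 4 rounds, standing in for the real evals_result.
fake_evals_result = {
    "eval": {
        "auc": [0.70, 0.80, 0.85, 0.84],
        "aucpr": [0.20, 0.35, 0.40, 0.33],
    }
}

print(metric_at_round(fake_evals_result, "eval", "aucpr", 3))
print(best_round(fake_evals_result, "eval", "aucpr"))
```

With the real dictionary, `metric_at_round(evals_result, 'eval', 'aucpr', 400)` would give the value to set against the R run. If the metric peaks early and then degrades, the value after 1000 rounds can be noticeably worse than the value at round 400.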

Does anyone know why the results are so different?

XGBoost in Python does not give results similar to those in R

0 answers: