Question

I have managed to get xgboost to produce good predictions by using xgboost.train():

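import xgboost as xgb
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=.6)
xgtrain = xgb.DMatrix(X_train, label=y_train)
xgtest = xgb.DMatrix(X_test)  # the test DMatrix was used below but never created

param = {'max_depth': 7, 'silent': 1}
bst = xgb.train(param, xgtrain, num_boost_round=2)
y_pred = bst.predict(xgtest)
# convert the continuous outputs to 0/1 labels using a cutoff of 0.28
y_pred = [1. if y_cont > .28 else 0. for y_cont in y_pred]
y_true = y_test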

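This approach did not give good results (I am trying to maximize the F1 score) until I realized that the F1 score increased significantly when I applied a threshold to the output. That threshold turned out to be 0.28. Here are some of the predictions before applying the cutoff and converting them to 0s and 1s: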

[ 0.25447303  0.25383738  0.24621713 ...,  0.24621713  0.24621713 0.24621713]
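The 0.28 cutoff can also be found systematically instead of by hand. A minimal sketch, assuming a hypothetical helper best_f1_threshold (not part of xgboost or scikit-learn) that tries every distinct predicted value as a cutoff and keeps the one with the highest F1. To avoid an optimistic F1 estimate, the sweep should be run on a validation split rather than on the test set whose score you report.

```python
import numpy as np
from sklearn.metrics import f1_score


def best_f1_threshold(y_true, scores, candidates=None):
    """Return (threshold, f1) for the cutoff that maximizes F1.

    Hypothetical helper (not part of xgboost or scikit-learn). A prediction
    is positive when score > threshold, matching `y_cont > .28` above.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    if candidates is None:
        # Each distinct score is a distinct cutoff; values in between give the same labels.
        candidates = np.unique(scores)
    best_t, best_f1 = None, -1.0
    for t in candidates:
        f1 = f1_score(y_true, (scores > t).astype(int), zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1


# Toy example: true labels and continuous scores such as bst.predict() outputs.
print(best_f1_threshold([0, 0, 1, 1], [0.1, 0.3, 0.35, 0.8]))  # prints (0.3, 1.0)
```

With the question's variables this would be called on validation labels and bst.predict() scores, and the returned threshold then applied unchanged to the test predictions.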

But now I want to tune my parameters (using GridSearchCV()), which means I need to reproduce what I did with xgboost.train() using XGBClassifier().

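I realize this may get tricky because the default objective of xgboost.train() is not 'binary:logistic' but squared-error regression ('reg:linear' in older releases, 'reg:squarederror' in newer ones), whereas XGBClassifier() uses 'binary:logistic'. XGBClassifier().predict() returns class labels rather than probabilities, which is useful in most cases but not here. I tried predict_proba() on the XGBClassifier() and then applied a cutoff, but the probabilities I get are very close to 0 and 1, so that seemed useless: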

[[  9.99445975e-01   5.54045662e-04]
 [  9.89062011e-01   1.09380139e-02]
 [  9.95234787e-01   4.76523908e-03]

How can I complete the code below so that it is equivalent to the xgboost.train() version but uses XGBClassifier? When I tried XGBClassifier without a cutoff, I got a terrible F1 score.

from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=.6)
rf = XGBClassifier(max_depth=7, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', nthread=-1, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, seed=0, missing=None)
rf = rf.fit(X_train, y_train)

Reproducing the xgboost.train() cutoff with XGBClassifier()
