A question about the predict_proba function of XGBoost in Python

Asked: 2019-05-03 07:50:20

Tags: python scikit-learn xgboost

Currently, I am working on a binary classification problem. I want my prediction output from XGBoost to be probabilities rather than 1s and 0s.

I split the dataset into training, validation, and test sets.

import numpy as np
from xgboost import XGBClassifier

label = 'is_default'
id_column = 'emp_id'
features = ['age', 'income', 'dependent', 'A', 'B', 'C']

train, valid, test = np.split(df.sample(frac=1), [int(.8*len(df)), int(.95*len(df))])

X_train, y_train = train[features], train[label]
X_valid, y_valid = valid[features], valid[label]
X_test, y_test = test[features], test[label]
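As a quick sanity check (not part of the original question), the split points at 80% and 95% of the shuffled DataFrame produce an 80/15/5 train/validation/test split. A minimal sketch with a toy DataFrame:

```python
import numpy as np
import pandas as pd

# toy DataFrame with 100 rows standing in for df
df = pd.DataFrame({'x': range(100)})

# shuffle, then split at the 80% and 95% marks
train, valid, test = np.split(df.sample(frac=1, random_state=0),
                              [int(.8 * len(df)), int(.95 * len(df))])

print(len(train), len(valid), len(test))  # 80 15 5
```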

params = {
 'num_class' : 2,
 'learning_rate' : 0.1,
 'n_estimators':5,
 'max_depth':5,
 'min_child_weight':1,
 'gamma':2,
 'subsample':0.8,
 'colsample_bytree':0.5,
 'objective':'multi:softprob',
 'scale_pos_weight':2.14,
 'nthread':4,
 'seed':27}

# fit model 
model = XGBClassifier(**params)
model.fit(X_train, y_train)

valid_pred = model.predict_proba(X_test)

print(valid_pred) 

# My output looks like this -
#
#array([[0.39044815, 0.6095518 ],
#       [0.4008397 , 0.59916025],
#       [0.40074524, 0.5992548 ],
#       ...,
#       [0.3613969 , 0.6386031 ],
#       [0.45495912, 0.5450409 ],
#       [0.41036654, 0.58963346]], dtype=float32)
#
# It gives me 1-or-0 style output, which I don't want. I want only the max
# probability, like 0.6095518, 0.59916025, ... etc. How do I do this?

# this returns the class index (0 or 1) for each row, not the probability
best_valid_preds = [np.argmax(x) for x in valid_pred]
print(best_valid_preds)

1 Answer:

Answer 0: (score: 1)

Since you only want the maximum probability, like 0.6095518, 0.59916025, ... etc., you can use the following code:

best_valid_preds = [np.max(x) for x in valid_pred]
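As a side note (not in the original answer), the same per-row maximum can be computed without a Python loop by passing `axis=1` to NumPy's `max`, which is both shorter and faster on large arrays:

```python
import numpy as np

# a couple of rows shaped like predict_proba output
preds = np.array([[0.39044815, 0.6095518],
                  [0.4008397, 0.59916025]])

# maximum over each row (axis=1) is the winning-class probability
best = preds.max(axis=1)
print(best)
```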

See below for a toy example:

import numpy as np

preds = np.random.rand(100, 2)

best = [np.max(x) for x in preds]

print(best)
# [0.9935469310532575,
#  0.7121431432601246,
#  0.5863137762128169,
#  0.6562235545646353,
#  0.7955074578808067,
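To connect this back to the `argmax` snippet from the question: `argmax` returns the index of the winning class, while `max` returns its probability, and indexing each row at its `argmax` reproduces the `max`. A small sketch (with hypothetical random data) illustrating the relationship:

```python
import numpy as np

rng = np.random.default_rng(0)
preds = rng.random((100, 2))  # stand-in for predict_proba output

labels = preds.argmax(axis=1)  # predicted class index per row (0 or 1)
probs = preds.max(axis=1)      # probability of that predicted class

# selecting each row's entry at its argmax gives exactly the row maximum
assert np.allclose(probs, preds[np.arange(len(preds)), labels])
```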