Calibration with xgboost

Date: 2016-02-23 19:00:17

Tags: scikit-learn xgboost

I'm wondering whether I can do probability calibration in xgboost. More specifically, does xgboost come with an existing calibration implementation like scikit-learn's, or is there some way to plug a model from xgboost into scikit-learn's CalibratedClassifierCV?

As far as I know, this is the common procedure in sklearn:
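A minimal sketch of that procedure (the base estimator, toy data, and cv value here are illustrative placeholders, not from the original post):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data; any classifier exposing decision_function or predict_proba works.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

base = LogisticRegression()
calibrated = CalibratedClassifierCV(base, method='sigmoid', cv=3)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X)  # calibrated class probabilities
```

The key requirement is that the base estimator expose decision_function or predict_proba, which is exactly what a raw xgboost Booster lacks.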

If I plug an xgboost tree model into CalibratedClassifierCV, it throws an error (of course):

RuntimeError: classifier has no decision_function or predict_proba method.


Is there a way to integrate scikit-learn's excellent calibration module with xgboost?

I'd appreciate your insights!

2 Answers:

Answer 0 (score: 7)

To answer my own question: xgboost GBTs can be integrated with scikit-learn by writing a wrapper class, as in the example below.

import numpy as np
import xgboost as xgb
from sklearn.metrics import log_loss


class XGBoostClassifier():
    def __init__(self, num_boost_round=10, **params):
        self.clf = None
        self.num_boost_round = num_boost_round
        self.params = params
        self.params.update({'objective': 'multi:softprob'})

    def fit(self, X, y, num_boost_round=None):
        num_boost_round = num_boost_round or self.num_boost_round
        # Map arbitrary labels onto 0..n_classes-1 integers for xgboost.
        self.label2num = dict((label, i) for i, label in enumerate(sorted(set(y))))
        dtrain = xgb.DMatrix(X, label=[self.label2num[label] for label in y])
        self.clf = xgb.train(params=self.params, dtrain=dtrain,
                             num_boost_round=num_boost_round)

    def predict(self, X):
        num2label = dict((i, label) for label, i in self.label2num.items())
        Y = self.predict_proba(X)
        y = np.argmax(Y, axis=1)
        return np.array([num2label[i] for i in y])

    def predict_proba(self, X):
        dtest = xgb.DMatrix(X)
        return self.clf.predict(dtest)

    def score(self, X, y):
        # Invert the log loss so that a higher score is better.
        Y = self.predict_proba(X)
        return 1 / log_loss(y, Y)

    def get_params(self, deep=True):
        return self.params

    def set_params(self, **params):
        if 'num_boost_round' in params:
            self.num_boost_round = params.pop('num_boost_round')
        if 'objective' in params:
            del params['objective']
        self.params.update(params)
        return self

See the full example here.

Please don't hesitate to suggest smarter approaches!

Answer 1 (score: 0)

Note as of July 2020:

You no longer need a wrapper class. A predict_proba method is built into the xgboost sklearn Python API. I'm not sure exactly when it was added, but it is certainly there as of v1.0.0.

Note: of course, this only applies to classes that have a predict_proba method. XGBRegressor, for example, does not; XGBClassifier does.