Question

I am trying to use sklearn's CalibratedClassifierCV() with lightgbm, as follows:

from lightgbm import LGBMClassifier
from sklearn.calibration import CalibratedClassifierCV

clf = LGBMClassifier(
    boosting_type='gbdt',
    objective='multiclass',
    num_class=5,
    metric='multi_logloss',
    learning_rate=0.05,
    max_depth=7,
    num_leaves=60,
    feature_fraction=0.7,
    bagging_fraction=1,
    bagging_freq=20,
    nthread=4,
    n_estimators=50)

# v1 is the list of feature columns; some of them hold categorical strings
calibrated_clf = CalibratedClassifierCV(clf, method='isotonic', cv=5)
calibrated_clf.fit(train_df[v1], train_df['label'])

However, some of the features are categorical. lightgbm handles these efficiently without one-hot encoding, but CalibratedClassifierCV raises an error on them.

Error: ValueError: could not convert string to float: 'RC'

The error comes from sklearn's validation.py, where calling astype with a float dtype on an array of string objects fails at this step:

if dtype_numeric and array.dtype.kind == "O":
    array = array.astype(np.float64)
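
The same failure can be reproduced with NumPy alone, outside of CalibratedClassifierCV. This is only a minimal sketch: the array contents are made-up values for illustration (only 'RC' appears in the actual error), and it assumes the categorical column reaches validation.py as an object-dtype array of strings.

```python
import numpy as np

# Hypothetical object-dtype column of category strings (illustrative values only).
array = np.array(["RC", "AB", "RC"], dtype=object)

try:
    # The same conversion that validation.py applies to object-dtype input.
    array.astype(np.float64)
    error_message = None
except ValueError as exc:
    # Strings such as 'RC' cannot be parsed as floats, so NumPy raises here.
    error_message = str(exc)

print(error_message)
```

This suggests the exception is raised by input validation before lightgbm ever sees the data, so lightgbm's own handling of categorical features does not come into play.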

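Is there a way around this without converting the categorical variables to numeric ones? I am avoiding converting the categories to numbers because there are more than 10,000 of them.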

Probability calibration for lightgbm using CalibratedClassifierCV

0 answers: