I ran a basic lightgbm example to test how max_bin affects the model:
require(lightgbm)
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test
dtrain <- lgb.Dataset(data = train$data, label = train$label, free_raw_data = FALSE)
dtest <- lgb.Dataset(data = test$data, label = test$label, free_raw_data = FALSE)
valids <- list(train = dtrain, test = dtest)
set.seed(100)
bst <- lgb.train(data = dtrain,
                 num_leaves = 31,
                 learning_rate = 0.05,
                 nrounds = 20,
                 valids = valids,
                 nthread = 2,
                 max_bin = 32,
                 objective = "binary")
I tried setting max_bin to 32 and then 255, and both runs produced identical output:
[LightGBM] [Info] Number of positive: 3140, number of negative: 3373
[LightGBM] [Info] Total Bins 128
[LightGBM] [Info] Number of data: 6513, number of used features: 107
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]: train's binary_logloss:0.644852 test's binary_logloss:0.644853
......
[20]: train's binary_logloss:0.204922 test's binary_logloss:0.204929
Why does max_bin have no effect on training the model?
Answer 0 (score: 0)
You need to set max_bin when you create the Dataset, because the binning statistics are computed at Dataset construction time, not during training. I don't know the details of the R implementation, but in Python you pass it as params={"max_bin": 32}.
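A minimal sketch of the same idea in R, assuming the version of the R package in use lets lgb.Dataset() take a params list (the values below simply mirror the question's settings and are not tested here):

# Pass max_bin as a Dataset parameter so it is applied when the feature
# histograms (bins) are built, before lgb.train() is ever called.
dtrain <- lgb.Dataset(data = train$data,
                      label = train$label,
                      params = list(max_bin = 32),  # binning is decided here
                      free_raw_data = FALSE)
dtest <- lgb.Dataset(data = test$data,
                     label = test$label,
                     params = list(max_bin = 32),
                     free_raw_data = FALSE)

# Training parameters stay in lgb.train(); max_bin set here is too late.
bst <- lgb.train(params = list(objective = "binary",
                               num_leaves = 31,
                               learning_rate = 0.05,
                               nthread = 2),
                 data = dtrain,
                 nrounds = 20,
                 valids = list(train = dtrain, test = dtest))

If the parameter is picked up, the "[LightGBM] [Info] Total Bins ..." line in the log should differ between the max_bin = 32 and max_bin = 255 runs.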
Answer 1 (score: 0)