LightGBM: Sklearn and Native API equivalence

Time: 2017-10-31 14:51:16

Tags: machine-learning scikit-learn lightgbm

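I am experimenting with LightGBM through its Training API (http://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api) and its Scikit-learn API (http://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api).

I cannot establish a clear mapping between the two APIs, as the example below shows. The basic idea is to train on 50% of a synthetic dataset (every other point) and predict on the remaining 50%.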

import numpy as np
import lightgbm as lgbm

# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1)) 
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))

# LGBM configuration
alg_conf = {
    "num_boost_round":25,
    "max_depth" : 3,
    "num_leaves" : 31,
    'learning_rate' : 0.1,
    'boosting_type' : 'gbdt',
    'objective' : 'regression_l2',
    "early_stopping_rounds": None,
}

# Calling Regressor using scikit-learn API 
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"], 
    n_estimators=alg_conf["num_boost_round"], 
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"]
)
sk_reg.fit(xs[::2], ys[::2])

print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))


# Calling Regressor using native API 
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)

print("Native API results")
print(lg_reg.predict(xs[1::2]))

Output

Scikit-learn API results
[  14.35693851   14.35693851   14.35693851   14.35693851   14.35693851
   14.35693851   14.35693851   14.35693851   14.35693851   14.35693851
   25.37944751   25.37944751   25.37944751   25.37944751   25.37944751
   35.10572544   35.10572544   35.10572544   35.10572544   35.10572544
   46.50667974   46.50667974   46.50667974   46.50667974   46.50667974
   59.44952419   59.44952419   59.44952419   59.44952419   59.44952419
   75.42846332   75.42846332   75.42846332   75.42846332   75.42846332
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814
  109.4610814   109.4610814   109.4610814   109.4610814   109.4610814 ]
Native API results
[ 22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  22.55947971  22.55947971  22.55947971  22.55947971  22.55947971
  45.33537795  45.33537795  45.33537795  45.33537795  45.33537795
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959
  91.6376959   91.6376959   91.6376959   91.6376959   91.6376959 ]
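
To make the mismatch concrete, the two printed arrays can be summarized by how many distinct prediction levels each model produces. The sketch below re-enters the values shown above by hand as (value, repeat count) pairs; the helper names `expand` and `summarize` are hypothetical and exist only for this illustration.

```python
from collections import Counter


def expand(levels):
    """Turn (value, repeat_count) pairs back into a flat list of predictions."""
    return [value for value, count in levels for _ in range(count)]


# Transcribed from the printed output above: (predicted value, number of repeats).
sklearn_preds = expand([
    (14.35693851, 10), (25.37944751, 5), (35.10572544, 5), (46.50667974, 5),
    (59.44952419, 5), (75.42846332, 5), (109.4610814, 15),
])
native_preds = expand([
    (22.55947971, 20), (45.33537795, 5), (91.6376959, 25),
])


def summarize(preds):
    """Count predictions, distinct levels, and the size of the largest group."""
    groups = Counter(preds)
    return {
        "n_predictions": len(preds),
        "n_levels": len(groups),
        "largest_group": max(groups.values()),
    }


if __name__ == "__main__":
    print("scikit-learn API:", summarize(sklearn_preds))
    print("native API:      ", summarize(native_preds))
```

The native model yields far fewer distinct levels over the same 50 test points. That would be consistent with the two calls using different effective leaf constraints, although the summary alone does not show which parameter differs.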

Question

Where can I find an explicit equivalence between the parameters of the two APIs?

Thank you very much.

1 answer:

Answer 0: (score: 2)

I got an answer on the LightGBM GitHub. Sharing the result below:

Adding "min_child_weight": 1e-3 and "min_child_samples": 20 to alg_conf fixed the difference:

import numpy as np
import lightgbm as lgbm

# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1)) 
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))

# The "min_child_weight" and "min_child_samples" entries added to alg_conf
# below are what bring the native API in line with the scikit-learn API.

# LGBM configuration
alg_conf = {
    "num_boost_round":25,
    "max_depth" : 3,
    "num_leaves" : 31,
    'learning_rate' : 0.1,
    'boosting_type' : 'gbdt',
    'objective' : 'regression_l2',
    "early_stopping_rounds": None,
    "min_child_weight": 1e-3, 
    "min_child_samples": 20
}

# Calling Regressor using scikit-learn API 
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"], 
    n_estimators=alg_conf["num_boost_round"], 
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"],
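    # Use the wrapper's own parameter names; they are aliases of the native
    # min_sum_hessian_in_leaf and min_data_in_leaf.
    min_child_weight=alg_conf["min_child_weight"],
    min_child_samples=alg_conf["min_child_samples"]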
)
sk_reg.fit(xs[::2], ys[::2])

print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))


# Calling Regressor using native API 
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)

print("Native API results")
print(lg_reg.predict(xs[1::2]))

This works as expected.
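
On the original question of where the equivalences live: as far as I can tell, the scikit-learn wrapper's argument names are aliases of native parameters, and the aliases are listed per parameter in the LightGBM Parameters documentation. The sketch below collects the pairs I am confident about into a lookup table. `SKLEARN_TO_NATIVE` and `to_native_params` are hypothetical names, not part of LightGBM, and the table should be checked against the documentation for the installed version.

```python
# Hypothetical lookup table: scikit-learn wrapper name -> native parameter name.
# The pairs follow the aliases listed in the LightGBM Parameters documentation;
# verify them against the version you have installed.
SKLEARN_TO_NATIVE = {
    "n_estimators": "num_iterations",
    "boosting_type": "boosting",
    "min_child_weight": "min_sum_hessian_in_leaf",
    "min_child_samples": "min_data_in_leaf",
    "min_split_gain": "min_gain_to_split",
    "subsample": "bagging_fraction",
    "subsample_freq": "bagging_freq",
    "colsample_bytree": "feature_fraction",
    "reg_alpha": "lambda_l1",
    "reg_lambda": "lambda_l2",
    "random_state": "seed",
    "n_jobs": "num_threads",
}


def to_native_params(sklearn_params):
    """Rename scikit-learn style keys to native names; other keys pass through."""
    native = {}
    for name, value in sklearn_params.items():
        target = SKLEARN_TO_NATIVE.get(name, name)
        if target in native:
            # Two keys resolving to the same native parameter is ambiguous.
            raise ValueError(f"{name!r} duplicates parameter {target!r}")
        native[target] = value
    return native


if __name__ == "__main__":
    print(to_native_params({
        "n_estimators": 25,
        "max_depth": 3,
        "min_child_weight": 1e-3,
        "min_child_samples": 20,
    }))
```

Setting both members of a pair explicitly, on both sides, avoids relying on each API's defaults, which is what the fix above does for the two leaf constraints.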