Bug in Scikit-Learn's GradientBoostingClassifier?

Asked: 2017-04-14 09:58:12

Tags: python python-2.7 scikit-learn gradient-descent

I'm running GradientBoostingClassifier from sklearn and getting some strange output from the verbose logging. I randomly sample 10% of my full dataset; most runs look fine, but sometimes I get weird output and terrible results. Can someone explain what is going on?
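Roughly what I'm doing, as a simplified sketch (the data here is a synthetic stand-in; my real dataset has 168 features, and I vary the hyperparameters between runs):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (168 features).
X_full, y_full = make_classification(n_samples=20000, n_features=168)

# Randomly sample 10% of the full dataset, then hold out a test split.
X_sub, _, y_sub, _ = train_test_split(X_full, y_full, train_size=0.10)
X_train, X_test, y_train, y_test = train_test_split(X_sub, y_sub, test_size=0.25)

# Settings from the "good" run below.
clf = GradientBoostingClassifier(learning_rate=0.01, max_depth=4,
                                 n_estimators=2000, verbose=1)
clf.fit(X_train, y_train)
print('predicting')
print('this_file_MCC = %.4f' % matthews_corrcoef(y_test, clf.predict(X_test)))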

"Good" result:

n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.01, loss='deviance', max_depth=4,
              max_features=None, max_leaf_nodes=None,
              min_impurity_split=1e-07, min_samples_leaf=1,
              min_samples_split=2, min_weight_fraction_leaf=0.0,
              n_estimators=2000, presort='auto', random_state=None,
              subsample=1.0, verbose=1, warm_start=False)
      Iter       Train Loss   Remaining Time 
         1           0.6427           40.74m
         2           0.6373           40.51m
         3           0.6322           40.34m
         4           0.6275           40.33m
         5           0.6230           40.31m
         6           0.6187           40.18m
         7           0.6146           40.34m
         8           0.6108           40.42m
         9           0.6071           40.43m
        10           0.6035           40.28m
        20           0.5743           40.12m
        30           0.5531           39.74m
        40           0.5367           39.49m
        50           0.5237           39.13m
        60           0.5130           38.78m
        70           0.5041           38.47m
        80           0.4963           38.34m
        90           0.4898           38.22m
       100           0.4839           38.14m
       200           0.4510           37.07m
       300           0.4357           35.49m
       400           0.4270           33.87m
       500           0.4212           31.77m
       600           0.4158           29.82m
       700           0.4108           27.74m
       800           0.4065           25.69m
       900           0.4025           23.55m
      1000           0.3987           21.39m
      2000           0.3697            0.00s
predicting
this_file_MCC = 0.5777

"Bad" result:

Training the classifier
n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=1.0, loss='deviance', max_depth=5,
              max_features='sqrt', max_leaf_nodes=None,
              min_impurity_split=1e-07, min_samples_leaf=1,
              min_samples_split=2, min_weight_fraction_leaf=0.0,
              n_estimators=500, presort='auto', random_state=None,
              subsample=1.0, verbose=1, warm_start=False)
      Iter       Train Loss   Remaining Time 
         1           0.5542            1.07m
         2           0.5299            1.18m
         3           0.5016            1.14m
         4           0.4934            1.16m
         5           0.4864            1.19m
         6           0.4756            1.21m
         7           0.4699            1.24m
         8           0.4656            1.26m
         9           0.4619            1.24m
        10           0.4572            1.26m
        20           0.4244            1.27m
        30           0.4063            1.24m
        40           0.3856            1.20m
        50           0.3711            1.18m
        60           0.3578            1.13m
        70           0.3407            1.10m
        80           0.3264            1.09m
        90           0.3155            1.06m
       100           0.3436            1.04m
       200           0.3516           46.55s
       300        1605.5140           29.64s
       400 52215150662014.0469           13.70s
       500 585408988869401440279216573629431147797247696359586211550088082222979417986203510562624281874357206861232303015821113689812886779519405981626661580487933040706291550387961400555272759265345847455837036753780625546140668331728366820653710052494883825953955918423887242778169872049367771382892462080.0000            0.00s
predicting
this_file_MCC = 0.0398

1 Answer:

Answer (score: 1):

The learning rate in your "bad" example is much too high, so in the gradient descent step of the gradient boosting algorithm you keep jumping over local or global minima. This leads to divergence and causes the error explosion you are seeing. Take a look at this lecture from Andrew Ng's Machine Learning Course; the relevant part on the learning rate starts at around 4:30.
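As a quick sanity check, you can re-run the "bad" configuration with a smaller step size; the train loss should then decrease steadily instead of blowing up. A minimal sketch with synthetic stand-in data (not your dataset):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=168)  # stand-in data

# Same settings as the "bad" run, but with learning_rate dropped
# from 1.0 to 0.01.
clf = GradientBoostingClassifier(learning_rate=0.01, max_depth=5,
                                 max_features='sqrt', n_estimators=500,
                                 verbose=1)
clf.fit(X, y)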

Think of gradient descent/ascent as the process of trying to find a path to the bottom of a valley or the top of a hill, ideally to the global minimum/maximum. If the hill/valley is very large and you take small steps, you should eventually find a path that reaches at least a local minimum/maximum. But if the hill/valley is small relative to the size of your steps, it is easy to step right over the maximum/minimum and end up somewhere terrible. The learning rate sets the size of your steps: in the "good" example your learning rate (alpha) is 0.01, so you take tiny steps in (mostly) the right direction until you reach a minimum, but in the "bad" example your alpha is 1.0, so you take huge steps, jump over the local minima, and end up going up instead of down. This is a very basic way to think about the learning rate in this algorithm.
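You can watch the same effect with plain gradient descent on a toy function like f(x) = x^2, independent of sklearn (a minimal sketch):

def gradient_descent(alpha, x0=10.0, steps=200):
    # Minimize f(x) = x^2, whose gradient is f'(x) = 2x.
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

print(gradient_descent(alpha=0.01))  # small steps: converges toward 0
print(gradient_descent(alpha=1.1))   # steps too big: overshoots every time and diverges
# with alpha = 1.0 exactly, x just bounces between 10.0 and -10.0 forever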

If you read this post on tuning the learning rate on DatumBox, you'll see a frequently recycled visualization of the process (not sure who stole the image from whom, but it's everywhere) along with some discussion of changing the learning rate adaptively. I'm not sure whether that is the default in sklearn, but I wouldn't count on it.
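If you'd rather pick the learning rate systematically than guess at it, a small cross-validated grid search is the usual approach in sklearn (a sketch; the grid values and stand-in data are only illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=168)  # stand-in data

# Try a few step sizes and keep the one with the best cross-validated score.
grid = GridSearchCV(GradientBoostingClassifier(n_estimators=200),
                    param_grid={'learning_rate': [0.001, 0.01, 0.1, 1.0]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)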