I'm running GradientBoostingClassifier from sklearn and I'm seeing some strange output in the verbose log. I take a random 10% sample of my full dataset; most runs seem fine, but sometimes I get weird output and bad results. Can someone explain what's going on?
The "good" result:
n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
learning_rate=0.01, loss='deviance', max_depth=4,
max_features=None, max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=2000, presort='auto', random_state=None,
subsample=1.0, verbose=1, warm_start=False)
Iter Train Loss Remaining Time
1 0.6427 40.74m
2 0.6373 40.51m
3 0.6322 40.34m
4 0.6275 40.33m
5 0.6230 40.31m
6 0.6187 40.18m
7 0.6146 40.34m
8 0.6108 40.42m
9 0.6071 40.43m
10 0.6035 40.28m
20 0.5743 40.12m
30 0.5531 39.74m
40 0.5367 39.49m
50 0.5237 39.13m
60 0.5130 38.78m
70 0.5041 38.47m
80 0.4963 38.34m
90 0.4898 38.22m
100 0.4839 38.14m
200 0.4510 37.07m
300 0.4357 35.49m
400 0.4270 33.87m
500 0.4212 31.77m
600 0.4158 29.82m
700 0.4108 27.74m
800 0.4065 25.69m
900 0.4025 23.55m
1000 0.3987 21.39m
2000 0.3697 0.00s
predicting
this_file_MCC = 0.5777
The "bad" result:
Training the classifier
n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
learning_rate=1.0, loss='deviance', max_depth=5,
max_features='sqrt', max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=500, presort='auto', random_state=None,
subsample=1.0, verbose=1, warm_start=False)
Iter Train Loss Remaining Time
1 0.5542 1.07m
2 0.5299 1.18m
3 0.5016 1.14m
4 0.4934 1.16m
5 0.4864 1.19m
6 0.4756 1.21m
7 0.4699 1.24m
8 0.4656 1.26m
9 0.4619 1.24m
10 0.4572 1.26m
20 0.4244 1.27m
30 0.4063 1.24m
40 0.3856 1.20m
50 0.3711 1.18m
60 0.3578 1.13m
70 0.3407 1.10m
80 0.3264 1.09m
90 0.3155 1.06m
100 0.3436 1.04m
200 0.3516 46.55s
300 1605.5140 29.64s
400 52215150662014.0469 13.70s
500 585408988869401440279216573629431147797247696359586211550088082222979417986203510562624281874357206861232303015821113689812886779519405981626661580487933040706291550387961400555272759265345847455837036753780625546140668331728366820653710052494883825953955918423887242778169872049367771382892462080.0000 0.00s
predicting
this_file_MCC = 0.0398
Answer (score: 1):
Your learning rate in the "bad" example is much too high: the gradient descent step inside the gradient boosting algorithm is jumping over the local or global minimum. That causes divergence and produces the exploding loss you see. Take a look at this lecture from Andrew Ng's Machine Learning Course; the relevant part about the learning rate starts at around 4:30.
Think of gradient descent/ascent as trying to find a path to the bottom of a valley or the top of a hill, ideally the global minimum/maximum. If the valley/hill is very large and you take small steps, you should eventually find your way to at least a local minimum/maximum. But if the valley/hill is small relative to the size of your steps, it's easy to step right over the minimum/maximum and end up somewhere terrible. The learning rate controls the size of your steps: in the "good" example your learning rate (alpha) is 0.01, so you take tiny steps in (mostly) the right direction until you reach a minimum, but in the "bad" example your alpha is 1.0, so you take big steps, jump over local minima, and end up climbing instead of descending. That's a very basic way to think about the learning rate in this algorithm.
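To make the divergence concrete, here is a minimal sketch in plain Python (not the question's code): vanilla gradient descent on f(x) = x^2, whose gradient is 2x. With a small step size the iterates shrink toward the minimum at 0; with a step size that is too large, each update overshoots, and f(x) blows up just like the training loss in the "bad" run.

def gradient_descent(alpha, x0=10.0, steps=15):
    # One update per step: x <- x - alpha * f'(x), with f'(x) = 2x.
    x = x0
    for i in range(steps):
        x = x - alpha * 2 * x
        print(f"step {i + 1:2d}: x = {x:10.4g}, f(x) = {x * x:10.4g}")
    return x

gradient_descent(alpha=0.01)  # x shrinks by 2% per step: slow, steady convergence
gradient_descent(alpha=1.1)   # x grows by 20% per step (and flips sign): divergence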
If you read this article on tuning the learning rate on DatumBox, you'll see a frequently recycled visualization of the process (not sure who originally stole the image from whom, but it's everywhere), along with some discussion of adaptively changing the learning rate. I'm not sure whether that's the default in sklearn, but I wouldn't count on it.
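One practical way to see the effect in sklearn is to compare a few candidate learning rates on a held-out split and score them with MCC, the metric used above. This is only a sketch under assumptions: make_classification stands in for the question's 168-feature dataset, and the candidate rates are illustrative, not recommendations.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic placeholder data; swap in your own X, y.
X, y = make_classification(n_samples=2000, n_features=168, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for lr in (0.01, 0.1, 1.0):
    clf = GradientBoostingClassifier(learning_rate=lr, n_estimators=500,
                                     max_depth=5, max_features='sqrt',
                                     random_state=0)
    clf.fit(X_train, y_train)
    mcc = matthews_corrcoef(y_val, clf.predict(X_val))
    print(f"learning_rate={lr}: held-out MCC = {mcc:.4f}")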