Getting the distribution of values at the leaf nodes of a DecisionTreeRegressor in scikit-learn

Time: 2016-07-11 04:05:45

Tags: python machine-learning scikit-learn random-forest decision-tree

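By default, a scikit-learn DecisionTreeRegressor returns the mean of all training-set target values that fall in a given leaf node.

However, I am interested in getting back the list of target values from my training set that fall into the predicted leaf node. This would let me quantify the distribution and compute other metrics, such as the standard deviation.

Is this possible with scikit-learn?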

1 Answer:

Answer 0: (score: 0)

I think what you are looking for is the `apply` method of the `tree_` object. See here for the source. Here is an example:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rs = np.random.RandomState(1234)
x  = rs.randn(10,2)
y  = rs.randn(10)

md  = rs.randint(1, 5)
dtr = DecisionTreeRegressor(max_depth=md)
dtr.fit(x, y)

# The `tree_` object's methods seem to complain if you don't use `float32`.
leaf_ids = dtr.tree_.apply(x.astype(np.float32))

print(leaf_ids)
# => [5 6 6 5 2 6 3 6 6 3]

# A tree of depth md has at most 2**md leaves, so for small depths
# the two numbers should probably be equal.
print(2**md, np.unique(leaf_ids).shape[0])
# => 4 4
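
To get from leaf ids to the distribution the question asks for, one option is to group the training targets by leaf and then look up the leaf that a new sample is routed to. The following is a minimal sketch, not part of the original answer. It assumes a scikit-learn version in which the estimator exposes a public `apply` method, and the names `leaf_targets`, `leaf_distribution`, and `x_new` are hypothetical helpers introduced only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rs = np.random.RandomState(1234)
X_train = rs.randn(10, 2)
y_train = rs.randn(10)

dtr = DecisionTreeRegressor(max_depth=2, random_state=0)
dtr.fit(X_train, y_train)

# Leaf index of every training sample, via the estimator's public
# `apply` method (assumed to be available in your scikit-learn version).
train_leaves = dtr.apply(X_train)

# Hypothetical helper structure: leaf id -> training targets in that leaf.
leaf_targets = {
    leaf: y_train[train_leaves == leaf] for leaf in np.unique(train_leaves)
}


def leaf_distribution(model, leaf_targets, x_new):
    """Return the training targets in the leaf that x_new is routed to."""
    leaf = model.apply(np.atleast_2d(x_new))[0]
    return leaf_targets[leaf]


# Query with an arbitrary sample (here, the first training row).
x_new = X_train[0]
targets = leaf_distribution(dtr, leaf_targets, x_new)

# Summary statistics of the targets in the predicted leaf.
print("n =", targets.size, "mean =", targets.mean(), "std =", targets.std())
# With the default squared-error criterion, the mean should match predict().
print("prediction =", dtr.predict(x_new.reshape(1, -1))[0])
```

Once the per-leaf arrays are available, any other statistic (quantiles, a histogram, and so on) can be computed from `targets` in the same way.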