Question

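I want to evaluate, for every sample, the size of the leaf node it falls into.

Based on this excellent answer, I have already found a way to extract the number of samples in each leaf node: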

from sklearn.tree import _tree, DecisionTreeClassifier
import numpy as np

clf = DecisionTreeClassifier().fit(X_train, y_train)

def tree_get_leaf_size_for_elem(tree, feature_names):
    tree_ = tree.tree_

    def recurse(node):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            # internal node: descend into both children
            recurse(tree_.children_left[node])
            recurse(tree_.children_right[node])
        else:
            # leaf node: total of the per-class values stored at this leaf
            samples_in_leaf = np.sum(tree_.value[node][0])

    recurse(0)

tree_get_leaf_size_for_elem(clf, feature_names)

Is there a way to get the indices of all samples in X_train that end up in a leaf node? A new column of X_train called "leaf_node_size" would be the desired output.

Answer 1

sklearn lets you do this easily with the apply method:

from collections import Counter

# get the leaf index for each training sample
# (`tree` is the fitted DecisionTreeClassifier, i.e. `clf` in the question)
leaves_index = tree.apply(X_train)

# use Counter to find the number of elements in each leaf
cnt = Counter(leaves_index)

# now look up each sample's leaf to get the size of that leaf
elems = [cnt[x] for x in leaves_index]
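
For reference, here is a minimal end-to-end sketch of the same idea that also collects the row indices per leaf, which is what the question asks for. The iris data, `max_depth=3`, and the names `leaf_ids`, `leaf_node_size` and `samples_per_leaf` are illustrative assumptions, not something from the question. It uses `np.unique` instead of `Counter`, but the logic is the same.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only; substitute your own X_train / y_train.
X_train, y_train = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# apply() returns, for each row, the node id of the leaf it lands in.
leaf_ids = clf.apply(X_train)

# Count rows per leaf, then map every row back to the count of its leaf.
unique_leaves, inverse, counts = np.unique(
    leaf_ids, return_inverse=True, return_counts=True
)
leaf_node_size = counts[inverse]

# Row indices of the training samples that ended up in each leaf.
samples_per_leaf = {
    int(leaf): np.flatnonzero(leaf_ids == leaf) for leaf in unique_leaves
}
```

If X_train is a pandas DataFrame, the result should be assignable directly as `X_train["leaf_node_size"] = leaf_node_size`. When the tree was fitted on the same rows without sample weights, `clf.tree_.n_node_samples[leaf_ids]` should give the same sizes without any counting.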

sklearn.tree.DecisionTreeClassifier: get all samples that fall into a leaf node
