Random forest classifier decision path method (scikit)

Time: 2017-03-14 18:01:03

Tags: python scikit-learn random-forest

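I implemented a standard RandomForestClassifier on the Titanic dataset and would like to explore sklearn's decision_path method, which was introduced in v0.18 (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html).

However, it outputs a sparse matrix that I'm not sure how to interpret. Can anyone advise on how best to visualize this?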

from sklearn.ensemble import RandomForestClassifier

# Train a simplified random forest (3 shallow trees)
estimator = RandomForestClassifier(random_state=0, n_estimators=3, max_depth=3)
estimator.fit(X_train, y_train)

# Extract the decision path for test instance i = 12
i_data = X_test.iloc[12].values.reshape(1, -1)
d_path = estimator.decision_path(i_data)

print(d_path)

Output:

(<1x3982 sparse matrix of type ''
	with 598 stored elements in Compressed Sparse Row format>,
 array([   0,   45,   98,  149,  190,  233,  258,  309,  360,  401,  430,
         461,  512,  541,  580,  623,  668,  711,  760,  803,  852,  889,
         932,  981, 1006, 1035, 1074, 1107, 1136, 1165, 1196, 1241, 1262,
        1313, 1350, 1385, 1420, 1465, 1518, 1553, 1590, 1625, 1672, 1707,
        1744, 1787, 1812, 1863, 1904, 1945, 1982, 2017, 2054, 2097, 2142,
        2191, 2228, 2267, 2304, 2343, 2390, 2419, 2456, 2489, 2534, 2583,
        2632, 2677, 2714, 2739, 2786, 2833, 2886, 2919, 2960, 2995, 3032,
        3073, 3126, 3157, 3194, 3239, 3274, 3313, 3354, 3409, 3458, 3483,
        3516, 3539, 3590, 3629, 3660, 3707, 3750, 3777, 3822, 3861, 3898,
        3939, 3982], dtype=int32))

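Apologies if I haven't provided enough detail; please let me know if that's the case.

Thanks!

Note: edited to simplify the random forest (limited depth and n_trees).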

1 answer:

Answer 0: (score: 2)

If you want to visualize the trees in the forest, you can try the answer given here: https://stats.stackexchange.com/q/118016

Adapted to your problem:

from sklearn import tree

...

# Write one Graphviz .dot file per tree in the fitted forest
for i_tree, tree_in_forest in enumerate(estimator.estimators_):
    with open('tree_' + str(i_tree) + '.dot', 'w') as my_file:
        tree.export_graphviz(tree_in_forest, out_file=my_file)

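This creates one file per tree in the forest, named tree_i.dot. With the default number of trees at the time (10; it is 100 since scikit-learn 0.22) that means 10 files for i = 0 to 9, while the forest in the question (n_estimators=3) gives 3 files. You can then create a PDF from each file in a terminal, for example: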

$ dot -Tpdf tree_0.dot -o tree.pdf
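If running that command by hand for every tree gets tedious, the same step could be scripted. The following is only a sketch, assuming the Graphviz `dot` executable is on your PATH and the files follow the tree_i.dot naming used above; the helper names `dot_command` and `render_all` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(dot_file):
    """Build the Graphviz command that renders one .dot file to a PDF next to it."""
    dot_file = Path(dot_file)
    return ["dot", "-Tpdf", str(dot_file), "-o", str(dot_file.with_suffix(".pdf"))]


def render_all(directory="."):
    """Render every tree_*.dot file in `directory`; returns the PDF paths written."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    written = []
    for dot_file in sorted(Path(directory).glob("tree_*.dot")):
        cmd = dot_command(dot_file)
        subprocess.run(cmd, check=True)  # raises if dot exits with an error
        written.append(cmd[-1])
    return written


# Show the command that would be run for the first tree.
print(dot_command("tree_0.dot"))
```

Calling `render_all()` in the directory that holds the .dot files should then leave a tree_i.pdf next to each tree_i.dot.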

There may be a smarter way to do this; if anyone can help, I'd be happy to learn it.
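As for reading the decision_path output itself: it returns a pair (indicator, n_nodes_ptr), where the columns n_nodes_ptr[t] to n_nodes_ptr[t + 1] - 1 of the indicator matrix belong to tree t, and a nonzero entry means the sample passed through that node. The sketch below splits the active columns of one sample back into per-tree node ids. It uses plain Python lists and hypothetical toy values (not the numbers from the question), and `split_path_by_tree` is a name I made up for illustration.

```python
from bisect import bisect_right


def split_path_by_tree(active_columns, n_nodes_ptr):
    """Group the global column indices of one sample's decision path by tree.

    Returns {tree_index: [node ids local to that tree]}.
    """
    paths = {}
    for col in sorted(active_columns):
        # Tree t owns columns n_nodes_ptr[t] .. n_nodes_ptr[t + 1] - 1.
        t = bisect_right(n_nodes_ptr, col) - 1
        paths.setdefault(t, []).append(col - n_nodes_ptr[t])
    return paths


# Hypothetical toy values: 3 trees with 5, 7 and 3 nodes respectively.
n_nodes_ptr = [0, 5, 12, 15]
active_columns = [0, 1, 4, 5, 7, 11, 12, 14]

paths = split_path_by_tree(active_columns, n_nodes_ptr)
print(paths)
```

With a real forest and a single-row input, the active columns should be available as `indicator.indices` and the offsets as `list(n_nodes_ptr)`. The local node ids can then be looked up in `estimator.estimators_[t].tree_.feature` and `estimator.estimators_[t].tree_.threshold` to see which split each node applied.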