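Below is the code I am running, so you can run it in your own environment. I am using a RandomForestClassifier and trying to find the decision_path for a selected sample in the random forest classifier.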
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)
# Creating a DataFrame
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})
y_train = df['Class']
X_train = df.drop('Class', axis=1)
rf = RandomForestClassifier(n_estimators=50,
                            random_state=0)
rf.fit(X_train, y_train)
The furthest I have got is this:
#Extracting the decision path for instance i = 12
i_data = X_train.iloc[12].values.reshape(1,-1)
d_path = rf.decision_path(i_data)
print(d_path)
and the output does not make much sense to me:
(<1x7046 sparse matrix of type '' with 486 stored elements in Compressed Sparse Row format>, array([   0,  133,  282,  415,  588,  761,  910, 1041, 1182, 1309, 1432,
       1569, 1728, 1869, 2000, 2143, 2284, 2419, 2572, 2711, 2856, 2987,
       3128, 3261, 3430, 3549, 3704, 3839, 3980, 4127, 4258, 4389, 4534,
       4671, 4808, 4947, 5088, 5247, 5378, 5517, 5640, 5769, 5956, 6079,
       6226, 6385, 6524, 6655, 6780, 6925, 7046], dtype=int32))
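I am trying to find the decision path for a particular sample in the DataFrame. Can anyone tell me how to do this?
The idea is to have something like this: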
http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html
Answer (score: 2)
The RandomForestClassifier.decision_path method returns a tuple (indicator, n_nodes_ptr).
See the documentation here.
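To make the two parts of that tuple concrete, here is a minimal sketch that rebuilds the question's data and forest, unpacks the return value, and checks how n_nodes_ptr relates to the individual trees. It assumes a recent scikit-learn; exact node counts can differ between versions, so no expected output is shown.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same data and forest as in the question
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
X_train = pd.DataFrame(X, columns=[f"Feature {k}" for k in range(1, 7)])
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y)

# decision_path returns a tuple: unpack it rather than using it as one object
indicator, n_nodes_ptr = rf.decision_path(X_train.iloc[[12]])

# indicator: sparse matrix with one row per sample and one column per node
#            of every tree in the forest, laid side by side
# n_nodes_ptr: column offsets; tree t owns columns n_nodes_ptr[t]:n_nodes_ptr[t + 1]
nodes_per_tree = np.diff(n_nodes_ptr)
print(indicator.shape, n_nodes_ptr[-1])
print(np.array_equal(nodes_per_tree,
                     [est.tree_.node_count for est in rf.estimators_]))
```

Read this way, the 1x7046 matrix in the question is one sample against 7046 nodes in total across the 50 trees, and the 486 stored elements are the nodes that sample visits, summed over all trees.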
So your variable node_indicator is a tuple, not the sparse matrix you expected.
A tuple object has no attribute 'indices', which is why you get an error when you do:
node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                    node_indicator.indptr[sample_id + 1]]
Try:
(node_indicator, _) = rf.decision_path(X_train)
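Once the matrix is unpacked, the row for one sample mixes node columns from all 50 trees. A minimal sketch, assuming the same setup as the question, that uses n_nodes_ptr to split that row into per-tree node ids. It relies on scikit-learn giving each child node a larger id than its parent, so sorting the ids yields root-to-leaf order.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same data and forest as in the question
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
X_train = pd.DataFrame(X, columns=[f"Feature {k}" for k in range(1, 7)])
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y)

node_indicator, n_nodes_ptr = rf.decision_path(X_train)
sample_id = 12

# Columns of the nodes sample_id visits, across all trees at once (CSR row slice)
row = node_indicator.indices[node_indicator.indptr[sample_id]:
                             node_indicator.indptr[sample_id + 1]]

paths = {}
for t in range(len(rf.estimators_)):
    start, stop = n_nodes_ptr[t], n_nodes_ptr[t + 1]
    # Keep tree t's columns and shift them back to that tree's own node ids;
    # sorting puts them in root-to-leaf order
    paths[t] = np.sort(row[(row >= start) & (row < stop)] - start)

print(paths[0])
```

This avoids calling decision_path once per tree, at the cost of the small bookkeeping step with n_nodes_ptr.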
You can also print the decision path of every tree in the forest for a single sample ID:
X_train = X_train.values  # the trees inside the forest were fitted on a plain array
sample_id = 0
for j, tree in enumerate(rf.estimators_):
    n_nodes = tree.tree_.node_count
    children_left = tree.tree_.children_left
    children_right = tree.tree_.children_right
    feature = tree.tree_.feature
    threshold = tree.tree_.threshold

    print("Decision path for DecisionTree {0}".format(j))
    node_indicator = tree.decision_path(X_train)
    leaf_id = tree.apply(X_train)

    # Nodes visited by sample_id: the nonzero columns of its row in the CSR matrix
    node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                        node_indicator.indptr[sample_id + 1]]

    print('Rules used to predict sample %s: ' % sample_id)
    for node_id in node_index:
        # The leaf node holds the prediction, not a split test, so skip it
        if leaf_id[sample_id] == node_id:
            continue

        if X_train[sample_id, feature[node_id]] <= threshold[node_id]:
            threshold_sign = "<="
        else:
            threshold_sign = ">"

        print("decision id node %s : (X_train[%s, %s] (= %s) %s %s)"
              % (node_id,
                 sample_id,
                 feature[node_id],
                 X_train[sample_id, feature[node_id]],
                 threshold_sign,
                 threshold[node_id]))
Note that in your case you have 50 estimators, so the output can be a bit tedious to read.
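One way to keep the output readable is to print only a few trees and show the DataFrame's column names instead of raw feature indices. A minimal sketch under those assumptions; path_rules is a hypothetical helper name, not a scikit-learn function. It casts the row to float32 because the trees compare float32 inputs with their thresholds, so the printed <= / > matches the branch the tree actually took.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def path_rules(tree, x_row, feature_names):
    """Hypothetical helper: list the split tests one sample passes in one fitted tree."""
    # Trees compare float32 inputs with their thresholds, so cast the same way
    x = np.asarray(x_row, dtype=np.float32).reshape(1, -1)
    node_ids = np.sort(tree.decision_path(x).indices)  # root-to-leaf order
    leaf_id = tree.apply(x)[0]
    feature, threshold = tree.tree_.feature, tree.tree_.threshold
    rules = []
    for node_id in node_ids:
        if node_id == leaf_id:  # the leaf holds a prediction, not a split test
            continue
        f, thr = feature[node_id], threshold[node_id]
        value = x[0, f]
        sign = "<=" if value <= thr else ">"
        rules.append(f"node {node_id}: {feature_names[f]} = {value:.3f} {sign} {thr:.3f}")
    return rules


# Same data and forest as in the question
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
feature_names = [f"Feature {k}" for k in range(1, 7)]
X_train = pd.DataFrame(X, columns=feature_names)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y)

X_np = X_train.to_numpy()  # trees inside the forest were fitted on a plain array
sample_id = 12
for j, est in enumerate(rf.estimators_[:3]):  # only the first 3 of the 50 trees
    print(f"Tree {j}:")
    for rule in path_rules(est, X_np[sample_id], feature_names):
        print("   ", rule)
```

Changing the slice rf.estimators_[:3] controls how many trees are printed.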