如何在决策树输出中打印列名称,例如要素1,要素2,要素3,要素4,要素5或要素6,而不是-2
。
答案应该是可扩展的,例如,如果某人有500列或更多列。
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
X, y = make_classification(n_samples=1000,
n_features=6,
n_informative=3,
n_classes=2,
random_state=0,
shuffle=False)
# Creating a dataFrame
df = pd.DataFrame({'Feature 1':X[:,0],
'Feature 2':X[:,1],
'Feature 3':X[:,2],
'Feature 4':X[:,3],
'Feature 5':X[:,4],
'Feature 6':X[:,5],
'Class':y})
y_train = df['Class']
X_train = df.drop('Class',axis = 1)
from sklearn.tree import _tree
# Using those arrays, we can parse the tree structure:
n_nodes = dt.tree_.node_count
children_left = dt.tree_.children_left
children_right = dt.tree_.children_right
feature = dt.tree_.feature
threshold = dt.tree_.threshold
new_X = np.array(X_train)
node_indicator = dt.decision_path(new_X)
# Similarly, we can also have the leaves ids reached by each sample.
leave_id = dt.apply(new_X)
# Now, it's possible to get the tests that were used to predict a sample or
# a group of samples. First, let's make it for the sample.
sample_id = 703
node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
node_indicator.indptr[sample_id + 1]]
print('Rules used to predict sample %s: ' % sample_id)
for node_id in node_index:
if leave_id[sample_id] != node_id:
continue
if (new_X[sample_id, feature[node_id]] <= threshold[node_id]):
threshold_sign = "<="
else:
threshold_sign = ">"
print("decision id node %s : (For sample number [%s, %s] (= %s) %s %s)"
% (node_id,
sample_id,
feature[node_id],
new_X[sample_id, feature[node_id]],
threshold_sign,
threshold[node_id]))
用于预测样本703的规则: 决策ID节点11 :(样本号[703,-2](= -0.210092480919)> -2.0)
答案 0 :(得分:0)
只需将feature[node_id]
替换为df.columns[feature[node_id]]
,就像这样:
print("decision id node %s : (For sample number [%s, %s] (= %s) %s %s)"
% (node_id,
sample_id,
X_train.columns[feature[node_id]],
new_X[sample_id, feature[node_id]],
threshold_sign,
threshold[node_id]))