标签的决策树

时间:2019-12-09 19:19:30

标签: python decision-tree graph-visualization

我需要做一个决策树,并通过图形上的标签表示数据(如两个插图所示)。我对决策树没有问题,不幸的是,图中的点没有插入。我已经尝试了代码中的一些更改。 该代码来自scikit学习网站Plot the decision surface of a decision tree on the iris dataset


下面是一个示例数据(X,Y,C5)(来自excel文件):

Path = "Documents/Apprentissage/Python/Script/ClustAllRepres12.xlsx"
Wordbook = xlrd.open_workbook(Path)
Sheet = Wordbook.sheet_by_index(0)

X=[]
Y=[]
C5=[]

for i in range(1, Sheet.nrows):
    X.append(Sheet.cell_value(i, 0))
    Y.append(Sheet.cell_value(i, 1))
    C5.append(Sheet.cell_value(i, 8))    

X

出[]: [8.0,  9.0,  9.0,  9.0,  9.0,  10.0,  10.0,  11.0,  11.0,  11.0,  11.0,  11.0,  12.0,  12.0,  12.0,  12.0,  12.0,  13.0,  13.0,  13.0,  14.0,  14.0,  14.0,  15.0,  15.0,  15.0,  15.0,  16.0,  16.0,  16.0,  16.0,  17.0,  17.0,  17.0,  17.0,  18.0,  18.0,  18.0,  18.0,  18.0,  19.0,  19.0,  19.0,  19.0,  19.0,  19.0,  20.0,  21.0,  21.0,  22.0]

type(X)

出[]:列表

Y

出[]: [45.0,  17.0,  18.0,  24.0,  25.0,  27.0,  36.0,  38.0,  39.0,  24.0,  37.0,  40.0,  24.0,  31.0,  35.0,  36.0,  37.0,  39.0,  32.0,  33.0,  35.0,  43.0,  27.0,  31.0,  35.0,  42.0,  18.0,  39.0,  43.0,  31.0,  42.0,  28.0,  32.0,  35.0,  43.0,  51.0,  52.0,  17.0,  19.0,  53.0,  49.0,  51.0,  53.0,  58.0,  16.0,  58.0,  59.0,  50.0,  52.0,  54.0]

type(Y)

出[]:列表

C5

出[]: [2.0,  4.0,  3.0,  3.0,  3.0,  4.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  3.0,  1.0,  4.0,  1.0,  3.0,  1.0,  1.0,  1.0,  1.0,  3.0,  1.0,  3.0,  1.0,  5.0,  1.0,  3.0,  1.0,  1.0,  4.0,  4.0,  4.0,  1.0,  5.0,  1.0,  5.0,  2.0,  5.0,  5.0,  5.0,  5.0,  5.0,  5.0]

type(C5)

出[]:列表


scikit学习网站上有以下代码:

import pandas as pd 
import numpy as np
import xlrd
import matplotlib.pyplot as plt

df_list = pd.DataFrame(
    {'X': X,
     'Y': Y,
    }) 
df = df_list.iloc[:,0:2].values  #transform to float type 

import sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree

#parameters
n_classes = 5
plot_colors = "ryb"
plot_step = 0.02

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],[1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = df #gives better result without [:, pair]
    y = C5

#train
clf = DecisionTreeClassifier().fit(X, y)

#plot the decision boundary
plt.subplot(2, 3, pairidx +1)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))
plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

# Plot the training points
for i, color in zip(range(n_classes), plot_colors):
    idx = np.where(y == i)
    plt.scatter(X[idx, 0], X[idx, 1], c=color, label=C5[i],
                cmap=plt.cm.RdYlBu, edgecolor='black', s=15)
plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")
plt.figure()
clf = DecisionTreeClassifier().fit(df_list, C5)
plot_tree(clf, filled=True)
plt.show()

有我的图表

enter image description here

如您所见,图中没有显示点。 我给一个等待结果的例子

enter image description here

1 个答案:

答案 0 :(得分:0)

  

idx = np.where(y == i)

在此行y是一个列表 为了返回等于i(变量)的列表索引 只需转换为列表即可对所有内容进行正常处理

因此,只需在绘制散点图之前添加此行

**y = np.asarray(y)** # convert list to array since where will work for array only
# Plot the training points
for i, color in .....