Question

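I need to build a decision tree and show the data points, labeled by class, on the plot (as shown in the two illustrations). The decision tree itself works; unfortunately, the points are not drawn on the plot. I have already tried a few changes to the code, which comes from the scikit-learn example "Plot the decision surface of a decision tree on the iris dataset".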

Here is sample data (X, Y, C5), read from an Excel file:

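import xlrd  # note: xlrd >= 2.0 only reads .xls files; reading .xlsx this way needs xlrd < 2.0

Path = "Documents/Apprentissage/Python/Script/ClustAllRepres12.xlsx"
Wordbook = xlrd.open_workbook(Path)
Sheet = Wordbook.sheet_by_index(0)

X=[]
Y=[]
C5=[]

# Skip the header row; columns 0, 1 and 8 hold X, Y and C5
for i in range(1, Sheet.nrows):
    X.append(Sheet.cell_value(i, 0))
    Y.append(Sheet.cell_value(i, 1))
    C5.append(Sheet.cell_value(i, 8))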

X

Out[]: [8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 11.0, 12.0, 12.0, 12.0, 12.0, 12.0, 13.0, 13.0, 13.0, 14.0, 14.0, 14.0, 15.0, 15.0, 15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 17.0, 17.0, 17.0, 17.0, 18.0, 18.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 20.0, 21.0, 21.0, 22.0]

type(X)

Out[]: list

Y

Out[]: [45.0, 17.0, 18.0, 24.0, 25.0, 27.0, 36.0, 38.0, 39.0, 24.0, 37.0, 40.0, 24.0, 31.0, 35.0, 36.0, 37.0, 39.0, 32.0, 33.0, 35.0, 43.0, 27.0, 31.0, 35.0, 42.0, 18.0, 39.0, 43.0, 31.0, 42.0, 28.0, 32.0, 35.0, 43.0, 51.0, 52.0, 17.0, 19.0, 53.0, 49.0, 51.0, 53.0, 58.0, 16.0, 58.0, 59.0, 50.0, 52.0, 54.0]

type(Y)

Out[]: list

C5

Out[]: [2.0, 4.0, 3.0, 3.0, 3.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.0, 4.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 5.0, 1.0, 3.0, 1.0, 1.0, 4.0, 4.0, 4.0, 1.0, 5.0, 1.0, 5.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

type(C5)

Out[]: list
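To reproduce the problem without the Excel file, the printed lists can be pasted in directly. A minimal sketch, assuming NumPy (the names `features` and `labels` are illustrative and not from the original code), that stacks X and Y into the two-column feature matrix the classifier expects and lists the class labels:

```python
import numpy as np

# Values copied from the printed output above
X = [8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 11.0, 12.0,
     12.0, 12.0, 12.0, 12.0, 13.0, 13.0, 13.0, 14.0, 14.0, 14.0, 15.0, 15.0,
     15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 17.0, 17.0, 17.0, 17.0, 18.0, 18.0,
     18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 20.0, 21.0, 21.0, 22.0]
Y = [45.0, 17.0, 18.0, 24.0, 25.0, 27.0, 36.0, 38.0, 39.0, 24.0, 37.0, 40.0, 24.0,
     31.0, 35.0, 36.0, 37.0, 39.0, 32.0, 33.0, 35.0, 43.0, 27.0, 31.0, 35.0,
     42.0, 18.0, 39.0, 43.0, 31.0, 42.0, 28.0, 32.0, 35.0, 43.0, 51.0, 52.0,
     17.0, 19.0, 53.0, 49.0, 51.0, 53.0, 58.0, 16.0, 58.0, 59.0, 50.0, 52.0, 54.0]
C5 = [2.0, 4.0, 3.0, 3.0, 3.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0,
      3.0, 3.0, 3.0, 3.0, 1.0, 4.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0,
      1.0, 5.0, 1.0, 3.0, 1.0, 1.0, 4.0, 4.0, 4.0, 1.0, 5.0, 1.0, 5.0, 2.0, 5.0,
      5.0, 5.0, 5.0, 5.0, 5.0]

# Two-column feature matrix: column 0 is X, column 1 is Y
features = np.column_stack([X, Y])
# Class labels as a NumPy array (a plain list does not support element-wise ==)
labels = np.asarray(C5)

print(features.shape)     # (50, 2)
print(np.unique(labels))  # [1. 2. 3. 4. 5.]
```

Note that the labels run from 1.0 to 5.0, not from 0 to 4.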

Here is the code, based on the scikit-learn example:

import pandas as pd 
import numpy as np
import xlrd
import matplotlib.pyplot as plt

df_list = pd.DataFrame(
    {'X': X,
     'Y': Y,
    }) 
df = df_list.iloc[:,0:2].values  # convert to a NumPy array of floats

import sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree

#parameters
n_classes = 5
plot_colors = "ryb"
plot_step = 0.02

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],[1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = df #gives better result without [:, pair]
    y = C5

#train
clf = DecisionTreeClassifier().fit(X, y)

#plot the decision boundary
plt.subplot(2, 3, pairidx +1)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))
plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

# Plot the training points
for i, color in zip(range(n_classes), plot_colors):
    idx = np.where(y == i)
    plt.scatter(X[idx, 0], X[idx, 1], c=color, label=C5[i],
                cmap=plt.cm.RdYlBu, edgecolor='black', s=15)
plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")
plt.figure()
clf = DecisionTreeClassifier().fit(df_list, C5)
plot_tree(clf, filled=True)
plt.show()
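The decision surface in the code above comes from predicting a class for every point of a regular grid and drawing the result with `contourf`. A minimal sketch of just that step, assuming scikit-learn and NumPy; the helper name `decision_surface` is hypothetical, the grid step of 0.5 is chosen only to keep the grid small, and the training data is the first ten rows of the sample above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def decision_surface(clf, features, step=0.5):
    """Hypothetical helper: predict a class for every point of a grid covering the data."""
    x_min, x_max = features[:, 0].min() - 1, features[:, 0].max() + 1
    y_min, y_max = features[:, 1].min() - 1, features[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    # Flatten the grid into an (n_points, 2) array, predict, then restore the grid shape
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz

# First ten rows of the sample data (X, Y) and their C5 labels
features = np.array([[8, 45], [9, 17], [9, 18], [9, 24], [9, 25],
                     [10, 27], [10, 36], [11, 38], [11, 39], [11, 24]], dtype=float)
labels = np.array([2, 4, 3, 3, 3, 4, 3, 3, 3, 3], dtype=float)

clf = DecisionTreeClassifier(random_state=0).fit(features, labels)
xx, yy, zz = decision_surface(clf, features)
print(xx.shape == zz.shape)  # True
```

`plt.contourf(xx, yy, zz)` only draws the colored regions; the training points need a separate `plt.scatter` call on top of them.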

Here is my plot:

As you can see, no points appear on the plot. Here is an example of the result I am expecting:

Answer 1

idx = np.where(y == i)

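In this line, y is a list. Comparing a list with a number (y == i) is not done element by element: it returns a single False, so np.where never finds the indices of the elements equal to i and no points are selected. Convert y to a NumPy array and everything works as expected.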
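A quick check of that explanation, using NumPy only; the short list `labels` is just the first five C5 values from the question:

```python
import numpy as np

labels = [2.0, 4.0, 3.0, 3.0, 3.0]  # first five C5 values from the question

# A plain list is compared as a whole: the result is one bool, not a mask
print(labels == 3.0)            # False

# A NumPy array is compared element by element, giving a boolean mask
arr = np.asarray(labels)
print(np.where(arr == 3.0)[0])  # [2 3 4]
```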

So just add this line before plotting the scatter points:

y = np.asarray(y)  # convert the list to an array so that y == i is evaluated element-wise
# Plot the training points
for i, color in .....
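Converting `y` fixes the comparison, but the question's loop has a few more details that can still hide points or mislabel them: `range(n_classes)` yields 0 to 4 while the labels are 1.0 to 5.0, `zip` with `plot_colors = "ryb"` stops after three colors, and `label=C5[i]` uses the i-th sample's value rather than the class. A minimal sketch of a scatter loop that iterates over the labels actually present instead, assuming matplotlib; the color list and the off-screen `Agg` backend are choices made for this example, and the data is the first ten rows of the sample:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# First ten rows of the sample data (X, Y) and their C5 labels
features = np.array([[8, 45], [9, 17], [9, 18], [9, 24], [9, 25],
                     [10, 27], [10, 36], [11, 38], [11, 39], [11, 24]], dtype=float)
y = np.asarray([2, 4, 3, 3, 3, 4, 3, 3, 3, 3], dtype=float)

colors = ["red", "orange", "gold", "green", "blue"]  # enough for up to five classes

fig, ax = plt.subplots()
# Iterate over the class labels that really occur in y, one color each
for label, color in zip(np.unique(y), colors):
    idx = np.where(y == label)[0]
    ax.scatter(features[idx, 0], features[idx, 1], c=color,
               label=f"class {label:g}", edgecolor="black", s=15)
ax.legend(loc="lower right")
```

On top of a `contourf` surface, the same loop draws the labeled training points over the colored regions.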

Decision tree with labels
