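Problem with the decision tree classifier implementation in scikit-learn

Time: 2017-07-16 16:19:39

Tags: python scikit-learn

I am trying to build a decision tree in scikit-learn. I have a CSV file that serves as input to my scikit-learn program. When I print the dataset, its length is 502 and its shape is (502, 1). There is only a single array.

How do I fit the decision tree and get results? I am not sure whether I am doing this correctly. My code is below.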

    import numpy as np
    import pandas as pd
    from sklearn import tree
    # train_test_split lives in sklearn.model_selection
    # (sklearn.cross_validation was removed in scikit-learn 0.20)
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    input_file = "output.csv"

    # Read the file as tab-delimited
    df = pd.read_csv(input_file, header=0, delimiter="\t")

    # Print the original column values
    print(df.values)
    print("DataSet Length :", len(df))
    print("DataSet Shape :", df.shape)

    # Assign the values of the first column to an array
    X = df.values[:, 0]

    # Split the data into training and test sets
    X_train, X_test = train_test_split(X, test_size=0.3, random_state=100)

    # Decision tree classifier with the entropy criterion
    clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                         max_depth=3, min_samples_leaf=5)

    # Fit the data to the classifier
    clf_entropy.fit(X_train)

The CSV file is available at the following link:

https://drive.google.com/file/d/0B3XlF206d5UrVnh6QS1LRW0xT0U/view?usp=sharing

Download it and open it with Excel. See the following scikit-learn documentation for reference:

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier

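1 answer:

Answer 0 (score: 2)

To fit a decision tree classifier, your training and test data need labels. With those labels you can fit the tree. Here is an example from the sklearn website: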

    from sklearn import tree
    X = [[0, 0], [1, 1]]
    Y = [0, 1]
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)

The problem is that your code only has X values and no labels (Y values), so you cannot fit the tree.
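As a rough sketch of what a labelled version of the question's workflow might look like: the contents of the asker's CSV are not known, so the DataFrame below is synthetic, and the column names `feature_a`, `feature_b`, and `label` are purely hypothetical stand-ins. The classifier settings are the ones from the question. Note that scikit-learn estimators expect `X` to be 2-D with shape `(n_samples, n_features)`, whereas `df.values[:, 0]` in the question yields a 1-D array.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for output.csv (assumption: the real file would need
# at least one feature column and one label column; these names are made up).
rng = np.random.RandomState(100)
df = pd.DataFrame({
    "feature_a": rng.rand(502),
    "feature_b": rng.rand(502),
})
df["label"] = (df["feature_a"] + df["feature_b"] > 1.0).astype(int)

# X is 2-D (n_samples, n_features); y holds one label per row.
X = df[["feature_a", "feature_b"]].values
y = df["label"].values

# Split features and labels together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)

# Same settings as in the question.
clf_entropy = DecisionTreeClassifier(
    criterion="entropy", random_state=100, max_depth=3, min_samples_leaf=5
)

# fit() takes both the features and the labels.
clf_entropy.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = clf_entropy.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

With a real file, the two column selections would be replaced by whichever columns actually hold the features and the target. If the file turns out to contain only one column, there is nothing for a classifier to learn from until a label column is added.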