How to train a classifier with an array of arrays?

Time: 2016-04-15 14:59:14

Tags: python-2.7 scikit-learn decision-tree

I want to use a decision tree classifier in order to predict something.

As you can see here:

from sklearn import tree

sample1 = [120,1]
sample2 = [123,3]
features = [sample1,sample2]

labels = [0,1] 

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

I have two samples:

  • Sample one: [120,1] which I labelled as 0

  • Sample two: [123,3] which I labelled as 1

So far so good.

But now, instead of these samples, I want to train using an array of arrays, something like:

features = [[120,120.2][1, 1.2]]

and the respective label for this sample is:

label = [1]

So my code should be:

from sklearn import tree

features = [[120,120.2][1, 1.2]]

labels = [1] 

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

I'm getting the following error:

TypeError: list indices must be integers, not tuple

I understand that the classifier wants a list of integers, and not tuples. And a solution may be:

features = [[120, 120.2, 1, 1.2]]

labels = [1] 

But I don't want to mix up the data, since I have it separated into arrays.

Is there any way I can train my classifier with arrays of arrays of data?

Thanks

1 answer:

Answer 0 (score: 1)

No, you cannot use this format for your data; you need to aggregate the values into a single array per sample.

Expected shape: (n_samples, n_features)
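
One way to keep the data separated in your own code and still give the classifier the shape it expects is to flatten each sample only at fit time. The following is a minimal sketch, assuming NumPy is available; the second sample and both labels are made up for illustration, since a single sample would give the tree only one class to learn.

```python
import numpy as np
from sklearn import tree

# Hypothetical data: each sample keeps its two sub-arrays separate.
# The second sample and the labels are invented for this illustration.
samples = [
    [[120, 120.2], [1, 1.2]],
    [[123, 123.5], [3, 3.1]],
]
labels = [1, 0]

# Keep the nested form for your own bookkeeping.
nested = np.array(samples)                   # shape (2, 2, 2)

# Flatten every sample into one row: (n_samples, n_features).
features = nested.reshape(len(nested), -1)   # shape (2, 4)

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

print(features.shape)
print(clf.predict(features))
```

Any new sample has to be flattened the same way before calling predict, so that its columns line up with the ones used for training.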

This is also more logical: a set of features describes one sample, so the expected format describes your data better.
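
As a side note, the TypeError in the question appears to come from the list literal itself rather than from scikit-learn: without a comma between the two inner lists, Python reads [120,120.2][1, 1.2] as indexing the first list with the tuple (1, 1.2). A small sketch of that behaviour, using illustrative variable names that are not in the original post:

```python
# Without a comma, the second bracket pair is an index expression:
# first[1, 1.2] means "index the list with the tuple (1, 1.2)".
first = [120, 120.2]

try:
    value = first[1, 1.2]
    error_message = None
except TypeError as exc:
    # The exact wording of the message differs between Python versions.
    error_message = str(exc)

print(error_message)

# With the comma, the literal is an ordinary nested list of two lists.
nested = [[120, 120.2], [1, 1.2]]
print(len(nested))
```

Adding the comma removes the TypeError, but the nested list is then treated as two samples with two features each, not as one sample, so the labels would no longer match.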