Question

I want to use a decision tree classifier in order to predict something.

As you can see here:

from sklearn import tree

sample1 = [120,1]
sample2 = [123,3]
features = [sample1,sample2]

labels = [0,1] 

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

I have two samples:

Sample one: [120,1] which I labelled as 0
Sample two: [123,3] which I labelled as 1
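For reference, the working example above is already in the layout scikit-learn expects: a 2D array with one row per sample and one column per feature, plus one label per row. In this minimal sketch, NumPy is used only to inspect the shape, and the query points are simply the training samples again:

```python
import numpy as np
from sklearn import tree

# One row per sample, one column per feature
features = [[120, 1],   # sample one
            [123, 3]]   # sample two
labels = [0, 1]         # one label per row

clf = tree.DecisionTreeClassifier().fit(features, labels)

print(np.asarray(features).shape)         # (2, 2) -> (n_samples, n_features)
print(clf.predict([[120, 1], [123, 3]]))  # [0 1]
```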

So far so good.

But now, instead of these samples, I want to train using an array of arrays, something like:

features = [[120,120.2][1, 1.2]]

and the respective label for this sample is:

label = [1]

So my code should be:

from sklearn import tree

features = [[120,120.2][1, 1.2]]

labels = [1] 

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

I'm getting the following error:

TypeError: list indices must be integers, not tuple

I understand that the classifier wants a list of integers, not tuples, and one possible solution would be:

features = [[120, 120.2, 1, 1.2]]

labels = [1]

But I don't want to mix up the data, since I have it stored in separate arrays.

Is there any way I can train my classifier with arrays of arrays of data?

Thanks

Answer 1

No, you can't use your data in this format; you need to aggregate the features into a single array.

The expected shape is (n_samples, n_features).
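If each sample really is an array of arrays, such as the 2×2 block in the question, one option is to store all samples as a 3D array and flatten only the per-sample dimensions with `reshape`. scikit-learn rejects the 3D array directly. This sketch shows both behaviors. The second sample and its label are hypothetical values made up for illustration.

```python
import numpy as np
from sklearn import tree

# Shape (n_samples, 2, 2): each sample is a 2x2 "array of arrays"
samples = np.array([
    [[120, 120.2], [1, 1.2]],   # sample from the question, label 1
    [[123, 123.5], [3, 3.1]],   # hypothetical second sample, label 0
])
labels = np.array([1, 0])

# Passing the 3D array directly fails scikit-learn's input validation
rejected_message = None
try:
    tree.DecisionTreeClassifier().fit(samples, labels)
except ValueError as exc:
    rejected_message = str(exc)
print(rejected_message)  # complains that the array has 3 dimensions

# Flatten each 2x2 sample into 4 features -> (n_samples, n_features)
X = samples.reshape(len(samples), -1)
clf = tree.DecisionTreeClassifier().fit(X, labels)
print(X.shape)         # (2, 4)
print(clf.predict(X))  # [1 0]
```

`reshape(len(samples), -1)` keeps the sample axis and lets NumPy compute the feature count, so the same line works for any per-sample block size.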

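The (n_samples, n_features) format is also more logical: a set of features describes one sample, so using the expected format represents your data better.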
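Aggregating does not mean mixing the data by hand. You can keep the feature groups in separate arrays and let NumPy join them column-wise just before training. This minimal sketch assumes each group is stored as a 2D array with one row per sample. The names `group_a` and `group_b` are hypothetical, and so are the second sample's values and label.

```python
import numpy as np
from sklearn import tree

# Feature groups kept separately; row i of each array belongs to sample i.
group_a = np.array([[120, 120.2],   # sample from the question (label 1)
                    [123, 123.5]])  # hypothetical second sample (label 0)
group_b = np.array([[1, 1.2],
                    [3, 3.1]])
labels = np.array([1, 0])

# Join the groups column-wise -> shape (n_samples, n_features) = (2, 4)
X = np.hstack([group_a, group_b])

clf = tree.DecisionTreeClassifier().fit(X, labels)
print(X.shape)         # (2, 4)
print(X[0])            # the question's sample as one row of 4 features
print(clf.predict(X))  # [1 0]
```

`group_a` and `group_b` are left untouched, so the groups stay separate in your own code. Only the matrix passed to `fit` is combined, and its column order follows the order of the arrays given to `np.hstack`.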
