I want to use a decision tree classifier to make predictions.
Here is my current code:
from sklearn import tree
sample1 = [120,1]
sample2 = [123,3]
features = [sample1,sample2]
labels = [0,1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
I have two samples:
Sample one: [120,1]
which I labelled as 0
Sample two: [123,3]
which I labelled as 1
So far so good.
But now, instead of these samples, I want to train on a sample made up of several arrays, something like:
features = [[120,120.2][1, 1.2]]
and the respective label for this sample is:
label = [1]
So my code should be:
from sklearn import tree
features = [[120,120.2][1, 1.2]]
labels = [1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
I'm getting the following error:
TypeError: list indices must be integers, not tuple
I understand that the classifier wants a list of integers, and not tuples. And a solution may be:
features = [[120, 120.2, 1, 1.2]]
labels = [1]
But I don't want to mix the data together, since I keep it in separate arrays.
Is there any way to train my classifier on arrays of arrays of data?
Thanks
Answer 0 (score: 1)
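No, you cannot use this format for your data; you need to aggregate the values into a single array.
The expected shape is (n_samples, n_features).
This is also more logical: a set of features describes one sample, and the expected format represents your data accordingly.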
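As a side note, the TypeError in the question comes from the missing comma: Python reads [120,120.2][1, 1.2] as indexing the first list with the tuple (1, 1.2). Even with the comma added, the nested form would still not match the expected shape. The following is a minimal sketch of one way to keep the arrays separate in your own code and only flatten them when building the training matrix. The second sample, its label, and the variable names are hypothetical, added only so the example has two classes; np.concatenate is one option among several.

```python
import numpy as np
from sklearn import tree

# Each sample is kept as a tuple of separate arrays, as in the question.
# The second sample and its label are hypothetical, for illustration only.
samples = [
    ([120, 120.2], [1, 1.2]),
    ([123, 123.4], [3, 3.1]),
]
labels = [1, 0]

# Concatenate each sample's arrays into a single row, giving a matrix
# of shape (n_samples, n_features).
features = np.array([np.concatenate(parts) for parts in samples])

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

# Predict for the first training sample.
prediction = clf.predict(features[:1])
```

The original arrays stay untouched; only the matrix passed to fit is flattened, so each row holds all the features of one sample.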