Question

I am running KNN classification on a dataset with 28 features and 5000 samples:

trainingSet = []
testSet = []
imdb_score = range(1,11)

print ("Start splitting the dataset ...")
splitDataset(path + 'movies.csv', 0.60, trainingSet, testSet)

print ("Start KNeighborsClassifier ... \n")
neigh = KNeighborsClassifier(n_neighbors=5)
neigh.fit(trainingSet, imdb_score)

However, I get this error:

    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [3362, 10]

My code looks fine to me. Has anyone run into this problem?

Answer 1

So you have 5000 samples and use 60% of them, which leaves 3362 training samples (or so it appears; I can't reproduce your exact calculation).

You call fit(X, y), which requires the following:

y : {array-like, sparse matrix}
Target values of shape = [n_samples] or [n_samples, n_outputs]

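Since your y=imdb_score is just a list of 10 values, neither of these shapes applies. y needs one target per training sample: an array-like structure with 3362 values (a list is fine) or an array of shape (3362, 1).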
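As a rough illustration of the shape requirement, here is a minimal sketch. It uses synthetic data in place of movies.csv and assumes (hypothetically) that each training row stores the IMDb score in its last column; the names training_rows, X_train and y_train are made up for this example and are not from the original code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training split (assumption): 3362 rows,
# 28 feature columns, plus one integer score in 1..10 as the last column.
rng = np.random.default_rng(0)
n_train, n_features = 3362, 28
features = rng.normal(size=(n_train, n_features))
scores = rng.integers(1, 11, size=(n_train, 1))
training_rows = np.hstack([features, scores])

# Separate the features from the target: y gets one label per row,
# instead of the list of the 10 possible score values.
X_train = training_rows[:, :-1]
y_train = training_rows[:, -1].astype(int)

# X and y must agree on the number of samples before calling fit.
assert X_train.shape[0] == y_train.shape[0]

neigh = KNeighborsClassifier(n_neighbors=5)
neigh.fit(X_train, y_train)

# Predictions come back with one label per input row.
predictions = neigh.predict(X_train[:3])
```

If the scores live somewhere else in your data, the same idea applies: build y by collecting the score of each training row, so that len(y) equals the number of rows in X.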
