Strange ValueError in KNeighborsClassifier score

Time: 2015-04-13 10:27:10

Tags: python scikit-learn

I want to plot the learning curve of a K Nearest Neighbors classifier. I have the following code:

X_train = #training data
Y_train = #target variables
best_neighbors = #number of neighbors which gave highest score (3)

idx = len(X_train)/5000
scores = pd.DataFrame(np.zeros((idx+1, 2)), index=np.arange(1, len(X_train), 5000), columns=['Train Score', 'CV Score'])

for i in range(1, len(X_train), 5000):
    X_train_set = X_train[:i]
    Y_train_set = Y_train[:i]
    neigh = KNeighborsClassifier(n_neighbors=best_neighbors)
    neigh.fit(X_train_set, Y_train_set)

    train_score = neigh.score(X_train, Y_train)
    cv_score = neigh.score(X_test, Y_test)

    scores.loc[i, 'Train Score'] = train_score
    scores.loc[i, 'CV Score'] = cv_score

This code worked perfectly when I used it earlier with a decision tree or a random forest, but here I get the following strange error:

ValueError                                Traceback (most recent call last)
<ipython-input-6-95e645e75971> in <module>()
     10     neigh.fit(X_train_set, Y_train_set)
     11 
---> 12     train_score = neigh.score(X_train, Y_train)
     13     cv_score = neigh.score(X_test, Y_test)
     14 

//anaconda/lib/python2.7/site-packages/sklearn/base.pyc in score(self, X, y, sample_weight)
    289         """
    290         from .metrics import accuracy_score
--> 291         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    292 
    293 

//anaconda/lib/python2.7/site-packages/sklearn/neighbors/classification.pyc in predict(self, X)
    145         X = atleast2d_or_csr(X)
    146 
--> 147         neigh_dist, neigh_ind = self.kneighbors(X)
    148 
    149         classes_ = self.classes_

//anaconda/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in kneighbors(self, X, n_neighbors, return_distance)
    316                                           **self.effective_metric_params_)
    317 
--> 318             neigh_ind = argpartition(dist, n_neighbors - 1, axis=1)
    319             neigh_ind = neigh_ind[:, :n_neighbors]
    320             # argpartition doesn't guarantee sorted order, so we sort again

//anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in argpartition(a, kth, axis, kind, order)
    689     except AttributeError:
    690         return _wrapit(a, 'argpartition',kth, axis, kind, order)
--> 691     return argpartition(kth, axis, kind=kind, order=order)
    692 
    693 

ValueError: kth(=2) out of bounds (1)

Any idea what this means and how to fix it?

Edit: After updating scikit-learn to version 0.16, I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-66-21f434a289fc> in <module>()
     10     neigh.fit(X_train_set, Y_train_set)
     11 
---> 12     train_score = neigh.score(X_train, Y_train)
     13     cv_score = neigh.score(X_test, Y_test)
     14 

//anaconda/lib/python2.7/site-packages/sklearn/base.pyc in score(self, X, y, sample_weight)
    293         """
    294         from .metrics import accuracy_score
--> 295         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    296 
    297 

//anaconda/lib/python2.7/site-packages/sklearn/neighbors/classification.pyc in predict(self, X)
    136         X = check_array(X, accept_sparse='csr')
    137 
--> 138         neigh_dist, neigh_ind = self.kneighbors(X)
    139 
    140         classes_ = self.classes_

//anaconda/lib/python2.7/site-packages/sklearn/neighbors/base.pyc in kneighbors(self, X, n_neighbors, return_distance)
    337             raise ValueError(
    338                 "Expected n_neighbors <= %d. Got %d" %
--> 339                 (train_size, n_neighbors)
    340             )
    341         n_samples, _ = X.shape

ValueError: Expected n_neighbors <= 1. Got 3

1 answer:

Answer 0: (score: 2)

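You are trying to use a 3-nearest-neighbors classifier with only one data point. On the first pass through the loop i is 1, so X_train[:i] contains a single sample, and the classifier cannot find 3 neighbors among one training point. That cannot work. By the way, scikit-learn has built-in functions for learning curves and validation curves.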
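
As a rough sketch of both points, the snippet below reproduces the failure, then builds the curve in two ways. It is an illustration under assumptions, not the asker's actual setup: the data comes from make_classification because the original data is not shown, and the step of 50 stands in for the question's 5000. The import path sklearn.model_selection is the one used by current scikit-learn releases; the 0.16 release in the question exposed learning_curve from the sklearn.learning_curve module instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the question's data is not shown).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, Y_train = X[:400], y[:400]
X_test, Y_test = X[400:], y[400:]

best_neighbors = 3
step = 50  # hypothetical step; the question used 5000 on a larger data set

# Reproduce the failure: one training point cannot supply 3 neighbors.
try:
    single = KNeighborsClassifier(n_neighbors=best_neighbors)
    single.fit(X_train[:1], Y_train[:1])
    single.score(X_train, Y_train)
    failed = False
except ValueError:
    failed = True

# Manual loop: start at a subset size that is at least n_neighbors.
sizes = list(range(step, len(X_train) + 1, step))
manual_scores = {}
for n in sizes:
    neigh = KNeighborsClassifier(n_neighbors=best_neighbors)
    neigh.fit(X_train[:n], Y_train[:n])
    # The training score here is computed on the subset that was fitted,
    # which differs from the question (it scored on all of X_train).
    manual_scores[n] = (
        neigh.score(X_train[:n], Y_train[:n]),
        neigh.score(X_test, Y_test),
    )

# Built-in alternative: learning_curve handles the subsets and the CV splits.
train_sizes, train_scores, cv_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=best_neighbors),
    X_train,
    Y_train,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)
mean_train = train_scores.mean(axis=1)  # one value per training size
mean_cv = cv_scores.mean(axis=1)
```

With either approach, the smallest training subset has to contain at least n_neighbors samples; in the manual loop that means starting the range at a value such as the step size rather than at 1.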