Sklearn: negative dimensions are not allowed

Time: 2015-03-26 01:11:43

Tags: python numpy machine-learning scipy scikit-learn

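I am using the sklearn NearestNeighbors package to classify a dataset. It worked fine until I tried to use 'distance' weighting in the KNN prediction. When I switch from 'uniform' weights to 'distance' weights, I get the error message "negative dimensions are not allowed". With 'uniform' weights everything works fine.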

The error message is as follows:

/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/regression.py:160: RuntimeWarning: invalid value encountered in divide
  y_pred[:, j] = num / denom
Traceback (most recent call last):
  File "analysis.py", line 333, in <module>
    main()
  File "analysis.py", line 330, in main
    ind_test_labels, trainIDs, ind_test_IDs, train_data_original, ind_test_data_original)
  File "analysis.py", line 297, in target1
    outfile = generate_result(X, feature_names, train_label, outfile, trainIDs, train_labels, best_k, train_data_original, ind_test_data_original)
  File "analysis.py", line 130, in generate_result
    predicted_label = regressor.predict(test)
  File "/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/regression.py", line 144, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "/home/linux/.local/lib/python2.7/site-packages/sklearn/neighbors/base.py", line 332, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1313, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn/neighbors/kd_tree.c:10528)
  File "binary_tree.pxi", line 595, in sklearn.neighbors.kd_tree.NeighborsHeap.__init__ (sklearn/neighbors/kd_tree.c:4937)
ValueError: negative dimensions are not allowed

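I am confused by the error message. My only guess is that the same instance appears in both the training and the test set, so the inverse of its distance causes a division-by-zero error. But that is unlikely to happen with 6 features.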
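
The guess above can be checked in isolation. The following is a minimal sketch, assuming a naive inverse-distance weighted mean with no special handling of zero distances; `inverse_distance_prediction` is a hypothetical helper written for illustration, not sklearn's implementation. It shows that a zero distance produces an infinite weight and a `nan` prediction, which would be consistent with the `RuntimeWarning: invalid value encountered in divide` in the log. It does not, by itself, explain the `ValueError`.

```python
import numpy as np

def inverse_distance_prediction(distances, labels):
    """Weighted mean of labels with weights 1/d and no zero-distance handling."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Silence the floating-point warnings so the resulting values can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        weights = 1.0 / distances  # a zero distance gives an infinite weight
        # With an infinite weight this becomes inf / inf, which evaluates to nan.
        return np.sum(weights * labels) / np.sum(weights)

# Neighbors at distances 1 and 2: an ordinary weighted mean.
no_duplicate = inverse_distance_prediction([1.0, 2.0], [10.0, 20.0])

# One neighbor at distance 0, as if the test point also appears in the training set.
with_duplicate = inverse_distance_prediction([0.0, 2.0], [10.0, 20.0])

print(no_duplicate, with_duplicate)
```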

Can anyone point out what went wrong? Or could you point me in a direction so that I can provide more details?

------UPDATE---------------------
I have pasted the code segment where the error occurs. The training X is read and manipulated as follows:

train_data = np.loadtxt(...)
train_data = preprocessing.scale(train_data)
X_T = train_data.T
X = X_T[[features]].T # features is a tuple that contains columns to be selected for classification
# Then X is passed to generate_result below
#######################################
def generate_result(X, feature_names, train_label, outfile, IDs, labels, k, train_original, ind_test_original):
  """
  Purpose: this function does the analysis and outputs the result to file
  Inputs: training set, names of selected features, training set labels, file writer stream, IDs of training set,
          labels of training set, number of neighbors, original training set, independent test set 
  Returns: file writer stream
  """
  cv = cross_validation.KFold(len(X), 10) # 10-fold cross-validation
  feature_str = ','.join(feature_names)
  outfile.write('Best K = ' + str(k) + '\n')
  outfile.write('10-Fold Cross Validation begins \n')
  numCV = 1 #predicted_GFR_str = array_to_string(predicted_label)
  for traincv, testcv in cv:
    outfile.write('Iteration: ' + str(numCV) + '\n')
    outfile.write(complete_features + ',label' + str(numCV) + ',Catagory' + str(numCV) + '\n')
    train = X[traincv]
    test = X[testcv]
    ### run regression
    regressor = KNeighborsRegressor(n_neighbors = k, weights = 'distance', p = 1)       

    label_cv_train = train_label[traincv]
    regressor.fit(train, label_cv_train)
    test = X[testcv]
    label_cv_test = train_label[testcv]
    predicted_label = regressor.predict(test)# THIS LINE IS CAUSING THE PROBLEM


    # more code below not pasted
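
Since the traceback ends in `NeighborsHeap.__init__` during the neighbor query, one possible direction for gathering more details is to inspect the inputs just before `regressor.predict(test)`. The sketch below is only a diagnostic aid under assumptions: `check_knn_inputs` is a hypothetical helper, not part of sklearn, and nothing in the post establishes that any of these checks identifies the actual cause.

```python
import numbers
import numpy as np

def check_knn_inputs(train, test, k):
    """Report basic facts about the arrays and neighbor count passed to a KNN model."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    # Test rows identical to some training row have distance exactly 0 to it.
    train_rows = {tuple(row) for row in train}
    n_duplicates = sum(tuple(row) in train_rows for row in test)
    return {
        "k": k,
        "k_is_integer": isinstance(k, numbers.Integral),
        "k_in_range": isinstance(k, numbers.Real) and bool(1 <= k <= len(train)),
        "n_test_rows_in_train": n_duplicates,
        "all_finite": bool(np.isfinite(train).all() and np.isfinite(test).all()),
    }

# Toy data for illustration only; in the post these would be `train`, `test` and `k`.
toy_train = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
toy_test = np.array([[2.0, 3.0], [9.0, 9.0]])
report = check_knn_inputs(toy_train, toy_test, k=-1)
print(report)
```

Printing such a report inside the cross-validation loop for each fold would show whether the failing fold differs from the ones that succeed.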

0 answers:

No answers yet.