scikit-learn RFECV: array with 0 sample(s)

Asked: 2016-05-27 22:53:07

Tags: python scikit-learn

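I am trying to follow the tutorial given here to use scikit-learn's recursive feature elimination with cross-validation (RFECV) on my own data, and I keep getting a puzzling error: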

  

ValueError: Found array with 0 sample(s) (shape=(0, 9)) while a minimum of 1 is required.

The code I am using is as follows:

import pandas as pd
import numpy as np

from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV

data = pd.read_csv('data.csv', index_col = 0)

training = data.iloc[:50]
# training on the first 50 rows
training_y = np.asarray(training.C1, dtype="|S6")
training_x = training.drop('C1', axis=1)

print(training_y.shape)
print(training_x.shape)


# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator = svc, step = 1, cv = StratifiedKFold(training_y, 3),
              scoring = 'accuracy')

rfecv.fit(training_x, training_y)

For reference, the output of the two print statements is:

  

(50,)
(50, 9)

Thanks!

1 answer:

Answer 0 (score: 0):

I created dummy data, and it works for me:

import pandas as pd
import numpy as np

from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV

data = np.random.randn(50,9)

# training on the first 50 rows
training_y = np.random.random(50).round()
training_x = data

print(training_y.shape)
print(training_x.shape)


# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator = svc, step = 1, cv = StratifiedKFold(training_y, 3),
              scoring = 'accuracy')

rfecv.fit(training_x, training_y)

The output is:

RFECV(cv=sklearn.cross_validation.StratifiedKFold(labels=[ 1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  0.  1.  0.  1.
  1.  0.  1.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  1.  1.  0.  1.  0.
  1.  1.  0.  1.  1.  0.  0.  0.  1.  0.  0.  0.  1.  0.], n_folds=3, shuffle=False, random_state=None),
   estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
   estimator_params=None, scoring='accuracy', step=1, verbose=0)
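
The `sklearn.cross_validation` module imported above was deprecated in scikit-learn 0.18 and removed in 0.20. On a recent release, roughly the same dummy-data check could be written as the sketch below. It assumes `sklearn.model_selection.StratifiedKFold`, which takes `n_splits` instead of the label array, and it uses made-up data (alternating labels, with the first feature shifted by class) rather than anything from the question.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Dummy data (an assumption for illustration): 50 samples, 9 features.
rng = np.random.RandomState(0)
y = np.arange(50) % 2            # two classes, 25 samples each
X = rng.randn(50, 9)
X[:, 0] += 2 * y                 # give the first feature some signal

# Linear SVC exposes coef_, which RFECV needs to rank features.
svc = SVC(kernel="linear")

# In the newer API the splitter no longer receives the labels;
# RFECV passes y to it during fit.
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(n_splits=3),
              scoring="accuracy")
rfecv.fit(X, y)

print(rfecv.n_features_)   # number of features kept
print(rfecv.support_)      # boolean mask over the 9 features
```

If this runs cleanly but the same call fails on the real data, the difference most likely lies in the inputs rather than in the RFECV setup itself.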

It would help if you could share your data with us.
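
In the meantime, a few checks on the inputs might help narrow things down. The sketch below is not a confirmed diagnosis. It builds a hypothetical stand-in for `data.csv` (the feature names `F1` to `F9` and the labels `a`/`b` are invented), converts both inputs to plain NumPy arrays so that the DataFrame index plays no role, and verifies that every class has at least as many samples as there are folds.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: label column C1 plus 9 features.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.randn(50, 9),
                    columns=["F%d" % i for i in range(1, 10)])
data.insert(0, "C1", np.where(np.arange(50) % 2 == 0, "a", "b"))

training = data.iloc[:50]

# Plain arrays: positional indexing only, no pandas index involved.
training_y = training["C1"].to_numpy()
training_x = training.drop("C1", axis=1).to_numpy(dtype=float)

# 1. Sample counts must match.
print(training_x.shape, training_y.shape)

# 2. No missing values in the features.
print(np.isnan(training_x).any())

# 3. Each class needs at least n_folds samples for stratified splitting.
n_folds = 3
labels, counts = np.unique(training_y, return_counts=True)
print(dict(zip(labels, counts)))
print(counts.min() >= n_folds)
```

Passing `training_x` and `training_y` in this array form to `rfecv.fit` would show whether the error depends on the DataFrame input or on the data itself.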