Question

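I am trying to follow the tutorial given here to use scikit-learn's recursive feature elimination with cross-validation (RFECV) on my own data, and I keep getting a puzzling error: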

ValueError: Found array with 0 sample(s) (shape=(0, 9)) while a minimum of 1 is required.

The code I am using is as follows:

import pandas as pd
import numpy as np

from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV

data = pd.read_csv('data.csv', index_col = 0)

training = data.iloc[:50]
# training on the first 50 rows
training_y = np.asarray(training.C1, dtype="|S6")
training_x = training.drop('C1', axis=1)

print training_y.shape
print training_x.shape


# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator = svc, step = 1, cv = StratifiedKFold(training_y, 3),
              scoring = 'accuracy')

rfecv.fit(training_x, training_y)

For reference, the output of the two print statements is:

(50,)

(50, 9)

Thanks!

Answer 1

I created dummy data, and the code works for me:

import pandas as pd
import numpy as np

from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV

data = np.random.randn(50,9)

# training on the first 50 rows
training_y = np.random.random(50).round()
training_x = data

print(training_y.shape)
print(training_x.shape)


# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator = svc, step = 1, cv = StratifiedKFold(training_y, 3),
              scoring = 'accuracy')

rfecv.fit(training_x, training_y)

The output is:

RFECV(cv=sklearn.cross_validation.StratifiedKFold(labels=[ 1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  0.  1.  0.  1.
  1.  0.  1.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  1.  1.  0.  1.  0.
  1.  1.  0.  1.  1.  0.  0.  0.  1.  0.  0.  0.  1.  0.], n_folds=3, shuffle=False, random_state=None),
   estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
   estimator_params=None, scoring='accuracy', step=1, verbose=0)
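
For readers on a recent scikit-learn: the `sklearn.cross_validation` module used above was removed in version 0.20. `StratifiedKFold` now lives in `sklearn.model_selection` and takes the number of splits rather than the labels. Below is a minimal sketch of the same dummy-data check under that API. The fixed seed and the alternating 0/1 labels are my own choices for illustration, not part of the original post.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility

# Dummy data with the same shapes as in the question: 50 samples, 9 features
training_x = rng.randn(50, 9)
training_y = np.arange(50) % 2  # alternating 0/1 labels (illustrative)

# StratifiedKFold no longer receives the labels; RFECV passes y to cv.split()
cv = StratifiedKFold(n_splits=3)
rfecv = RFECV(estimator=SVC(kernel="linear"), step=1, cv=cv, scoring="accuracy")
rfecv.fit(training_x, training_y)

print(rfecv.n_features_)  # number of features kept
print(rfecv.support_)     # boolean mask over the 9 features
```

If this runs but the same call fails on real data, the difference is more likely in the inputs than in the RFECV setup.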

It would help if you could share your data with us.
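
In the meantime, since plain NumPy arrays work, one thing worth ruling out is how the pandas objects are handed to scikit-learn. This is not a confirmed diagnosis, only a sketch of checks I would try. The DataFrame below is a hypothetical stand-in for `data.csv`: the feature names `F1`..`F9` and the labels `"a"`/`"b"` are invented, and the modern `sklearn.model_selection` API is assumed.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV

# Hypothetical stand-in for data.csv: a label column C1 plus 9 numeric features
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.randn(60, 9), columns=["F%d" % i for i in range(1, 10)])
data.insert(0, "C1", np.where(np.arange(60) % 2 == 0, "a", "b"))

training = data.iloc[:50]

# Hand scikit-learn plain NumPy arrays instead of pandas objects
training_y = training["C1"].to_numpy()
training_x = training.drop("C1", axis=1).to_numpy()

# Sanity checks on the inputs before fitting
assert training_x.shape[0] == training_y.shape[0] > 0
assert not np.isnan(training_x).any()
# Each class needs at least as many samples as there are folds (3 here)
print(pd.Series(training_y).value_counts())

rfecv = RFECV(estimator=SVC(kernel="linear"), step=1,
              cv=StratifiedKFold(n_splits=3), scoring="accuracy")
rfecv.fit(training_x, training_y)
print(rfecv.n_features_)
```

If the error disappears once arrays are passed, the problem is in how the DataFrame was indexed. If it persists, the class counts printed above are the next thing to look at.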
