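I am trying to follow the tutorial given here to use scikit-learn's recursive feature elimination with cross-validation (RFECV) on my own data, and I keep getting a puzzling error:
ValueError: Found array with 0 sample(s) (shape=(0, 9)) while a minimum of 1 is required.
The code I am using is as follows: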
import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV
data = pd.read_csv('data.csv', index_col = 0)
training = data.iloc[:50]
# training on the first 50 rows
training_y = np.asarray(training.C1, dtype="|S6")
training_x = training.drop('C1', axis=1)
print(training_y.shape)
print(training_x.shape)
# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(training_y, 3),
              scoring='accuracy')
rfecv.fit(training_x, training_y)
FYI, the output of the two print statements is:
(50,)
(50, 9)
Thanks!
Answer 0: (score: 0)
I created dummy data and it works for me:
import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV
data = np.random.randn(50,9)
# training on the first 50 rows
training_y = np.random.random(50).round()
training_x = data
print(training_y.shape)
print(training_x.shape)
# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(training_y, 3),
              scoring='accuracy')
rfecv.fit(training_x, training_y)
The output is:
RFECV(cv=sklearn.cross_validation.StratifiedKFold(labels=[ 1. 1. 1. 0. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 1. 0. 1.
1. 0. 1. 0. 1. 1. 1. 0. 0. 0. 0. 1. 0. 1. 1. 0. 1. 0.
1. 1. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0.], n_folds=3, shuffle=False, random_state=None),
estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False),
estimator_params=None, scoring='accuracy', step=1, verbose=0)
It would help if you could share your data with us.
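One visible difference between the two snippets is that the question passes a pandas DataFrame to fit, while the dummy example passes a plain NumPy array. Whether that is the cause of the error is not established here, but it is cheap to rule out. The sketch below is an assumption-heavy stand-in, not the asker's setup: the DataFrame is random dummy data with a label column named C1, the labels are integers rather than strings, and the feature names f0 to f8 are made up. It also uses sklearn.model_selection, since the sklearn.cross_validation module was removed in newer scikit-learn releases and StratifiedKFold there takes n_splits instead of the label array.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Hypothetical stand-in for data.csv: 50 rows, a label column C1 and
# 9 feature columns (names f0..f8 are invented for this sketch).
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.randn(50, 9), columns=["f%d" % i for i in range(9)])
data.insert(0, "C1", np.tile([0, 1], 25))

training = data.iloc[:50]

# Convert to plain NumPy arrays before fitting, so that row selection
# inside the cross-validation loop is purely positional.
training_y = training["C1"].to_numpy()
training_x = training.drop("C1", axis=1).to_numpy()

svc = SVC(kernel="linear")
# In sklearn.model_selection the splitter no longer receives the labels;
# RFECV passes them in when it calls cv.split(X, y).
cv = StratifiedKFold(n_splits=3)
rfecv = RFECV(estimator=svc, step=1, cv=cv, scoring="accuracy")
rfecv.fit(training_x, training_y)

print(training_y.shape, training_x.shape)
print(rfecv.n_features_)  # number of features RFECV decided to keep
```

If the fit succeeds on the converted arrays but fails on the original DataFrame, that points at the input type rather than at the data itself.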