I'm getting a ValueError even though the shapes of my input variables look like they match. Here is the error:
ValueError: Found input variables with inconsistent numbers of samples: [644170, 14]
Here is my code:
# 10-K Folds
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
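kfold = KFold(n_splits=10, shuffle=True, random_state=1)  # random_state only has an effect when shuffle=True; recent scikit-learn versions raise an error otherwise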
results = cross_val_score(estimator = grid.best_estimator_, X = X, y = y, cv = kfold, scoring = 'f1_macro') # https://scikit-learn.org/0.17/modules/generated/sklearn.cross_validation.cross_val_score.html
results # Array of scores of the estimator for each run of the cross validation.
Here are the shapes:
X.shape
(644170, 14)
y.shape
(14,)
Both shapes contain a 14.
Answer 0 (score: 2)
The problem appears to be here:
X.shape
# (644170, 14)
y.shape
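# (14,)
Your training set has 644170 observations (each with 14 features), but your target has only 14 values. To cross-validate you need 644170 target values, one per observation. The 14 in X.shape counts features (columns) and the 14 in y.shape counts samples, so the two numbers only coincide by accident.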
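A quick guard before calling cross_val_score can catch this earlier. The following is only a minimal sketch: the helper name check_n_samples is hypothetical (it is not a scikit-learn function), and it assumes X and y support len(), which returns the size of the first axis for both lists and NumPy arrays.

```python
def check_n_samples(X, y):
    """Raise ValueError unless X and y have the same number of samples.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    len() returns the size of the first axis, i.e. the number of rows.
    """
    n_X, n_y = len(X), len(y)
    if n_X != n_y:
        raise ValueError(
            f"X has {n_X} samples but y has {n_y}; "
            "y needs one value per row of X"
        )
    return n_X


# Toy data: 5 observations with 3 features each.
X_toy = [[0.0, 1.0, 2.0] for _ in range(5)]
y_good = [0, 1, 0, 1, 0]  # one target per observation
y_bad = [0, 1, 0]         # one target per feature: the mistake in the question

print(check_n_samples(X_toy, y_good))
try:
    check_n_samples(X_toy, y_bad)
except ValueError as exc:
    print(exc)
```

With the arrays from the question, the same comparison is simply X.shape[0] == y.shape[0], which is 644170 == 14 and therefore False.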
To make this concrete, look at this classic example from the sklearn documentation, which uses the diabetes dataset:
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
cross_val_score(lasso, X, y, cv=3)
The shapes of X and y are:
X.shape
# (150, 10)
y.shape
# (150,)
In other words, there is one target value for each observation in the training set.
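To see why the targets must line up row by row, it may help to look at how k-fold splitting picks rows by index. This is a simplified sketch of contiguous, unshuffled k-fold, not scikit-learn's implementation; the function name kfold_test_indices and the toy sizes are assumptions made for illustration.

```python
def kfold_test_indices(n_samples, n_splits):
    """Yield the test indices of each fold for contiguous, unshuffled k-fold.

    Simplified illustration, not scikit-learn's implementation.
    The first n_samples % n_splits folds receive one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        yield list(range(start, start + size))
        start += size


n_rows = 20                    # toy stand-in for the 644170 rows in the question
y_full = list(range(n_rows))   # one target per row
y_short = list(range(14))      # only 14 targets, as in the question

folds = list(kfold_test_indices(n_rows, n_splits=10))
last_fold = folds[-1]          # row indices of the final test fold

# Every row index in a fold must exist in y as well as in X.
print([y_full[i] for i in last_fold])
try:
    [y_short[i] for i in last_fold]
except IndexError:
    print("y_short has no entry for rows", last_fold)
```

Each fold selects the same row indices from X and from y, so a y with fewer entries than X has rows cannot be split consistently. scikit-learn checks this up front and raises the ValueError shown in the question.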