I am trying to do cross-validation using KFold from sklearn:
def train_and_evaluate(clf, X_train, y_train):
    clf.fit(X_train, y_train)
    # create a k-fold cross validation iterator of k=5 folds
    cv = KFold(int(X_train.shape[0]), 4, shuffle = True) ## Classic KFold
    scores = cross_val_score(clf, X_train, y_train, cv=cv)
    return (clf, scores)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=42)
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
But I get the following error:
clf1, scores1 = train_and_evaluate(linear_model.SGDRegressor(), X_train, y_train)
TypeError: __init__() got multiple values for keyword argument 'shuffle'
Answer 0 (score: 1)
The signature of KFold looks like this:
sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None)
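So when you pass the two positional arguments (int(X_train.shape[0]), 4), the first is bound to n_splits and the 4 is bound to shuffle. You then also pass shuffle by name, which is what triggers the "multiple values" error.
I'm not sure why you are passing those two positional arguments, but if you want a 4-fold split, I think you only need to pass 4.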
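As a minimal sketch of that fix, assuming the goal is a shuffled 4-fold split: the version below passes n_splits and shuffle by keyword so nothing can clash. The synthetic X_demo / y_demo arrays and the random_state values are illustrative assumptions, not part of the question; the final fit is also moved after scoring, which is a stylistic choice since cross_val_score works on clones of the estimator.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import KFold, cross_val_score


def train_and_evaluate(clf, X_train, y_train):
    # 4-fold split; keyword arguments avoid binding a value to shuffle twice
    cv = KFold(n_splits=4, shuffle=True, random_state=0)
    # cross_val_score fits a clone of clf on each training fold
    scores = cross_val_score(clf, X_train, y_train, cv=cv)
    # fit on the full training set so the returned estimator is usable
    clf.fit(X_train, y_train)
    return clf, scores


# Synthetic regression data, purely for illustration (assumption)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(80, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=80)

clf1, scores1 = train_and_evaluate(SGDRegressor(random_state=0), X_demo, y_demo)
print(scores1.shape)  # one score per fold
```

With the question's own data, the call would stay train_and_evaluate(linear_model.SGDRegressor(), X_train, y_train); only the KFold construction changes.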
Answer 1 (score: 0)
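import numpy as np
from sklearn.model_selection import KFold
X = np.arange(100)
kf = KFold(n_splits=5, shuffle=True, random_state=None)
# each iteration yields (train indices, test indices) for one fold
for train_idx, test_idx in kf.split(X):
    print(train_idx, test_idx)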