Shuffling with cross_validation

Time: 2018-08-06 12:44:41

Tags: python machine-learning scikit-learn cross-validation

I have a dataset like this:

   [  5.        ,   2.        ,  15.        ,   0.25535303],
   [  5.        ,   3.        ,  15.        ,   6.72465845],
   [  5.        ,   4.        ,  15.        ,   5.62719504],
   [  5.        ,   5.        ,  15.        ,   5.61760597],
   [  5.        ,   6.        ,  15.        ,   4.9561533 ],
   [  6.        ,   2.        ,  15.        ,   0.2709665 ],
   [  6.        ,   3.        ,  15.        ,   6.07004364],
   [  6.        ,   4.        ,  15.        ,   5.62719504],
   [  6.        ,   5.        ,  15.        ,   5.54684885],
   [  6.        ,   6.        ,  15.        ,   5.32846201],
   [  2.        ,   2.        ,  20.        ,   3.79257349],
   [  2.        ,   3.        ,  20.        ,   4.00440964],
   [  2.        ,   4.        ,  20.        ,   4.37965706],
   [  2.        ,   5.        ,  20.        ,   3.92216922],
   [  2.        ,   6.        ,  20.        ,   3.41378368],
   [  3.        ,   2.        ,  20.        ,   0.13500398],
   [  3.        ,   3.        ,  20.        ,   4.38384781],
   [  3.        ,   4.        ,  20.        ,   5.17229688],
   [  3.        ,   5.        ,  20.        ,   5.00464056],

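The values in the third column range from 15 to 35. I want to apply cross-validation, but I suspect that K-fold will leave each of the K folds with only a single value in the third column, which would hurt my model.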

So my workaround is:

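import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.utils import shuffle

# Shuffle the rows once, then split features and target.
dataset_shuffle = shuffle(dataset)
X = dataset_shuffle[["A", "B", "C"]]  # a list of names selects several columns
y = dataset_shuffle["D"]

result = cross_validate(estimator, X, y, scoring=scoretypes, cv=5, return_train_score=False)

r2 = result['test_r2'].mean()
# The scorer returns negated MSE, so flip the sign before taking the root.
mselist = -result['test_neg_mean_squared_error']
rmse = np.sqrt(mselist).mean()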

Do you think this is a suitable way to solve my problem? Is my solution equivalent to this one?

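from sklearn.model_selection import KFold, cross_validate

X = dataset[["A", "B", "C"]]  # a list of names selects several columns
y = dataset["D"]
cv = KFold(n_splits=5, shuffle=True)
result = cross_validate(estimator, X, y, scoring=scoretypes, cv=cv, return_train_score=False)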
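One way to compare the two variants is to look at the folds they produce. The sketch below uses made-up, sorted data (the arrays `c` and `rows` are hypothetical) and assumes the estimator is a regressor, in which case an integer `cv` falls back to a plain, unshuffled `KFold`. Under those assumptions both variants should yield a random 5-fold partition of the rows. The individual folds differ between the two, and a fixed `random_state` is needed in either one for reproducible results.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils import shuffle

# Hypothetical sorted data: column "C" takes the values 15..35, 10 rows per value.
c = np.repeat([15, 20, 25, 30, 35], 10)
rows = np.arange(c.size)

# Variant 1: shuffle the rows first, then split into consecutive blocks.
rows_shuffled = shuffle(rows, random_state=0)
folds_a = [rows_shuffled[test] for _, test in KFold(n_splits=5).split(rows_shuffled)]

# Variant 2: keep the row order and let KFold shuffle the indices itself.
kfold_shuffled = KFold(n_splits=5, shuffle=True, random_state=0)
folds_b = [rows[test] for _, test in kfold_shuffled.split(rows)]

for name, folds in [("shuffle + cv=5", folds_a), ("KFold(shuffle=True)", folds_b)]:
    covered = np.sort(np.concatenate(folds))
    distinct = [len(set(c[fold].tolist())) for fold in folds]
    print(name, "| every row tested once:", np.array_equal(covered, rows))
    print(name, "| distinct C values per test fold:", distinct)
```

The second variant keeps the original row order intact, which can be convenient when predictions need to be matched back to rows later.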

0 answers:

No answers