Question

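I am using sklearn's cross_validate to get train_score and test_score.

Is there a way to shuffle the dataset when it is split? Is it currently split in row order? That is, with 100 rows, do rows 1-10 form the first fold, rows 11-20 the next, and so on?

Here is my code: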

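from sklearn.model_selection import KFold, cross_val_score

# gbr_onehot (the estimator), X and y are defined elsewhere
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(estimator=gbr_onehot,
                        X=X,
                        y=y,
                        cv=kfold,
                        scoring="neg_mean_squared_error",
                        n_jobs=-1)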

Answer 1

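If you set random_state to None (with shuffle=True), the dataset is shuffled differently each time you call split(X) or use the splitter as cv.

You have random_state=0, which effectively fixes the random shuffling of the data.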

Answer 2

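Is there a way to shuffle the dataset when it is split? Is it currently split in row order?

According to the docs, you have already chosen whether to shuffle before splitting:

Parameter:

shuffle : boolean, optional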

Whether to shuffle the data before splitting into batches.

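This parameter determines whether the data is shuffled before it is split. You have set shuffle=True, which means the rows are already being shuffled.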
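
To see what this means for the 100-row case in the question, here is a small sketch that prints the test indices with and without shuffling. The placeholder array is an assumption; only its row count matters.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((100, 1))  # 100 placeholder rows; only the row count matters here

# shuffle=False (the default): each test fold is a contiguous block of rows.
ordered = [test_idx for _, test_idx in KFold(n_splits=10).split(X)]
print(ordered[0])  # rows 0..9
print(ordered[1])  # rows 10..19

# shuffle=True: rows are permuted first, so each test fold mixes rows
# from across the dataset.
shuffled = [test_idx for _, test_idx in
            KFold(n_splits=10, shuffle=True, random_state=0).split(X)]
print(shuffled[0])
```

So without shuffling the folds do follow row order (zero-based: 0-9, then 10-19, and so on), which matters if the rows are sorted by target or by time.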

Another parameter:

random_state : int, RandomState instance or None, optional, default=None

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Only used when shuffle is True. This should be left to None if shuffle is False.

This makes the shuffle reproducible: if you set random_state to an integer, the same shuffle is generated every time.
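
Since the question mentions cross_validate and train scores, here is a minimal sketch of passing the same shuffled KFold to cross_validate. The synthetic data from make_regression and the GradientBoostingRegressor standing in for gbr_onehot are assumptions for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (assumption; replace with your own X and y).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Stand-in for gbr_onehot from the question (assumption).
model = GradientBoostingRegressor(n_estimators=20, random_state=0)

# Shuffle the rows once, reproducibly, before cutting them into 10 folds.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

results = cross_validate(
    estimator=model,
    X=X,
    y=y,
    cv=kfold,
    scoring="neg_mean_squared_error",
    return_train_score=True,  # train scores are not returned by default
)

print(results["train_score"])  # one value per fold
print(results["test_score"])   # one value per fold
```

The shuffling is controlled entirely by the cv object, so cross_validate and cross_val_score behave the same way in this respect.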

