Question

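I'd like to try a random forest on this data, where x = ate and y = happy. Some of the people were lucky and got two free meals, while others got only one. Can I use rsample to make sure that the same ID (in this example, 5) never appears in both the training and the testing split? If not, what should I do instead?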

library(tibble)
library(rsample)

set.seed(123)
dframe <- tibble(id = c(1,1,2,2,3,4,5,5,6,7), 
                 ate = sample(c("cookie", "slug"), size = 10, replace = TRUE),
                 happy = sample(c("yes", "no"), size = 10, replace = TRUE))


dframe_split <- initial_split(dframe, strata = "happy")
dframe_train <- training(dframe_split)
dframe_test <- testing(dframe_split)

Created on 2018-10-11 by the reprex package (v0.2.0).
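One possible workaround when no grouped single split is available is to sample whole IDs rather than individual rows, so that all rows belonging to an ID land in the same set. The following is a minimal base-R sketch under stated assumptions: `grouped_split()` is a hypothetical helper (not an rsample function), `prop` is the fraction of *IDs* (not rows) sent to training, and unlike `initial_split(strata = "happy")` it does not stratify on `happy`. It uses a plain `data.frame` so it runs without tibble or rsample.

```r
# Hypothetical helper, not part of rsample: splits by whole IDs so that
# no ID can appear in both the training and the testing set.
grouped_split <- function(data, group_col, prop = 0.75) {
  ids <- unique(data[[group_col]])
  n_train <- round(length(ids) * prop)     # number of IDs (not rows) for training
  train_ids <- sample(ids, n_train)        # randomly choose the training IDs
  in_train <- data[[group_col]] %in% train_ids
  list(train = data[in_train, , drop = FALSE],
       test  = data[!in_train, , drop = FALSE])
}

set.seed(123)
dframe <- data.frame(id = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
                     ate = sample(c("cookie", "slug"), size = 10, replace = TRUE),
                     happy = sample(c("yes", "no"), size = 10, replace = TRUE))

parts <- grouped_split(dframe, "id", prop = 0.75)

# Check: the set of IDs shared between train and test should be empty
length(intersect(parts$train$id, parts$test$id)) == 0
```

Because the split is made on IDs, the row proportion in each set varies with how many rows each sampled ID has.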

Answer 1

As of rsample 0.0.2, the only documented way to do this kind of split with the package appears to be the group_vfold_cv function, for example:

resamples <- group_vfold_cv(dframe, group='id', v=3)
lapply(resamples$splits, training)
lapply(resamples$splits, testing)
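To illustrate the idea behind grouped v-fold splitting, here is a base-R sketch: the unique IDs are shuffled and dealt round-robin into `v` folds, and each fold is held out in turn, so every row of a given ID is always tested together. `grouped_vfold()` is a hypothetical helper written for illustration only; it is not how rsample implements `group_vfold_cv()`.

```r
# Hypothetical helper illustrating grouped v-fold splitting in base R.
# It is not the rsample implementation of group_vfold_cv().
grouped_vfold <- function(data, group_col, v = 3) {
  ids <- unique(data[[group_col]])
  # Shuffle the IDs, then assign them to folds 1..v in round-robin order
  fold_of_id <- setNames(rep_len(seq_len(v), length(ids)), sample(ids))
  # Look up each row's fold through its ID, so rows of one ID share a fold
  fold <- fold_of_id[as.character(data[[group_col]])]
  lapply(seq_len(v), function(k) {
    list(train = data[fold != k, , drop = FALSE],
         test  = data[fold == k, , drop = FALSE])
  })
}

set.seed(123)
dframe <- data.frame(id = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
                     ate = sample(c("cookie", "slug"), size = 10, replace = TRUE),
                     happy = sample(c("yes", "no"), size = 10, replace = TRUE))

folds <- grouped_vfold(dframe, "id", v = 3)

# Number of IDs shared between train and test in each fold; each value should be 0
sapply(folds, function(s) length(intersect(s$train$id, s$test$id)))
```

Every row is held out in exactly one fold, which is the property that keeps an ID such as 5 from leaking between training and testing.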
