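I am trying to create nested cross-validation with the rsample package. I use purrr::map2 to create several of them, with the number of folds set by the v argument. However, vfold_cv does not accept the v argument this way and fails with this error: Error: `v` must be a single integer.
In the reprex below, I simulate the situation with the mtcars data by creating a cross-validation for each value of cyl. Replacing .y with a literal number works, but I need to use the n column so that the argument varies with each cyl group.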
library(dplyr)   # provides select(), group_by() and mutate()
library(purrr)
library(parsnip)
library(rsample)
library(tidyr)

data("mtcars")

nested <- mtcars %>%
  select(cyl, disp:gear) %>%
  group_by(cyl) %>%
  nest(data = disp:gear) %>%
  cbind(n = 2:4)

nested %>%
  group_by(cyl) %>%
  mutate(cv = map2(data, n, ~ nested_cv(.x,
    inside = vfold_cv(v = 10, repeats = 3),
    outside = vfold_cv(v = .y)  # this is the call that fails
  )))
Error: `v` must be a single integer.
Answer 0 (score: 1)
The problem is related to how the vfold_cv call is evaluated inside nested_cv. You can try this:
createNested = function(x,y){
nested_cv(x,inside = vfold_cv(v = 10, repeats = 3),outside = vfold_cv(v = y))
}
createNested(nested$data[[1]],3)
Error in vfold_splits(data = data, v = v, strata = strata, breaks = breaks) :
object 'y' not found
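So it cannot see the y variable inside the function (just like your .y). I therefore wrote a function that explicitly passes the result of the outer vfold_cv() into nested_cv(). It takes a few more lines of code, but it works: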
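createNested = function(x, y) {
  # build the outer resamples first, so that y is evaluated right here
  outside_cv = vfold_cv(x, v = y)
  # pass the finished resampling object (not a call) as the outside argument
  nested_cv(x, inside = vfold_cv(v = 10, repeats = 3), outside = outside_cv)
}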
nested <- mtcars %>%
select(cyl, disp:gear) %>%
nest(data = disp:gear) %>%
mutate(n=2:4)
nested %>% mutate(cv = map2(data,n,.f=createNested))
# A tibble: 3 x 4
    cyl data                  n cv
  <dbl> <list>            <int> <list>
1     6 <tibble [7 × 8]>      2 <tibble [2 × 3]>
2     4 <tibble [11 × 8]>     3 <tibble [3 × 3]>
3     8 <tibble [14 × 8]>     4 <tibble [4 × 3]>
Note that once the data is nested, group_by() is no longer needed.
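To check that the problem comes from the call captured by nested_cv() and not from map2() itself, here is a minimal sketch that builds only the outer resamples. It assumes dplyr, tidyr, purrr and rsample are installed. The column names outer and n_outer are made up for illustration and are not part of the original code.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(rsample)

# Same nesting as above: one row per cyl value, with n holding the fold count
nested <- mtcars %>%
  select(cyl, disp:gear) %>%
  nest(data = disp:gear) %>%
  mutate(n = 2:4)

# vfold_cv() is called directly here, so .y is evaluated immediately
checked <- nested %>%
  mutate(
    outer = map2(data, n, ~ vfold_cv(.x, v = .y)),
    n_outer = map_int(outer, nrow)  # number of outer folds actually created
  )

# The number of outer folds should match n for every row
stopifnot(all(checked$n_outer == checked$n))
```

If this runs without error, passing .y as v works whenever vfold_cv() is evaluated eagerly, which is the same idea createNested relies on.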