rsample vfold_cv function does not accept the .y argument from purrr::map2

Time: 2020-06-29 14:54:22

Tags: r tidyverse cross-validation purrr tidymodels

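I am trying to create nested cross-validation with the rsample package. I use purrr::map2 to create several of them, with the number of folds set by the v argument. However, vfold_cv does not accept the value I pass as v and instead fails with this error: Error: `v` must be a single integer.

In the reprex below, I simulate the situation with the mtcars data by creating one cross-validation per cylinder count (cyl). Replacing .y with a literal number works, but I need to use the n column so that the argument varies for each cyl group.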

library(dplyr)    # select(), group_by(), mutate()
library(purrr)
library(parsnip)
library(rsample)
library(tidyr)

data("mtcars")

nested <- mtcars %>% 
    select(cyl, disp:gear) %>% 
    group_by(cyl) %>% 
    nest(data = disp:gear) %>% 
    cbind(n = 2:4)

nested %>% 
    group_by(cyl) %>% 
    mutate(cv = map2(data, n,
                     ~nested_cv(.x,
                                inside = vfold_cv(v = 10, repeats = 3),
                                outside = vfold_cv(v = .y))))

Error: `v` must be a single integer.

1 Answer:

Answer 0: (score: 1)

This appears to come from how the vfold_cv call is evaluated inside nested_cv. You can see it if you try this:

createNested = function(x,y){
    nested_cv(x,inside = vfold_cv(v = 10, repeats = 3),outside = vfold_cv(v = y))
}

createNested(nested$data[[1]],3)
Error in vfold_splits(data = data, v = v, strata = strata, breaks = breaks) : 
  object 'y' not found

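So it cannot see the y variable inside the function (just like your .y). I therefore wrote a function that explicitly passes the result of the outer vfold_cv() into nested_cv(). It takes a couple more lines of code, but it works: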

createNested = function(x,y){
    outside_cv = vfold_cv(x,v = y)
    nested_cv(x,inside = vfold_cv(v = 10, repeats = 3),outside = outside_cv)
}

nested <- mtcars %>% 
    select(cyl, disp:gear) %>% 
    nest(data = disp:gear) %>%
    mutate(n = 2:4)

nested %>%  mutate(cv = map2(data,n,.f=createNested))

# A tibble: 3 x 4
    cyl data                  n cv              
  <dbl> <list>            <int> <list>          
1     6 <tibble [7 × 8]>      2 <tibble [2 × 3]>
2     4 <tibble [11 × 8]>     3 <tibble [3 × 3]>
3     8 <tibble [14 × 8]>     4 <tibble [4 × 3]>

Note that once the data is nested, you no longer need group_by().
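
The "object 'y' not found" error above can be imitated in base R without rsample. The sketch below assumes that nested_cv() captures its outside argument as an unevaluated call and evaluates it later in its own frame. That is an assumption about the mechanism, not a copy of rsample's code. lazy_outer(), make_folds() and the two wrappers are hypothetical names made up for this illustration.

```r
# Hypothetical stand-in for vfold_cv(): it only validates and returns v.
make_folds <- function(data, v) {
  stopifnot(is.numeric(v), length(v) == 1)
  v
}

# Hypothetical stand-in for nested_cv(). ASSUMPTION: the real function
# treats `outside` in a similar way; this is not rsample's implementation.
lazy_outer <- function(data, outside) {
  cl <- match.call()
  outer_call <- cl[["outside"]]
  if (is.call(outer_call)) {
    outer_call$data <- quote(data)  # inject the data argument into the call
    eval(outer_call)                # evaluated in this frame, not the caller's
  } else {
    outside                         # an existing object is forced in the caller
  }
}

# Mirrors the failing version: `y` lives only in this wrapper's frame,
# which lazy_outer() cannot reach when it evaluates the captured call.
wrapper_lazy <- function(x, y) {
  lazy_outer(x, outside = make_folds(v = y))
}

# Mirrors the working version: the folds object is built first, so only
# an already evaluated object is handed over.
wrapper_eager <- function(x, y) {
  folds <- make_folds(x, v = y)
  lazy_outer(x, outside = folds)
}

toy <- data.frame(a = 1:6)

# The lazy wrapper fails on the lookup of `y`; keep the message instead of stopping.
lazy_result <- tryCatch(wrapper_lazy(toy, 3),
                        error = function(e) conditionMessage(e))
eager_result <- wrapper_eager(toy, 3)
```

Under that assumption, building the outer resampling object before calling nested_cv() avoids the problem because the value of y has already been resolved in the wrapper's own frame.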