Conditionally apply a pipeline step depending on an external value

Time: 2017-05-16 12:38:56

Tags: r dplyr conditional workflow pipeline

Given the dplyr workflow:

library(dplyr)
mtcars %>% 
    tibble::rownames_to_column(var = "model") %>% 
    filter(grepl(x = model, pattern = "Merc")) %>% 
    group_by(am) %>% 
    summarise(meanMPG = mean(mpg))

I would like to apply the filter step conditionally, depending on the value of applyFilter.

Solution

With applyFilter <- 1, rows are filtered using the "Merc" string; when the filter is switched off, all rows are returned.

applyFilter <- 1


mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  filter(model %in%
           if (applyFilter) {
             rownames(mtcars)[grepl(x = rownames(mtcars), pattern = "Merc")]
           } else
           {
             rownames(mtcars)
           }) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))

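Problem

The proposed solution is inefficient because the if/else expression, and with it the %in% comparison inside filter, is always evaluated; a preferable approach would evaluate the filter step only when applyFilter is 1.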

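Attempt

Ideally, the workflow would look something like this:

    mtcars %>%
        tibble::rownames_to_column(var = "model") %>%
        # Only apply filter step if condition is met
        if (applyFilter) {
            filter(grepl(x = model, pattern = "Merc"))
            }
        %>%
        # Continue
        group_by(am) %>%
        summarise(meanMPG = mean(mpg))

Naturally, the syntax above is incorrect. It only illustrates how the ideal workflow should look.

Desired answer

  • I am not interested in creating an interim object; the workflow should resemble:

    startingObject %>% ... conditional filter ... final object

  • Ideally, I would like to arrive at a solution where I can control whether the filter call is evaluated at all.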

1 Answer:

Answer 0 (score: 9)

How about this approach:

mtcars %>%
    tibble::rownames_to_column(var = "model") %>%
    filter(if (applyFilter == 1) grepl(x = model, pattern = "Merc") else TRUE) %>%
    group_by(am) %>%
    summarise(meanMPG = mean(mpg))

This means grepl is evaluated only when applyFilter is 1; otherwise filter simply recycles the single TRUE across all rows, so every row is kept.
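To make the lazy evaluation visible, here is a minimal sketch that counts how often the pattern match runs. The wrapper match_merc, the counter grepl_calls and the name named_cars are hypothetical, introduced only for this illustration; the filter expression itself is the one from the answer.

```r
library(dplyr)

named_cars <- tibble::rownames_to_column(mtcars, var = "model")

# Hypothetical wrapper that counts how often the pattern match actually runs
grepl_calls <- 0
match_merc <- function(x) {
  grepl_calls <<- grepl_calls + 1
  grepl(x = x, pattern = "Merc")
}

# Filter switched off: the else branch yields a single TRUE, which is recycled
applyFilter <- 0
all_rows <- named_cars %>%
  filter(if (applyFilter == 1) match_merc(model) else TRUE)
calls_when_off <- grepl_calls  # match_merc() has not been called yet

# Filter switched on: only models containing "Merc" remain
applyFilter <- 1
merc_rows <- named_cars %>%
  filter(if (applyFilter == 1) match_merc(model) else TRUE)
```

With the flag off, all_rows should have as many rows as named_cars and the counter should be unchanged, which suggests the match is skipped rather than computed and discarded.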

Another option is to wrap the step in {}:

mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  {if (applyFilter == 1) filter(., grepl(x = model, pattern = "Merc")) else .} %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
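If the braces pattern is needed in several places, it could be wrapped in a small helper. The sketch below assumes a hypothetical function filter_when (not part of dplyr) that forwards its conditions to filter only when the flag is TRUE and otherwise passes the data through unchanged.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: run filter() only when `condition`
# is TRUE; otherwise return the data as is (the conditions are never evaluated)
filter_when <- function(data, condition, ...) {
  if (isTRUE(condition)) filter(data, ...) else data
}

applyFilter <- 1
result_on <- mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  filter_when(applyFilter == 1, grepl(x = model, pattern = "Merc")) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))

applyFilter <- 0
result_off <- mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  filter_when(applyFilter == 1, grepl(x = model, pattern = "Merc")) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
```

This keeps the pipeline in one piece and avoids the interim object the question wanted to avoid, at the cost of one extra function definition.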

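Obviously there is another possible approach, where you simply break the pipe, apply the filter conditionally, and then continue the pipe (I know the OP did not ask for this; I just want to give another example for other readers):

# %<>% is the magrittr assignment pipe and is not exported by dplyr
library(magrittr)

# Note: this overwrites mtcars with a modified copy in the current environment
mtcars %<>%
  tibble::rownames_to_column(var = "model")

if (applyFilter == 1) mtcars %<>% filter(grepl(x = model, pattern = "Merc"))

mtcars %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))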
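For readers who prefer not to overwrite mtcars in their workspace, the same break-the-pipe idea can be sketched with plain assignment on a copy. The names cars_with_model and mean_mpg are assumptions made for this example only.

```r
library(dplyr)

applyFilter <- 1

# Work on a copy so the built-in mtcars is left untouched
cars_with_model <- tibble::rownames_to_column(mtcars, var = "model")

# Conditional step outside the pipe: filter only when the flag is set
if (applyFilter == 1) {
  cars_with_model <- filter(cars_with_model, grepl(x = model, pattern = "Merc"))
}

# Continue the pipeline on the (possibly filtered) copy
mean_mpg <- cars_with_model %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
```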