library(tidyverse)
library(purrr)
Using the sample data below, I can create the following function:
Funs <- function(DF, One, Two){
One <- enquo(One)
Two <- enquo(Two)
DF %>% filter(School == (!!One) & Code == (!!Two)) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1))
}
I can then use the function to filter on two variables, School and Code, like this:
Funs(DF, "School1", "B344")
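This all works fine, but my actual data has many variables, so rather than repeatedly typing the "School" and "Code" values into the function, I'd like to use the tidyverse and purrr packages to loop over two lists (one of schools, one of codes) and feed them to the filter. I'd like the output to be a list of results.
To keep things simple, the two lists fed to dplyr::filter have only two values each: School2 will use S300, and School1 will use B344, as in the example above.
Some of the things I've tried: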
map2(c("School2", "School1"),
c("S300", "B344"),
function(x,y) {
DF %>% filter(School == .x & Code == .y) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1))
}
Also...
map2(c("School2", "School1")),
c("S300","B344"),
~filter(School == .x & Code == .y) %>%
group_by(Code, School)%>%
summarise(Count = sum(Question1))
And this...
list(c("School2", "School1"), c("S300", "B344")) %>%
map2( ~ filter(School == .x & Code == .y) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1)))
None of these seem to work, so please help!
Sample data:
Code <- c("B344","B555","S300","T220","B888","B888","B555","B344","B344","T220","B555","B555","S300","B555","S300","S300","S300","S300","B344","B344","B888","B888","B888")
School <- c("School1","School1","School2","School3","School4","School4","School1","School1","School3","School3","School4","School1","School1","School3","School2","School2","School4","School2","School3","School4","School3","School1","School2")
Question1 <- c(3,4,5,4,5,5,5,4,5,3,4,5,4,5,4,3,3,3,4,5,4,3,3)
Question2 <- c(5,4,3,4,3,5,4,3,2,3,4,5,4,5,4,3,4,4,5,4,3,3,4)
DF <- data_frame(Code, School, Question1, Question2)
Answer 0 (score: 1)
Here are a few options, ordered from the most code to the best approach:
library(tidyverse)
DF <- data_frame(Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1", "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2", "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4))
wanted <- data_frame(School = c("School2", "School1"),
Code = c("S300", "B344"))
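For map2 to work, the variables are named .x and .y if you use the tilde (formula) notation; if you use regular function notation, you can call them whatever you like. Don't forget that the first argument of filter is the data frame (the one that would otherwise be piped in), so: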
map2_dfr(wanted$School, wanted$Code, ~filter(DF, School == .x, Code == .y)) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
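The regular function notation mentioned above can be sketched as follows. This is only an illustration, not part of the original answer: the argument names school and code and the object names schools, codes, and results are made up here, and plain map2 (rather than map2_dfr) is used so that the output is a list with one summary per pair, as the question asks for. The sample data is rebuilt with tibble() so the block runs on its own.

```r
library(tidyverse)

# Sample data from the question, rebuilt with tibble() so this block is self-contained
DF <- tibble(
  Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555",
           "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
  School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1",
             "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2",
             "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
  Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
  Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4)
)

# Hypothetical names for the two parallel vectors to loop over
schools <- c("School2", "School1")
codes   <- c("S300", "B344")

# Regular function notation: the argument names are arbitrary.
# They are lower-case here so they cannot be confused with the
# School and Code columns inside filter().
results <- map2(schools, codes, function(school, code) {
  DF %>%
    filter(School == school, Code == code) %>%
    summarise(Count = sum(Question1))
})

# Label each list element with the pair it came from
results <- set_names(results, paste(schools, codes))
results
```

Each element of results is a one-row tibble; Count is 15 for School2/S300 and 7 for School1/B344, which agrees with the Question1 column in the output above.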
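Since I've set up wanted as a data frame (a vanilla list would work too), you can use pmap. With two variables, the parameter names with pmap can actually be the same as with map2, but it is really a function with ... parameters, so it often makes sense to handle them differently, e.g. with the ..1 notation: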
wanted %>%
pmap_dfr(~filter(DF, School == ..1, Code == ..2)) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
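As a small illustration of the remark that a vanilla list works too, here is a sketch that passes an unnamed list to pmap. The names wanted_list and results are invented for this example. Plain pmap returns a list of filtered data frames, one per School/Code pair, and bind_rows can stack them afterwards if a single data frame is preferred.

```r
library(tidyverse)

# Sample data from the question, rebuilt with tibble() so this block is self-contained
DF <- tibble(
  Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555",
           "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
  School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1",
             "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2",
             "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
  Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
  Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4)
)

# A plain, unnamed list: first element holds the schools, second the codes
wanted_list <- list(c("School2", "School1"), c("S300", "B344"))

# pmap() walks the list elements in parallel; ..1 and ..2 refer to them by position
results <- pmap(wanted_list, ~ filter(DF, School == ..1, Code == ..2))

# Optionally stack the pieces into one data frame and summarise
combined <- bind_rows(results) %>%
  group_by(School, Code) %>%
  summarise(Count = sum(Question1))
```

Because the list is unnamed, the elements are matched purely by position, so the order of the two vectors in wanted_list matters.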
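The problem with the two techniques above is that they will be slow, because they run filter once for every row of wanted, which means every row of DF gets re-tested multiple times. To keep the code similar, a slightly hacky way to avoid the extra work is to combine the columns into one, e.g. with tidyr::unite: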
DF %>%
unite(school_code, School, Code) %>%
filter(school_code %in% invoke(paste, wanted, sep = '_')) %>% # or paste(wanted$School, wanted$Code, sep = '_') or equivalent
separate(school_code, c('School', 'Code')) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
...or just combine them inside filter:
DF %>%
filter(paste(School, Code) %in% paste(wanted$School, wanted$Code)) %>% # or invoke(paste, wanted)
group_by(School, Code) %>%
summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
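The best way to get the desired result may be more obvious now that I've set up wanted as a data frame: a join, which is designed to do exactly this job: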
DF %>%
inner_join(wanted) %>%
group_by(School, Code) %>%
summarise_all(sum)
#> Joining, by = c("Code", "School")
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
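A variation on the join, offered as a sketch rather than as part of the original answer: semi_join is dplyr's filtering join, so it keeps the rows of DF that have a match in wanted without adding columns or duplicating rows, and passing by explicitly avoids the "Joining, by" message. The summary column Count follows the question's function, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(tidyverse)

# Sample data from the question, rebuilt with tibble() so this block is self-contained
DF <- tibble(
  Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555",
           "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
  School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1",
             "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2",
             "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
  Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
  Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4)
)

# The School/Code pairs to keep, as in the answer
wanted <- tibble(School = c("School2", "School1"),
                 Code   = c("S300", "B344"))

out <- DF %>%
  # Filtering join: keep only rows of DF whose (School, Code) pair appears in wanted
  semi_join(wanted, by = c("School", "Code")) %>%
  group_by(School, Code) %>%
  # .groups = "drop" returns an ungrouped result (assumes dplyr >= 1.0.0)
  summarise(Count = sum(Question1), .groups = "drop")

out
```

If the result is needed as a list with one element per pair, as the question asks, something like split(out, out$School) would be one way to get there.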