Question

I want to use dplyr to count the unique combinations in a data frame.

I tried the following:

library(dplyr)

set.seed(314)
dat <- data.frame(a = sample(1:3, 100, replace = TRUE),
                  b = sample(1:2, 100, replace = TRUE),
                  c = sample(1:2, 100, replace = TRUE))

dat %>% group_by(a,b,c) %>% summarise(n = n())

But to make this generic (independent of the column names), I tried:

dat %>% group_by(everything()) %>% summarise(n = n())

The explicit group_by(a, b, c) version returns the counts I want:

       a     b     c     n
   <int> <int> <int> <int>
 1     1     1     1     6
 2     1     1     2     8
 3     1     2     1    13
 4     1     2     2     8
 5     2     1     1     7
 6     2     1     2    12
 7     2     2     1    14
 8     2     2     2    10
 9     3     1     1     3
10     3     1     2     4
11     3     2     1     7
12     3     2     2     8

The group_by(everything()) version, however, gives this error:

Error in mutate_impl(.data, dots) : `c(...)` must be a character vector

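I fiddled with various things but could not get it to work. I know I could use names(dat), but the columns that need to go into group_by() depend on earlier steps in the dplyr chain.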

Answer 1

There is a function called group_by_all() (in the same vein as group_by_at() and group_by_if()) that does exactly this.

library(dplyr)

dat %>% 
 group_by_all() %>% 
 summarise(n = n())

This gives the same result:

# A tibble: 12 x 4
# Groups:   a, b [?]
       a     b     c     n
   <int> <int> <int> <int>
 1     1     1     1     6
 2     1     1     2     8
 3     1     2     1    13
 4     1     2     2     8
 5     2     1     1     7
 6     2     1     2    12
 7     2     2     1    14
 8     2     2     2    10
 9     3     1     1     3
10     3     1     2     4
11     3     2     1     7
12     3     2     2     8

PS

packageVersion('dplyr')
#[1] ‘0.7.2’
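
For readers on a newer dplyr than the 0.7.2 shown above: from dplyr 1.0.0 onwards, the scoped group_by_all() is marked as superseded, and the same grouping can be written with across(everything()). The following is only a minimal sketch, assuming dplyr 1.0.0 or later is installed. It reuses the data from the question, and the name `res` is just an illustrative choice.

```r
library(dplyr)

# Same example data as in the question.
set.seed(314)
dat <- data.frame(a = sample(1:3, 100, replace = TRUE),
                  b = sample(1:2, 100, replace = TRUE),
                  c = sample(1:2, 100, replace = TRUE))

# Group by every column that exists at this point in the chain,
# then count the rows in each group.
# Assumption: dplyr >= 1.0.0 (across() and .groups are not available in 0.7.x).
res <- dat %>%
  group_by(across(everything())) %>%
  summarise(n = n(), .groups = "drop")

print(res)
```

Because across(everything()) is evaluated against the data as it arrives at group_by(), it picks up whatever columns the earlier steps of the chain left in place.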

Answer 2

We can use the .dots argument:

dat %>%
     group_by(.dots = names(.)) %>%
     summarise(n = n())
# A tibble: 12 x 4
# Groups:   a, b [?]
#      a     b     c     n
#   <int> <int> <int> <int>
#1     1     1     1     6
#2     1     1     2     8
#3     1     2     1    13
#4     1     2     2     8
#5     2     1     1     7
#6     2     1     2    12
#7     2     2     1    14
#8     2     2     2    10
#9     3     1     1     3
#10    3     1     2     4
#11    3     2     1     7
#12    3     2     2     8

Another option is to turn the column names into symbols with rlang::syms() and unquote-splice them with !!!:

dat %>%
    group_by(!!! rlang::syms(names(.))) %>%
    summarise(n = n())
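
To illustrate the case raised in the question, where the grouping columns depend on earlier steps in the chain, here is a small sketch of the syms() approach applied after a select(). The data frame `toy` and its columns `g1`, `g2`, and `value` are hypothetical names made up for this example. Inside group_by(), the dot refers to the data as it comes out of the preceding step, so names(.) only sees the columns that survived the select().

```r
library(dplyr)
library(rlang)

# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(g1 = c(1, 1, 2, 2),
                  g2 = c("x", "x", "y", "z"),
                  value = 1:4)

res <- toy %>%
  select(-value) %>%                 # an earlier step changes which columns exist
  group_by(!!!syms(names(.))) %>%    # group by whatever columns remain (g1, g2)
  summarise(n = n()) %>%
  ungroup()

print(res)
```

Here the result should contain one row per distinct (g1, g2) pair, with `value` playing no part in the grouping because it was dropped before group_by() ran.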
