R grouping over combination of multiple columns

Time: 2016-11-18 10:50:51

Tags: r dplyr tidyverse

Consider the input dsam:

dsam <- structure(list(a = structure(c(3L, 2L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L), 
.Label = c("A", "B", "C"), class = "factor"), b = c(1, 
1, 1, 1, 1, 3, 2, 3, 3, 1), c = structure(c(2L, 1L, 1L, 2L, 1L, 
3L, 1L, 1L, 3L, 3L), .Label = c("D", "E", "F"), class = "factor")), 
.Names = c("a", "b", "c"), row.names = c(NA, -10L), class = "data.frame")

I am trying to group by a and c and aggregate b within each group, keeping one record per group. However, the two methods below give different results. The original data has over 300 columns used for grouping, so naming each column explicitly is not an option; instead I pass the column names as a character vector.

Method 1:

dsam %>% 
  group_by(a,c) %>% 
  mutate(rnk = row_number(), b = sum(b)) %>% 
  filter( rnk == max(rnk)) %>% print()

#Source: local data frame [5 x 4]
#Groups: a, c [5]
#
#       a     b      c   rnk
#  <fctr> <dbl> <fctr> <int>
#1      B     1      D     1
#2      C     2      E     2
#3      C     3      F     1
#4      A     7      D     4
#5      A     4      F     2

Method 2:

dsam %>% 
  group_by_(unlist(c("a","c"))) %>% 
  mutate(rnk = row_number(), b = sum(b)) %>% 
  filter( rnk == max(rnk)) %>% print()


#Source: local data frame [3 x 4]
#Groups: a [3]
#
#       a     b      c   rnk
#  <fctr> <dbl> <fctr> <int>
#1      B     1      D     1
#2      C     5      F     3
#3      A    11      F     6

How can I make Method 2 behave like Method 1?

P.S. Because of the large number of grouping columns, I would prefer not to concatenate them into a single key. Thank you.

1 answer:

Answer 0 (score: 0)

We need .dots:

dsam %>% 
     group_by_(.dots = c("a", "c")) %>%
     mutate(rnk = row_number(), b = sum(b)) %>% 
     filter( rnk == max(rnk))
#      a     b      c   rnk
#  <fctr> <dbl> <fctr> <int>
#1      B     1      D     1
#2      C     2      E     2
#3      C     3      F     1
#4      A     7      D     4
#5      A     4      F     2

If we call group_by_() without .dots, it groups only by the first column, i.e. 'a'.
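
As a side note, the underscore verbs such as group_by_() were deprecated in later dplyr releases. The following is a minimal sketch of the same grouping, assuming dplyr 1.0.0 or newer, where across() and all_of() are available. The name grp_cols and the rule "every column except b is a grouping column" are assumptions made for illustration; with the real 300-column data the vector of names would come from wherever the column list is already kept. It also uses summarise() rather than the mutate/filter pair, which may be a simpler way to keep one record per group if the extra rnk column is not needed.

```r
library(dplyr)

# Rebuild the example data from the question
dsam <- data.frame(
  a = factor(c("C", "B", "A", "C", "A", "C", "A", "A", "A", "A")),
  b = c(1, 1, 1, 1, 1, 3, 2, 3, 3, 1),
  c = factor(c("E", "D", "D", "E", "D", "F", "D", "D", "F", "F"))
)

# Hypothetical rule: every column except the aggregated one is a grouping column
grp_cols <- setdiff(names(dsam), "b")

# Group by all columns named in the character vector, then sum b per group
res <- dsam %>%
  group_by(across(all_of(grp_cols))) %>%
  summarise(b = sum(b), .groups = "drop")

print(res)
```

This should yield one row per (a, c) combination, with the same sums as Method 1. The rows are ordered by the grouping columns rather than by the position of each group's last record.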