How to group and summarise each data frame in a list of data frames

Asked: 2019-07-31 10:46:19

Tags: r dplyr

I have a list of data frames:

df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = as.numeric(c('1','1','0','1','1','0','0','0')))

df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = as.numeric(c('0','1','1','0','0','0','1','1')))

df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = as.numeric(c('1','0','0','1','1','0','0','1')))

all <- list(df1,df2,df3)

I need to group each data frame by its first column and sum its second column. For a single data frame I would do something like this:

library(dplyr)

df1 <- df1 %>%
  group_by(one) %>%
  summarise(sum = sum(one.1))

But I am having trouble figuring out how to iterate over each item in the list.

I have considered using a loop:

for(i in 1:3){
      all[i] <- all[i] %>%
      group_by_at(1) %>%
      summarise()
}

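But I cannot figure out how to specify the column to sum inside summarise() (and this loop may well be wrong anyway).

Ideally, I need the output as another list, where each item is the summarised data, like this: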

[[1]]
# A tibble: 3 x 2
  one     sum
  <fct> <dbl>
1 blue      1
2 green     0
3 red       3

[[2]]
# A tibble: 4 x 2
  two      sum
  <fct>  <dbl>
1 blue       1
2 green      1
3 red        1
4 yellow     1

[[3]]
# A tibble: 4 x 2
  three    sum
  <fct>  <dbl>
1 blue       0
2 green      2
3 white      1
4 yellow     1

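Thank you very much for your help!

2 answers:

Answer 0 (score: 2)

Use purrr::map to apply the same pipeline to every data frame, and summarise the columns whose names contain a literal dot, selected with the matches('\\.') helper: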

library(dplyr)
library(purrr)
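map(all, ~ .x %>%
    # Alternative grouping: pick the column whose name ends with one, two, or three
    # group_by_at(vars(matches('one$|two$|three$'))) %>%
    group_by_at(1) %>%                         # group by the first column
    summarise_at(vars(matches('\\.')), sum))   # sum the columns whose names contain a dot
    # Second option, passing the function in a named list:
    # summarise_at(vars(matches('\\.')), list(sum = sum)))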

[[1]]
# A tibble: 3 x 2
  one   one.1
  <fct> <dbl>
1 blue      1
2 green     0
3 red       3

[[2]]
# A tibble: 4 x 2
  two    two.2
  <fct>  <dbl>
1 blue       1
2 green      1
3 red        1
4 yellow     1

[[3]]
# A tibble: 4 x 2
  three  three.3
  <fct>    <dbl>
1 blue         0
2 green        2
3 white        1
4 yellow       1
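
The scoped verbs group_by_at() and summarise_at() are superseded in dplyr 1.0.0 and later. The following is a minimal sketch of the same idea with across() and the .data pronoun, assuming dplyr >= 1.0.0 is installed. The helper name sum_second_by_first is hypothetical and does not come from the answer. It selects both columns by position and names the result column sum, as in the output the question asks for. The explicit loop at the end indexes with [[i]] rather than [i], because all[i] returns a one-element list instead of a data frame.

```r
library(dplyr)
library(purrr)

df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = c(1, 1, 0, 1, 1, 0, 0, 0))
df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = c(0, 1, 1, 0, 0, 0, 1, 1))
df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
all <- list(df1, df2, df3)

# Hypothetical helper: group by the first column, sum the second column
sum_second_by_first <- function(df) {
  value_col <- names(df)[2]                # name of the column to sum
  df %>%
    group_by(across(1)) %>%                # group by column position 1
    summarise(sum = sum(.data[[value_col]]), .groups = "drop")
}

# Functional version: one summarised tibble per list element
result <- map(all, sum_second_by_first)

# Equivalent explicit loop; [[i]] extracts the data frame itself
result_loop <- vector("list", length(all))
for (i in seq_along(all)) {
  result_loop[[i]] <- sum_second_by_first(all[[i]])
}
```

Selecting by position keeps the helper independent of the column names, which differ in every data frame of the list.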

Answer 1 (score: 2)

Here is a base R solution:

lapply(all, function(DF) aggregate(list(added = DF[, 2]), by = DF[, 1, drop = FALSE], FUN = sum))

[[1]]
    one added
1  blue     1
2 green     0
3   red     3

[[2]]
     two added
1   blue     1
2  green     1
3    red     1
4 yellow     1

[[3]]
   three added
1   blue     0
2  green     2
3  white     1
4 yellow     1
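
The one-liner above can be unpacked into a named function, which may be easier to read and adapt. This is only a sketch: the helper name sum_by_first is hypothetical, and renaming the result column to sum is an assumption based on the output shown in the question. Indexing with DF[1] keeps a one-column data frame, so aggregate() carries the grouping column's own name into the result.

```r
df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = c(1, 1, 0, 1, 1, 0, 0, 0))
df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = c(0, 1, 1, 0, 0, 0, 1, 1))
df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
all <- list(df1, df2, df3)

# Hypothetical helper: sum the second column within groups of the first
sum_by_first <- function(DF) {
  # DF[[2]] is the numeric vector to sum; DF[1] is a one-column data frame,
  # so the grouping column keeps its original name in the output
  out <- aggregate(DF[[2]], by = DF[1], FUN = sum)
  names(out)[2] <- "sum"    # rename the aggregated column
  out
}

result <- lapply(all, sum_by_first)
```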

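Another approach is to bind the list into a single table. Here I use data.table and bind by position rather than by column name, so the combined table takes its column names from the first data frame. The only catch is that this may mess up factors, but I am not sure whether that is a problem for you.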

library(data.table)
rbindlist(all, use.names = FALSE, idcol = 'id'
          )[, .(added = sum(one.1)), by = .(id, color = one)]

    id  color added
 1:  1    red     3
 2:  1   blue     1
 3:  1  green     0
 4:  2    red     1
 5:  2 yellow     1
 6:  2  green     1
 7:  2   blue     1
 8:  3 yellow     1
 9:  3  green     2
10:  3  white     1
11:  3   blue     0
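
The question asks for a list as output, while the combined approach returns one long table. As a sketch, assuming the data.table package is installed, the summarised table can be split back into one table per original data frame with the split() method for data.table. The variable names combined, summed and per_df are illustrative only.

```r
library(data.table)

df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = c(1, 1, 0, 1, 1, 0, 0, 0))
df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = c(0, 1, 1, 0, 0, 0, 1, 1))
df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
all <- list(df1, df2, df3)

# Bind by position; column names come from the first data frame (one, one.1)
combined <- rbindlist(all, use.names = FALSE, idcol = "id")

# Sum within each original data frame (id) and each color
summed <- combined[, .(added = sum(one.1)), by = .(id, color = one)]

# Split back into a list with one data.table per id, dropping the id column
per_df <- split(summed, by = "id", keep.by = FALSE)
```

The list elements are named after the id values, so the first summary is available as per_df[["1"]]. Unlike the per-element approaches, every table here has the same column names (color and added).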