I have a list of data frames:
df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
one.1 = as.numeric(c('1','1','0','1','1','0','0','0')))
df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
two.2 = as.numeric(c('0','1','1','0','0','0','1','1')))
df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
three.3 = as.numeric(c('1','0','0','1','1','0','0','1')))
all <- list(df1,df2,df3)
I need to group each data frame by its first column and sum its second column. For a single data frame I would do something like this:
library(dplyr)
df1 <- df1 %>%
group_by(one) %>%
summarise(sum = sum(one.1))
But I am having trouble working out how to iterate over every item in the list.
I have considered using a loop:
for(i in 1:3){
all[i] <- all[i] %>%
group_by_at(1) %>%
summarise()
}
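But I can't work out how to specify the column to sum inside summarise() (and this loop is probably wrong anyway).
Ideally, I need the output as another list in which each item is the summarised data, like this: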
[[1]]
# A tibble: 3 x 2
one sum
<fct> <dbl>
1 blue 1
2 green 0
3 red 3
[[2]]
# A tibble: 4 x 2
two sum
<fct> <dbl>
1 blue 1
2 green 1
3 red 1
4 yellow 1
[[3]]
# A tibble: 4 x 2
three sum
<fct> <dbl>
1 blue 0
2 green 2
3 white 1
4 yellow 1
Thank you very much for your help!
Answer 0 (score: 2)
Use purrr::map with the matches helper to summarise the columns whose names contain a literal dot (\\.).
library(dplyr)
library(purrr)
map(all, ~.x %>%
#group_by_at(vars(matches('one$|two$|three$'))) %>% #column ends with one, two, or three
group_by_at(1) %>%
summarise_at(vars(matches('\\.')),sum))
#summarise_at(vars(matches('\\.')), list(sum = ~sum(.)))) #2nd option
[[1]]
# A tibble: 3 x 2
one one.1
<fct> <dbl>
1 blue 1
2 green 0
3 red 3
[[2]]
# A tibble: 4 x 2
two two.2
<fct> <dbl>
1 blue 1
2 green 1
3 red 1
4 yellow 1
[[3]]
# A tibble: 4 x 2
three three.3
<fct> <dbl>
1 blue 0
2 green 2
3 white 1
4 yellow 1
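In dplyr 1.0 and later, the scoped verbs (group_by_at, summarise_at) are superseded by across(). Below is a minimal sketch of the same idea, assuming dplyr >= 1.0. The helper name sum_by_first and the list name dfs are made up for illustration; dfs holds the same data as all in the question. It also names the result column sum, as in the desired output.

```r
library(dplyr)
library(purrr)

# Same data as `all` in the question (renamed to avoid masking base::all)
dfs <- list(
  data.frame(one = c('red','blue','green','red','red','blue','green','green'),
             one.1 = c(1, 1, 0, 1, 1, 0, 0, 0)),
  data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
             two.2 = c(0, 1, 1, 0, 0, 0, 1, 1)),
  data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
             three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
)

# Hypothetical helper: group by the first column, sum the second
sum_by_first <- function(df) {
  key <- names(df)[1]  # name of the grouping column
  val <- names(df)[2]  # name of the column to sum
  df %>%
    group_by(across(all_of(key))) %>%
    summarise(sum = sum(.data[[val]]), .groups = "drop")
}

result <- map(dfs, sum_by_first)
```

Looking the columns up by position once and then referring to them by name keeps the helper independent of the actual column names in each data frame.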
Answer 1 (score: 2)
Here is a base R solution:
lapply(all, function(DF) aggregate(list(added = DF[, 2]), by = DF[, 1, drop = FALSE], FUN = sum))
[[1]]
one added
1 blue 1
2 green 0
3 red 3
[[2]]
two added
1 blue 1
2 green 1
3 red 1
4 yellow 1
[[3]]
three added
1 blue 0
2 green 2
3 white 1
4 yellow 1
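The loop from the question can be repaired in the same base R style. This is only a sketch: the names dfs and out are made up for illustration, and dfs holds the same data as all in the question. The key change is indexing with [[i]], which extracts the data frame itself, rather than [i], which returns a one-element list.

```r
# Same data as `all` in the question (renamed to avoid masking base::all)
dfs <- list(
  data.frame(one = c('red','blue','green','red','red','blue','green','green'),
             one.1 = c(1, 1, 0, 1, 1, 0, 0, 0)),
  data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
             two.2 = c(0, 1, 1, 0, 0, 0, 1, 1)),
  data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
             three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
)

out <- vector("list", length(dfs))  # pre-allocate the result list
for (i in seq_along(dfs)) {
  d <- dfs[[i]]                     # [[i]] gives the data frame, not a list
  # d[1] and d[2] are one-column data frames, so the column names are kept
  out[[i]] <- aggregate(d[2], by = d[1], FUN = sum)
}
```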
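Another approach is to bind the list into a single table. Here I use data.table and ignore the column names when binding (use.names = FALSE), so the result takes its names from the first data frame. The only caveat is that this may mess up factors, but I'm not sure whether that is a problem in your case.
library(data.table)
rbindlist(all, use.names = FALSE, idcol = 'id'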
)[, .(added = sum(one.1)), by = .(id, color = one)]
id color added
1: 1 red 3
2: 1 blue 1
3: 1 green 0
4: 2 red 1
5: 2 yellow 1
6: 2 green 1
7: 2 blue 1
8: 3 yellow 1
9: 3 green 2
10: 3 white 1
11: 3 blue 0
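The question asks for a list as output, so the combined table can be split back by id. This is a minimal sketch, assuming the split method that data.table provides for data.table objects (with its by and keep.by arguments). The names dfs, combined, summed and result, and the column name value, are made up for illustration; dfs holds the same data as all in the question.

```r
library(data.table)

# Same data as `all` in the question (renamed to avoid masking base::all)
dfs <- list(
  data.frame(one = c('red','blue','green','red','red','blue','green','green'),
             one.1 = c(1, 1, 0, 1, 1, 0, 0, 0)),
  data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
             two.2 = c(0, 1, 1, 0, 0, 0, 1, 1)),
  data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
             three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
)

# Bind by position, then give the columns neutral names
combined <- rbindlist(dfs, use.names = FALSE, idcol = "id")
setnames(combined, c("id", "color", "value"))

# Sum within each source data frame and color
summed <- combined[, .(added = sum(value)), by = .(id, color)]

# One data.table per original data frame; the id column is dropped
result <- split(summed, by = "id", keep.by = FALSE)
```

The resulting list is named by id ("1", "2", "3"), and each element keeps only the color and added columns.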