R: How do I sum columns grouped by a factor?

Asked: 2015-09-11 19:29:31

Tags: r data-cleaning

If I have a table like this:

user,v1,v2,v3
a,1,0,0
a,1,0,1
b,1,0,0
b,2,0,3
c,1,1,1

How can I turn it into this, with one row per user and each column summed?

user,v1,v2,v3
a,2,0,1
b,3,0,3
c,1,1,1

2 answers:

Answer 0 (score: 3)

In base R,

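# Build the example data: one row of values per observation
D <- matrix(c(1, 0, 0,
              1, 0, 1,
              1, 0, 0,
              2, 0, 3,
              1, 1, 1),
            ncol=3, byrow=TRUE, dimnames=list(1:5, c("v1", "v2", "v3")))
# Attach the grouping column
D <- data.frame(user=c("a", "a", "b", "b", "c"), D)
# Sum every other column (the dot) within each value of user
aggregate(. ~ user, D, sum)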

returns

> aggregate(. ~ user, D, sum)
  user v1 v2 v3
1    a  2  0  1
2    b  3  0  3
3    c  1  1  1
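
For what it's worth, the same call can be spelled without the formula, which may be easier to read if the `. ~ user` shorthand is unfamiliar. This is only a sketch of the equivalent `by =` interface of `aggregate()`; the name `value_cols` is made up here for illustration, and the data frame is rebuilt directly instead of going through a matrix.

```r
# Same example data as above, built directly as a data frame
D <- data.frame(
  user = c("a", "a", "b", "b", "c"),
  v1   = c(1, 1, 1, 2, 1),
  v2   = c(0, 0, 0, 0, 1),
  v3   = c(0, 1, 0, 3, 1)
)

# Hypothetical helper name: the columns to be summed
value_cols <- c("v1", "v2", "v3")

# by = takes a named list; the list name becomes the grouping column name
result <- aggregate(D[value_cols], by = list(user = D$user), FUN = sum)
print(result)
```

One difference to keep in mind: the formula interface drops rows containing NA by default (`na.action = na.omit`), while the `by =` form passes them on to `sum`, so the two can disagree on data with missing values.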

Answer 1 (score: 2)

You can use dplyr:

library(dplyr)
df = data.frame(
  user = c("a", "a", "b", "b", "c"),
  v1   = c(1, 1, 1, 2, 1),
  v2   = c(0, 0, 0, 0, 1),
  v3   = c(0, 1, 0, 3, 1))

group_by(df, user) %>%
  summarize(v1_sum = sum(v1),
            v2_sum = sum(v2),
            v3_sum = sum(v3))

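If you are not familiar with the %>% operator, it works much like a pipe in bash: it takes the output of group_by() and passes it to summarize() as the first argument. The same thing can be done this way: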

by_user = group_by(df, user)
df_summarized = summarize(by_user, 
                          v1_sum = sum(v1),
                          v2_sum = sum(v2),
                          v3_sum = sum(v3))
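
To see the piping idea in isolation, here is a small sketch that does not need dplyr. It assumes R 4.1 or later and uses the native `|>` pipe, which is a different operator from `%>%` but behaves the same way in a simple case like this one: the left-hand side becomes the first argument of the call on the right. The `subset()` and `colSums()` steps are only there for illustration and are not part of the answer above.

```r
# Assumes R >= 4.1 for the native pipe |>
df <- data.frame(
  user = c("a", "a", "b", "b", "c"),
  v1   = c(1, 1, 1, 2, 1),
  v2   = c(0, 0, 0, 0, 1),
  v3   = c(0, 1, 0, 3, 1)
)

# Nested form: has to be read from the inside out
nested <- colSums(subset(df, user == "b", select = -user))

# Piped form: reads left to right, each result feeds the next call
piped <- df |>
  subset(user == "b", select = -user) |>
  colSums()

# Both forms produce the same named vector of column sums for user "b"
print(identical(nested, piped))
print(piped)
```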