Question

I have some data that I want to summarize:

    studentid friend Gfriend
214  30401006      0       0
236  30401006      0       0
208  30401006      1       0
229  30401006      0       0
207  30401006      0       0
278  30401007      1       0
250  30401007      1       0
266  30401007      1       0
254  30401007      1       1
277  30401007      1       1
243  30401007      1       1

The result should look like this:

studentid friend Gfriend
30401006   1      0
30401007   6      3

When I try agg = aggregate(c(friend) ~ studentid, data = df, FUN = sum), I get the desired result (but only for the friend variable). But when I try agg = aggregate(c(friend, Gfriend) ~ studentid, data = df, FUN = sum), I get:

Error in model.frame.default(formula = c(friend, Gfriend) ~ studentid, : variable lengths differ (found for 'studentid')

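I checked the lengths of the variables (length(var)) and they are all the same, and there are no NAs, so I don't know where this error comes from.

Why does this happen?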

Answer 1

You can also try by():

 # rebuild the example data from the question
 studentid <- c(30401006,30401006,30401006,30401006,30401006,30401007,
                30401007,30401007,30401007,30401007,30401007)
 friend <- c(0,0,1,0,0,1,1,1,1,1,1)
 Gfriend <- c(0,0,0,0,0,0,0,0,1,1,1)
 df <- data.frame(studentid,friend,Gfriend)
 df

 # sum the friend and Gfriend columns within each studentid group
 result <- by(df[c(2:3)], df$studentid, FUN=colSums)

 result
 df$studentid: 30401006
 friend Gfriend 
 1       0 
 df$studentid: 30401007
 friend Gfriend 
 6       3
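
As for why the original call fails: c(friend, Gfriend) concatenates the two columns into one vector that is twice as long as studentid, so model.frame sees variables of different lengths. A minimal sketch of the same summary with aggregate() itself, assuming the example data above (rebuilt here so the block runs on its own), uses cbind() on the left-hand side instead:

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  studentid = c(rep(30401006, 5), rep(30401007, 6)),
  friend    = c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1),
  Gfriend   = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# c() stacks the two columns end to end: 22 values against 11 ids
length(c(df$friend, df$Gfriend))
length(df$studentid)

# cbind() keeps them side by side as an 11 x 2 matrix,
# so each column is summed separately within each studentid
agg <- aggregate(cbind(friend, Gfriend) ~ studentid, data = df, FUN = sum)
agg
```

With a single variable, c(friend) is just friend, which is why the first call in the question worked.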

Answer 2

Edit: added na.rm = TRUE to address a comment about excluding NAs.

Take a look at the "plyr" package.

library(plyr)

# split by "studentid" and sum all numeric columns

ddply(df, .(studentid), numcolwise(sum, na.rm=TRUE))

studentid friend Gfriend
1  30401006      1       0
2  30401007      6       3
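
If adding a package is not an option, roughly the same result can be had in base R. The sketch below is an illustration rather than part of the original answer: it uses the data frame method of aggregate(), which passes extra arguments such as na.rm on to FUN. The NA placed in the first row is hypothetical, added only to show the effect of na.rm (the question's data has no NAs).

```r
# Example data, with one hypothetical NA to show na.rm at work
df <- data.frame(
  studentid = c(rep(30401006, 5), rep(30401007, 6)),
  friend    = c(NA, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1),
  Gfriend   = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# Data frame method: group by studentid and sum each column,
# ignoring NAs within a column instead of dropping the whole row
agg2 <- aggregate(df[c("friend", "Gfriend")],
                  by = list(studentid = df$studentid),
                  FUN = sum, na.rm = TRUE)
agg2
```

By contrast, the formula method of aggregate() defaults to na.action = na.omit, which removes any row containing an NA before summing.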

Variable lengths differ error in aggregate
