Suppose I have a df:
> ID<-c("A","A","A","B","B","B","B","C","C","C","C")
> attr<-c("yes1","yes1","no","yes2","yes1","yes1","yes1","no","no","yes1","yes2")
> df = data.frame(ID, attr) ; df
   ID attr
1   A yes1
2   A yes1
3   A   no
4   B yes2
5   B yes1
6   B yes1
7   B yes1
8   C   no
9   C   no
10  C yes1
11  C yes2
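The real data has thousands of IDs. I'd like to add another column that gives, for each ID, the percentage of "yes" attributes and the number of "no" attrs (in particular, whether there is only one):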
  ID %yes #no
1  A 66.7   1
2  B  100   0
3  C   50   2
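Is there a way to consolidate rows, similar to SQL GROUP BY? Ultimately, this new df would be used to categorize the IDs, and the category would be added to the original df: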
   ID attr  result
1   A yes1     Pos
2   A yes1     Pos
3   A   no   False
4   B yes2 TruePos
5   B yes1 TruePos
6   B yes1 TruePos
7   B yes1 TruePos
8   C   no   False
9   C   no   False
10  C yes1     Pos
11  C yes2     Pos
Answer 0 (score: 3)
Check out the data.table package.
Load the package and convert the data.frame to a data.table. Use key= to specify the grouping column.
library(data.table)
DT <- data.table(df, key="ID")
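Perform the aggregation. Note that pct comes out as a proportion between 0 and 1; multiply by 100 if you want a percentage.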
DT2 <- DT[, list(pct = length(grep("yes", attr)) / length(attr),
                 no  = sum(attr == "no")), by = key(DT)]
DT2
#    ID       pct no
# 1:  A 0.6666667  1
# 2:  B 1.0000000  0
# 3:  C 0.5000000  2
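The question also asks for the per-ID result to be attached to every original row. One possible sketch of that step with data.table is below. The labelling rule is an assumption inferred from the example output in the question (rows with "no" get False; "yes" rows get TruePos when their ID has no "no" rows, and Pos otherwise), and the column names pct_yes, n_no and result are made up for illustration.

```r
library(data.table)

ID   <- c("A","A","A","B","B","B","B","C","C","C","C")
attr <- c("yes1","yes1","no","yes2","yes1","yes1","yes1","no","no","yes1","yes2")
DT   <- data.table(ID = ID, attr = attr, key = "ID")

# Per-ID summaries, written onto every row of the group by reference.
# pct_yes and n_no are illustrative names, not from the question.
DT[, `:=`(pct_yes = 100 * mean(grepl("^yes", attr)),
          n_no    = sum(attr == "no")), by = ID]

# Labelling rule inferred from the question's example table (an assumption).
DT[, result := ifelse(attr == "no", "False",
               ifelse(n_no == 0, "TruePos", "Pos"))]
DT
```

Using := with by= avoids building a separate summary table and joining it back, which may matter with thousands of IDs. If the summary table is wanted on its own, DT2 from the answer above can be kept as is.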
Answer 1 (score: 2)
This will give you the proportion of "yes" entries for each level of ID:
by(substr(df$attr,1,3)=="yes",INDICES=df$ID,FUN=mean)
This will tell you the number of "no" entries for each level of ID:
by(df$attr=="no",INDICES=df$ID,FUN=sum)
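by() returns an object of class "by" rather than a data frame. If a GROUP BY-style table is what you are after, one base R option is to compute the same two summaries with tapply() and attach them to the original rows with merge(). This is only a sketch; the names summary_df, pct_yes and n_no are illustrative and do not come from the question.

```r
ID   <- c("A","A","A","B","B","B","B","C","C","C","C")
attr <- c("yes1","yes1","no","yes2","yes1","yes1","yes1","no","no","yes1","yes2")
df   <- data.frame(ID, attr)

# Same per-ID computations as the by() calls, returned as named vectors
pct_yes <- tapply(substr(df$attr, 1, 3) == "yes", df$ID, mean) * 100
n_no    <- tapply(df$attr == "no", df$ID, sum)

# One row per ID, analogous to the result of a SQL GROUP BY
summary_df <- data.frame(ID      = names(pct_yes),
                         pct_yes = round(as.vector(pct_yes), 1),
                         n_no    = as.vector(n_no))
summary_df

# Attach the per-ID summaries to every original row
merged <- merge(df, summary_df, by = "ID")
merged
```

tapply() keeps the group labels as names, which makes it straightforward to assemble the one-row-per-ID table before merging.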