Consider the following data frame:
d <- data.frame(c1=c(rep("a",6),rep("b",6)),
c2=c("v1","v1","v2","v3","v3","v1", "v2","v3","v1","v2","v3","v2"),
c3=c(1.4,-1.2,1.5,1.6,-1.7,1.2, -1.1,-1.2,1.3,1.5,1.1,-1.9))
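I want to add a fourth column, c4, that holds the number of positive and negative c3 values for each group in c1 ("a" and "b"). However, only the c3 values in rows where c2 equals "v1" should be counted. In addition, if a group has only positive or only negative values, an empty string should be printed instead.
So, for my example, the fourth column should look like this: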
> d
c1 c2 c3 c4
1 a v1 1.4 2/1
2 a v1 -1.2 2/1
3 a v2 1.5 2/1
4 a v3 1.6 2/1
5 a v3 -1.7 2/1
6 a v1 1.2 2/1
7 b v2 -1.1 " "
8 b v3 -1.2 " "
9 b v1 1.3 " "
10 b v2 1.5 " "
11 b v3 1.1 " "
12 b v2 -1.9 " "
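The value for group a is 2/1 because, among the rows where c2 == "v1", there are two positive numbers and one negative number. Group b has only a single such row (a positive value), so its c4 is blank.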
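The expected counts can be checked by hand with a few lines of base R. This is only a sketch of the arithmetic, not a solution to the question; the names v1_a, v1_b, counts_a and counts_b are made up for illustration.

```r
# Data frame from the question
d <- data.frame(c1 = c(rep("a", 6), rep("b", 6)),
                c2 = c("v1","v1","v2","v3","v3","v1", "v2","v3","v1","v2","v3","v2"),
                c3 = c(1.4,-1.2,1.5,1.6,-1.7,1.2, -1.1,-1.2,1.3,1.5,1.1,-1.9))

# c3 values that count for each group: only rows where c2 == "v1"
v1_a <- d$c3[d$c1 == "a" & d$c2 == "v1"]   # 1.4, -1.2, 1.2
v1_b <- d$c3[d$c1 == "b" & d$c2 == "v1"]   # 1.3

# Number of positive and negative values per group
counts_a <- c(pos = sum(v1_a > 0), neg = sum(v1_a < 0))   # 2 positive, 1 negative
counts_b <- c(pos = sum(v1_b > 0), neg = sum(v1_b < 0))   # 1 positive, 0 negative
```

Group a has both signs, so it gets "2/1"; group b has no negative value among its v1 rows, so it gets the blank.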
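The closest I have come so far is with the aggregate function, but I still have not got it quite right. Is there a better function for this kind of problem?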
Answer 0 (score: 3)
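If you want to stay with plain base R, aggregate should be your friend. The call below counts the negative and positive c3 values (in that order) for every combination of c1 and c2: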
ag <- aggregate.data.frame(
d$c3,
by = list(d$c1, d$c2),
FUN = function(x){ paste(sum(x < 0), sum(x>0), sep="/") }
)
> ag
Group.1 Group.2 x
1 a v1 1/2
2 b v1 0/1
3 a v2 0/1
4 b v2 2/1
5 a v3 1/1
6 b v3 1/1
You can then merge the aggregated data back into the original data.frame:
d <- merge( d, ag, by.x = c( "c1", "c2" ), by.y = c( "Group.1", "Group.2" ), all.x = TRUE )
However, because of its simplicity, I would recommend ddply from the plyr package:
library("plyr")
d <- ddply( d, c("c1","c2"), function(x) {
x$c4 <- paste(sum( x$c3 < 0), sum(x$c3 > 0), sep="/")
return(x)
})
Edit
After re-reading the question, here is a version using aggregate:
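# assumes d is the original data frame from the question
d.sub <- d[ d$c2 == "v1", , drop=FALSE ]  # keep only the rows where c2 is "v1"
ag <- aggregate(
  d.sub$c3,
  by = list(d.sub$c1),
  FUN = function(x){ # taken from @flodel
    pos <- sum( x > 0 )  # number of positive values
    neg <- sum( x < 0 )  # number of negative values
    # empty string when only one sign is present in the group
    if( pos * neg == 0 ) "" else paste( pos, neg, sep="/" )
  }
)
names(ag)[2] <- "c4"  # aggregate names the result column "x"
d <- merge( d, ag, by.x = "c1", by.y = "Group.1", all.x = TRUE )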
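One thing to keep in mind with the merge step is that merge() sorts the result on the key columns by default, so the original row order is not guaranteed to survive. A possible base R alternative, sketched here with tapply and a lookup by group name, leaves d in its original order. The helper name count_label and the variable labels are made up for this sketch.

```r
# Data frame from the question
d <- data.frame(c1 = c(rep("a", 6), rep("b", 6)),
                c2 = c("v1","v1","v2","v3","v3","v1", "v2","v3","v1","v2","v3","v2"),
                c3 = c(1.4,-1.2,1.5,1.6,-1.7,1.2, -1.1,-1.2,1.3,1.5,1.1,-1.9))

# Hypothetical helper: "pos/neg" label, or "" when only one sign occurs
count_label <- function(x) {
  pos <- sum(x > 0)
  neg <- sum(x < 0)
  if (pos == 0 || neg == 0) "" else paste(pos, neg, sep = "/")
}

# One label per c1 group, computed only from the rows where c2 == "v1"
d.sub  <- d[d$c2 == "v1", , drop = FALSE]
labels <- tapply(d.sub$c3, d.sub$c1, count_label)

# Look up each row's label by its group name; rows keep their original order.
# A group without any "v1" row would get NA here rather than "".
d$c4 <- as.vector(labels[as.character(d$c1)])
```

With the example data, rows 1 to 6 receive "2/1" and rows 7 to 12 receive the empty string.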
As for ddply, see @flodel's solution; I would have done it the same way.
Answer 1 (score: 3)
For anything that uses more than one column (besides the ones you are grouping by), I find plyr more convenient:
library(plyr)
ddply(d, "c1", transform,
      c4 = { pos <- sum(c2 == "v1" & c3 > 0)  # strictly positive values only
             neg <- sum(c2 == "v1" & c3 < 0)
             ifelse(pos * neg == 0, ' ', paste(pos, neg, sep = '/')) })
# c1 c2 c3 c4
# 1 a v1 1.4 2/1
# 2 a v1 -1.2 2/1
# 3 a v2 1.5 2/1
# 4 a v3 1.6 2/1
# 5 a v3 -1.7 2/1
# 6 a v1 1.2 2/1
# 7 b v2 -1.1
# 8 b v3 -1.2
# 9 b v1 1.3
# 10 b v2 1.5
# 11 b v3 1.1
# 12 b v2 -1.9
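For readers without plyr, the same split-apply-combine idea can be approximated in base R with split, lapply and rbind. This is a rough sketch of the pattern, not how ddply is implemented internally; the names pieces and res are made up for illustration.

```r
# Data frame from the question
d <- data.frame(c1 = c(rep("a", 6), rep("b", 6)),
                c2 = c("v1","v1","v2","v3","v3","v1", "v2","v3","v1","v2","v3","v2"),
                c3 = c(1.4,-1.2,1.5,1.6,-1.7,1.2, -1.1,-1.2,1.3,1.5,1.1,-1.9))

# Split: one data frame per value of c1
pieces <- split(d, d$c1)

# Apply: add c4 to each piece, counting only rows where c2 == "v1"
pieces <- lapply(pieces, function(g) {
  pos <- sum(g$c2 == "v1" & g$c3 > 0)
  neg <- sum(g$c2 == "v1" & g$c3 < 0)
  # the single label is recycled to every row of the group
  g$c4 <- if (pos * neg == 0) " " else paste(pos, neg, sep = "/")
  g
})

# Combine: stack the pieces back into one data frame
res <- do.call(rbind, pieces)
rownames(res) <- NULL   # drop the "a.1", "a.2", ... row names created by rbind
```

Because both conditions are evaluated inside each group, c2 and c3 can be used together without a separate subsetting and merging step.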
Answer 2 (score: 1)
Here is another ddply solution that takes a slightly different approach:
library(plyr)
ddply(d, .(c1), transform, c4 = {
tab <- table(factor(sign(c3[c2 == "v1"]), c(1, -1)));
ifelse(any(tab == 0), " ", paste(tab, collapse = "/")) })
# c1 c2 c3 c4
# 1 a v1 1.4 2/1
# 2 a v1 -1.2 2/1
# 3 a v2 1.5 2/1
# 4 a v3 1.6 2/1
# 5 a v3 -1.7 2/1
# 6 a v1 1.2 2/1
# 7 b v2 -1.1
# 8 b v3 -1.2
# 9 b v1 1.3
# 10 b v2 1.5
# 11 b v3 1.1
# 12 b v2 -1.9
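The table(factor(sign(...), c(1, -1))) idiom used in this answer can be unpacked outside ddply. The sketch below applies it to the c3 values of each group where c2 == "v1"; the variable names are made up for illustration. Fixing the factor levels to 1 and -1 is what keeps a zero count in the table when one sign is missing.

```r
# c3 values with c2 == "v1", taken from the question's data
x_a <- c(1.4, -1.2, 1.2)   # group a
x_b <- 1.3                 # group b

# sign() maps each value to 1 or -1; the fixed levels keep both
# categories in the table even when one of them never occurs
tab_a <- table(factor(sign(x_a), levels = c(1, -1)))   # counts: 2 and 1
tab_b <- table(factor(sign(x_b), levels = c(1, -1)))   # counts: 1 and 0

# Blank when either count is zero, otherwise "positives/negatives"
label_a <- if (any(tab_a == 0)) " " else paste(tab_a, collapse = "/")
label_b <- if (any(tab_b == 0)) " " else paste(tab_b, collapse = "/")
```

Here label_a is "2/1" and label_b is the blank. Values that are exactly zero have sign 0, which is not among the levels, so they are dropped from both counts.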