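Consider the data frame below (2 columns: id and val). I am trying to find a fast way to count how many unique events contain the element "boo" in the val column. Events are given by the id column, and the unique events are a, b, c, d.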
id<-c("a","a","a","a","b","b","c","c","c","d")
val<-c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","df","boo")
x<-data.frame(id,val)
So the result here should be 4, because "boo" occurs in every event: a, b, c, d.
Count("boo") = 4
Example 2:
id<-c("a","a","a","a","b","b","c","c","c","d")
val<-c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","boo","sgfsc")
x<-data.frame(id,val)
Count("boo") = 3 ("boo" occurs in a, b and c, but not in d)
I need to stick to base R packages.
Thanks.
Answer 0 (score: 4)
For a unique count of boo using base R, you can do:
sum(with(x, tapply(val, id, function(x) any(x == "boo"))))
## [1] 4
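The same tapply idea can be wrapped in a small helper and checked against both examples from the question. This is only a sketch; the function name count_events_with is hypothetical and not part of base R.

```r
# Hypothetical helper (not base R): number of distinct ids whose val contains `target`
count_events_with <- function(df, target) {
  # For each id, TRUE if any of its val entries equals target
  hits <- tapply(df$val, df$id, function(v) any(v == target))
  # Summing logicals counts the TRUE values
  sum(hits)
}

# Example 1 from the question
x1 <- data.frame(
  id  = c("a","a","a","a","b","b","c","c","c","d"),
  val = c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","df","boo")
)
# Example 2 from the question
x2 <- data.frame(
  id  = c("a","a","a","a","b","b","c","c","c","d"),
  val = c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","boo","sgfsc")
)

count_events_with(x1, "boo")  # 4
count_events_with(x2, "boo")  # 3
```

Because the helper takes the target as a parameter, the same call works for any other value, e.g. count_events_with(x1, "df") counts the ids a and c.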
Answer 1 (score: 1)
Try this:
> library(plyr)  # ddply() comes from the plyr package
> result <- ddply(x, ~val, nrow)
> result <- result[result$V1==4, ]
> result
val V1
1 boo 4
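The data frame result contains the number of rows for each val, and we can subset it further to keep only the val with a count of 4 (meaning it occurred for every id, provided a val never appears twice within the same id).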
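Since the question asks for base R, the ddply call can also be sketched with base aggregate. This is one possible substitute (assumed here, not part of the answer), run on the Example 1 data; FUN = length counts rows per val just like nrow does in ddply.

```r
x <- data.frame(
  id  = c("a","a","a","a","b","b","c","c","c","d"),
  val = c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","df","boo")
)

# Base-R counterpart of ddply(x, ~val, nrow): number of rows per val
result <- aggregate(id ~ val, data = x, FUN = length)
names(result)[2] <- "V1"  # rename the count column to match ddply's output

# Keep only the vals counted 4 times
result[result$V1 == 4, ]
```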
Here is a slightly less elegant solution that uses only base-R functions (your requirement):
> result <- sapply(split(x, x$val), nrow)
> result
boo df ksdj sd sdjhf sgfsc ssd
4 1 1 1 1 1 1
If you want to find the val values that occur with a particular frequency, you can subset result like this:
> result[result >= 4]
boo
4
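Note that nrow counts rows, so on the Example 2 data used above "boo" scores 4 even though it appears in only 3 ids (it occurs twice in id c). A base-R sketch that splits only the id column and counts distinct ids per val, on the Example 2 data:

```r
x <- data.frame(
  id  = c("a","a","a","a","b","b","c","c","c","d"),
  val = c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","boo","sgfsc")
)

# Rows per val: "boo" is counted 4 times because it occurs twice in id "c"
rows_per_val <- sapply(split(x, x$val), nrow)

# Distinct ids per val: split only the id column, then count unique ids
ids_per_val <- sapply(split(x$id, x$val), function(ids) length(unique(ids)))

rows_per_val["boo"]  # 4
ids_per_val["boo"]   # 3
```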
Answer 2 (score: 0)
# 1. drop duplicate (id, val) rows
ux <- unique(x)
# 2. each remaining "boo" row belongs to a different id, so count them
sum(ux$val == "boo")
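A minimal base-R alternative keeps the ids of the rows where val is "boo" and counts the distinct ones. The helper names count_boo and count_boo_unique_rows are hypothetical; the second one repeats the unique-rows approach above so the two can be compared on both examples.

```r
id   <- c("a","a","a","a","b","b","c","c","c","d")
val1 <- c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","df","boo")
val2 <- c("boo","sd","ssd","df","boo","ksdj","boo","sdjhf","boo","sgfsc")
x1 <- data.frame(id, val = val1)  # Example 1
x2 <- data.frame(id, val = val2)  # Example 2

# Hypothetical helper: distinct ids among the rows where val is "boo"
count_boo <- function(df) length(unique(df$id[df$val == "boo"]))

# Hypothetical helper: the unique-rows approach, for comparison
count_boo_unique_rows <- function(df) sum(unique(df)$val == "boo")

count_boo(x1)              # 4
count_boo(x2)              # 3
count_boo_unique_rows(x2)  # 3
```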