Suppose I have the following data frame with an id column and three interest columns.
data = data.frame(id=c(1:10), interest_1=c("food","","","drugs","beer","soda","","","drugs","sports"),
interest_2=c("fruits","car","jeans","","","","soda","shoes","","drugs"),
interest_3=c("","","","","soda","sports","","","",""))
data
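I want to count how many times each distinct row (that is, each combination of interests) occurs.
The following event, where food is interest_1, fruits is interest_2, and there is no interest_3, occurs only once: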
  id interest_1 interest_2 interest_3
1  1       food     fruits
The following event, where drugs is interest_1 and there is no interest_2 or interest_3, occurs twice:
id interest_1 interest_2 interest_3
 4      drugs
 9      drugs
I want to count the number of times each combination occurs. How can I do this?
The output should look like this:
interest_1 interest_2 interest_3 count
food       fruits                1
           car                   1
           jeans                 1
drugs                            2
Answer 0 (score: 6)
> aggregate(id ~ ., data, length)
  interest_1 interest_2 interest_3 id
1      drugs                        2
2                   car             1
3     sports      drugs             1
4       food     fruits             1
5                 jeans             1
6                 shoes             1
7                  soda             1
8       beer                  soda  1
9       soda                sports  1
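Basically, this means: apply the function length to the vector of id values for each combination of the other columns. The id column in the result therefore holds the count for that combination, not an id.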
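Since the question asks for a column named count, here is a minimal sketch that builds on the aggregate call and renames the result column. The variable name counts and the final sorting step are my own additions for illustration, not part of the original answer. Note also that the formula method of aggregate drops rows containing NA by default; that does not matter here because missing interests are stored as empty strings.

```r
# Sample data from the question.
data <- data.frame(
  id = 1:10,
  interest_1 = c("food", "", "", "drugs", "beer", "soda", "", "", "drugs", "sports"),
  interest_2 = c("fruits", "car", "jeans", "", "", "", "soda", "shoes", "", "drugs"),
  interest_3 = c("", "", "", "", "soda", "sports", "", "", "", "")
)

# Count the id values within each combination of the interest columns.
counts <- aggregate(id ~ ., data, length)

# The result column is still called "id"; rename it to "count".
names(counts)[names(counts) == "id"] <- "count"

# Optional: put the most frequent combinations first.
counts <- counts[order(-counts$count), ]
counts
```

The counts should add up to the number of rows in data, which is a quick sanity check: sum(counts$count) == nrow(data).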
Answer 1 (score: 2)
require(plyr)
ddply(data, .(interest_1, interest_2, interest_3), c("nrow"))
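For readers without plyr installed, the same split-apply-combine idea can be sketched in base R: split the data frame by the three interest columns, then take nrow of each piece. This is only an illustration of the concept, not how ddply is implemented, and the names groups and group_sizes are hypothetical.

```r
# Sample data from the question.
data <- data.frame(
  id = 1:10,
  interest_1 = c("food", "", "", "drugs", "beer", "soda", "", "", "drugs", "sports"),
  interest_2 = c("fruits", "car", "jeans", "", "", "", "soda", "shoes", "", "drugs"),
  interest_3 = c("", "", "", "", "soda", "sports", "", "", "", "")
)

# Split the rows by each observed combination of the interest columns.
# drop = TRUE discards combinations that never occur in the data.
groups <- split(
  data,
  data[c("interest_1", "interest_2", "interest_3")],
  drop = TRUE
)

# Apply nrow to each piece; the result is a named vector of group sizes.
group_sizes <- sapply(groups, nrow)
group_sizes
```

The names of group_sizes are the three interest values joined by dots, so the result is less tidy than the data frame that aggregate or ddply returns.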