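I have a data frame with one column holding an event ID and another column indicating which product was used in that event. Each product can be used only once per event, and every event contains at least one product. I want to know how many times each product is used together with each other product. Some sample data: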
set.seed(1)
events <- paste('Event ', sample(1:4, size = 15, replace = TRUE), sep = '')
events <- events[order(events)]
prods <- paste('Product ', c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5), sep = '')
test_data <- data.frame(events, prods)
test_data
events prods
1 Event 1 Product 1
2 Event 1 Product 2
3 Event 1 Product 3
4 Event 1 Product 4
5 Event 2 Product 1
6 Event 2 Product 5
7 Event 2 Product 6
8 Event 3 Product 2
9 Event 3 Product 4
10 Event 3 Product 6
11 Event 3 Product 7
12 Event 4 Product 1
13 Event 4 Product 2
14 Event 4 Product 3
15 Event 4 Product 5
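Product 1 and Product 2 occur in the same event twice (Event 1 and Event 4), so I would like a '2' returned for that pair. Product 1 and Product 7 never appear in the same event, so I would like 0 returned for that pair. For a 'match' between a product and itself, I am happy to get back the total number of times the product is used.
Either of two output formats would work for me; I have no preference between them.
I have been experimenting with expand.grid, but I have nothing worth showing for it.
Thanks!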
Answer 0 (score: 2)
split the prods by events, then compute all pairwise combinations within each event using combn, then aggregate to get the count for each combination.
out <- t(do.call(cbind,
lapply(split(as.character(test_data$prods), test_data$events), combn, 2))
)
aggregate(count ~ . , data=transform(out,count=1), FUN=sum)
# X1 X2 count
#1 Product 1 Product 2 2
#2 Product 1 Product 3 2
#3 Product 2 Product 3 2
#4 Product 1 Product 4 1
#5 Product 2 Product 4 2
#6 Product 3 Product 4 1
#7 Product 1 Product 5 2
#8 Product 2 Product 5 1
#9 Product 3 Product 5 1
#10 Product 1 Product 6 1
#11 Product 2 Product 6 1
#12 Product 4 Product 6 1
#13 Product 5 Product 6 1
#14 Product 2 Product 7 1
#15 Product 4 Product 7 1
#16 Product 6 Product 7 1
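To make the three steps easier to follow, here is a minimal sketch on hypothetical toy data (the `toy` data frame and the column names `prod_a` and `prod_b` are made up for illustration). It adds two assumptions that the code above does not make: products are sorted within each event, so a pair is counted the same way regardless of row order, and events with a single product are skipped, because `combn` raises an error when asked for pairs from fewer than two items.

```r
# Hypothetical toy data: E1 and E2 share products A and B; E3 has one product
toy <- data.frame(
  events = c("E1", "E1", "E1", "E2", "E2", "E3"),
  prods  = c("A", "B", "C", "A", "B", "D"),
  stringsAsFactors = FALSE
)

# Step 1: split the products by event
by_event <- split(toy$prods, toy$events)

# Step 2: build all unordered pairs within each event
pairs_per_event <- lapply(by_event, function(p) {
  if (length(p) < 2) return(NULL)  # a single-product event has no pairs
  t(combn(sort(p), 2))             # one row per pair, sorted for consistency
})

# Step 3: stack the pairs and count how often each one occurs
pairs <- as.data.frame(do.call(rbind, pairs_per_event), stringsAsFactors = FALSE)
names(pairs) <- c("prod_a", "prod_b")
pairs$count <- 1
pair_counts <- aggregate(count ~ prod_a + prod_b, data = pairs, FUN = sum)
pair_counts
```

As with the output above, pairs that never occur together (such as A and D here) do not appear in the result at all, rather than appearing with a count of 0.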
Answer 1 (score: 1)
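Maybe this is using a sledgehammer to crack a nut, but you could mine (frequent) itemsets with the arules package, which comes with other fancy stuff as well. It could work like this: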
library(arules)
library(reshape2)
trans <- as(sapply(dcast(test_data, events~prods, fun.aggregate = length, value.var="prods")[, -1], as.logical), "transactions")
sets <- apriori(trans, parameter = list(supp = 0, conf = 0, minlen = 2, maxlen = 2, target = "frequent itemsets"))
df <- as(sets, "data.frame")
subset(transform(df, n=support*nrow(trans)), n>0, -support)
# items n
# 2 {Product 6,Product 7} 1
# 4 {Product 4,Product 7} 1
# 6 {Product 2,Product 7} 1
# 7 {Product 5,Product 6} 1
# 8 {Product 3,Product 5} 1
# 10 {Product 1,Product 5} 2
# 11 {Product 2,Product 5} 1
# 13 {Product 4,Product 6} 1
# 14 {Product 1,Product 6} 1
# 15 {Product 2,Product 6} 1
# 16 {Product 3,Product 4} 1
# 17 {Product 1,Product 3} 2
# 18 {Product 2,Product 3} 2
# 19 {Product 1,Product 4} 1
# 20 {Product 2,Product 4} 2
# 21 {Product 1,Product 2} 2
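The support value is the fraction of events that contain both products. I multiply it by the number of transactions to get the frequency count.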
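The relationship between support and count can be checked without arules. The following is only a sketch on hypothetical toy data, and it swaps in a different technique (base R `table` plus `crossprod`) rather than reproducing what `apriori` does internally. It assumes, as the question states, that a product appears at most once per event, so the event-by-product table contains only 0s and 1s.

```r
# Hypothetical toy data: three events
toy <- data.frame(
  events = c("E1", "E1", "E1", "E2", "E2", "E3"),
  prods  = c("A", "B", "C", "A", "B", "D")
)

# Event-by-product incidence matrix (1 if the product was used in the event)
inc <- unclass(table(toy$events, toy$prods))

# Product-by-product co-occurrence counts:
# off-diagonal = number of events containing both products,
# diagonal     = total number of events using that product
counts <- crossprod(inc)

# Support = fraction of events containing both products
n_events <- nrow(inc)
support <- counts / n_events

counts
support["A", "B"] * n_events  # recovers the count for the pair (A, B)
```

This wide matrix also reports 0 for pairs that never co-occur and the total usage on the diagonal, which is what the question asked for self-matches.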