I have a table like this:
ID ProductBought
1 A
1 B
1 C
2 A
1 B
2 C
3 B
3 C
2 D
3 D
4 A
4 B
4 C
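What I want to calculate, for example: customers who bought A and also bought B: 2 cases (IDs 1 and 4), so overall = 2/3 of IDs (3 IDs bought A, and 2 of them also bought B).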
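To make that arithmetic concrete, here is a minimal base-R check of the A to B figure. It rebuilds the table as a data frame called `df1`, which is the name the answer's code expects. The ratio is the number of distinct A-buyers who also bought B divided by the number of distinct A-buyers. In association-rule terms this is the confidence of the rule A ⇒ B, counted per customer rather than per basket.

```r
# Rebuild the question's table (df1 is the name used in the answer's code)
df1 <- data.frame(
  ID = c(1, 1, 1, 2, 1, 2, 3, 3, 2, 3, 4, 4, 4),
  ProductBought = c("A", "B", "C", "A", "B", "C", "B", "C", "D", "D", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Distinct customers who bought the left-hand product (A): IDs 1, 2, 4
buyers_a <- unique(df1$ID[df1$ProductBought == "A"])
# Distinct customers who bought the right-hand product (B): IDs 1, 3, 4
buyers_b <- unique(df1$ID[df1$ProductBought == "B"])

# Share of A-buyers who also bought B
cross_sell_ab <- length(intersect(buyers_a, buyers_b)) / length(buyers_a)
cross_sell_ab  # 0.6666667, i.e. 2 of 3 A-buyers
```

Because `unique()` is applied to the IDs, the repeated purchase of B by ID 1 is counted only once.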
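I know this is related to association rules / apriori, but I want overall aggregate numbers for every possible product combination. This is the output table I have in mind: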
Category   Total distinct customers (in LHS)   % cross-sell
A to B     3                                   66%
A to C     3                                   100%
B to C     3                                   100%
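One way to produce that table for every ordered pair at once is to build a customer-by-product 0/1 matrix and take its cross-product. This is sketched here in base R, as an alternative to both apriori and the dplyr answer. Entry `[i, j]` of `crossprod()` counts customers who bought both i and j, and the diagonal gives each product's distinct-customer total (the LHS column). `cross_sell_table()` is a hypothetical helper written only for illustration, and `df1` is assumed to hold the question's data.

```r
# Question's data (assumed name df1)
df1 <- data.frame(
  ID = c(1, 1, 1, 2, 1, 2, 3, 3, 2, 3, 4, 4, 4),
  ProductBought = c("A", "B", "C", "A", "B", "C", "B", "C", "D", "D", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): cross-sell share for every
# ordered product pair, computed from a customer x product incidence matrix.
cross_sell_table <- function(df) {
  # One row per customer/product; drops repeats such as ID 1 buying B twice
  pairs <- unique(df[, c("ID", "ProductBought")])
  # 0/1 matrix: rows are customers, columns are products
  incidence <- unclass(table(pairs$ID, pairs$ProductBought))
  # co[i, j] = number of customers who bought both product i and product j
  co <- crossprod(incidence)
  # Diagonal = distinct customers per product (the LHS totals)
  lhs_n <- setNames(diag(co), colnames(co))
  products <- colnames(co)
  # All ordered pairs of different products
  out <- expand.grid(lhs = products, rhs = products, stringsAsFactors = FALSE)
  out <- out[out$lhs != out$rhs, ]
  out$total_lhs <- as.integer(lhs_n[out$lhs])
  out$both <- as.integer(co[cbind(out$lhs, out$rhs)])
  out$pct <- 100 * out$both / out$total_lhs
  out <- out[order(out$lhs, out$rhs), ]
  rownames(out) <- NULL
  out
}

cross_sell_table(df1)
```

The design choice here is that one matrix product yields both the pair counts and the per-product totals, so no self-join is needed.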
Answer:
There must be a better/cleaner way, but here is one using dplyr:
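library(dplyr)

# df1 is the question's table: one row per (ID, ProductBought) purchase
df1 %>%
  # number of distinct customers who bought each (left-hand) product
  group_by(ProductBought) %>%
  mutate(distinctCustomerN = n_distinct(ID)) %>%
  ungroup() %>%
  # self-join on ID: pairs each purchase with every purchase by the same customer
  # (dplyr >= 1.1.0 may warn about a many-to-many relationship; that is expected here)
  left_join(df1, by = "ID") %>%
  # drop pairs of a product with itself
  filter(ProductBought.x != ProductBought.y) %>%
  # count distinct customers for each ordered product pair
  group_by(ProductBought.x, ProductBought.y, distinctCustomerN) %>%
  summarise(n = n_distinct(ID)) %>%
  # cross-sell percentage relative to the left-hand product's customers
  mutate(n_pc = n/distinctCustomerN * 100)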
# ProductBought.x ProductBought.y distinctCustomerN n n_pc
# <fctr> <fctr> <int> <int> <dbl>
# 1 A B 3 2 66.66667
# 2 A C 3 3 100.00000
# 3 A D 3 1 33.33333
# 4 B A 3 2 66.66667
# 5 B C 3 3 100.00000
# 6 B D 3 1 33.33333
# 7 C A 4 3 75.00000
# 8 C B 4 3 75.00000
# 9 C D 4 2 50.00000
# 10 D A 2 1 50.00000
# 11 D B 2 1 50.00000
# 12 D C 2 2 100.00000
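To get from that output to the layout asked for in the question (category, LHS total, percentage), the columns only need relabeling and formatting. Here is a minimal base-R sketch using three rows copied from the printed result above; the output column names `TotalDistinctCustomersLHS` and `CrossSellPct` are made up for illustration.

```r
# Three rows of the answer's result, copied from the printed output
res <- data.frame(
  ProductBought.x = c("A", "A", "B"),
  ProductBought.y = c("B", "C", "C"),
  distinctCustomerN = c(3L, 3L, 3L),
  n = c(2L, 3L, 3L),
  stringsAsFactors = FALSE
)
res$n_pc <- res$n / res$distinctCustomerN * 100

# Relabel into the question's layout; one decimal avoids 66% vs 67% rounding ambiguity
report <- data.frame(
  Category = paste(res$ProductBought.x, "to", res$ProductBought.y),
  TotalDistinctCustomersLHS = res$distinctCustomerN,
  CrossSellPct = sprintf("%.1f%%", res$n_pc),
  stringsAsFactors = FALSE
)
report
```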