Association or cross-sell % between products

Date: 2017-04-10 07:13:04

Tags: r apriori

I have a table like this:
ID     ProductBought
1       A
1       B
1       C
2       A
1       B
2       C
3       B
3       C
2       D
3       D
4       A
4       B 
4       C

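I want to calculate, for example: customers who bought A and also bought B. That is 2 cases (IDs 1 and 4), so overall = 2/3 of the IDs (3 IDs bought A, and 2 of them also bought B).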
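The ratio described above is what association-rule mining calls the confidence of the rule A ⇒ B: the share of customers who bought A who also bought B. Treating each distinct ID as one transaction (an assumption that matches the question), the A to B case works out as:

```latex
\text{cross-sell}(A \to B)
  = \frac{\left|\{\text{IDs that bought } A \text{ and } B\}\right|}
         {\left|\{\text{IDs that bought } A\}\right|}
  = \frac{\left|\{1, 4\}\right|}{\left|\{1, 2, 4\}\right|}
  = \frac{2}{3} \approx 66.7\%
```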

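I know this is related to association rules / apriori, but I want the overall aggregated figures for every possible product combination. Below is an illustration of the output table I am after: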

Category  Total distinct customers (LHS)     % cross-sell
A to B        3                                66%
A to C        3                               100%
B to C        3                               100%

1 Answer:

Answer 0 (score: 1)

There must be a better/cleaner way, but here is one using dplyr:

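library(dplyr)

# Sample data from the question (factor columns, matching the <fctr> output below)
df1 <- data.frame(
  ID = c(1, 1, 1, 2, 1, 2, 3, 3, 2, 3, 4, 4, 4),
  ProductBought = c("A", "B", "C", "A", "B", "C", "B", "C", "D", "D", "A", "B", "C"),
  stringsAsFactors = TRUE
)

df1 %>% 
  group_by(ProductBought) %>% 
  # distinct customers who bought each product (the LHS total)
  mutate(distinctCustomerN = n_distinct(ID)) %>% 
  ungroup() %>% 
  # self-join on ID pairs each product with every product the same customer bought
  left_join(df1, by = "ID") %>% 
  filter(ProductBought.x != ProductBought.y) %>% 
  group_by(ProductBought.x, ProductBought.y, distinctCustomerN) %>% 
  # distinct customers who bought both products
  summarise(n = n_distinct(ID)) %>% 
  mutate(n_pc = n/distinctCustomerN * 100)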

#    ProductBought.x ProductBought.y distinctCustomerN     n      n_pc
#             <fctr>          <fctr>             <int> <int>     <dbl>
# 1                A               B                 3     2  66.66667
# 2                A               C                 3     3 100.00000
# 3                A               D                 3     1  33.33333
# 4                B               A                 3     2  66.66667
# 5                B               C                 3     3 100.00000
# 6                B               D                 3     1  33.33333
# 7                C               A                 4     3  75.00000
# 8                C               B                 4     3  75.00000
# 9                C               D                 4     2  50.00000
# 10               D               A                 2     1  50.00000
# 11               D               B                 2     1  50.00000
# 12               D               C                 2     2 100.00000
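If you would rather avoid the self-join, here is a base-R sketch (not part of the original answer). It builds a customer × product 0/1 matrix with table(). crossprod() of that matrix then gives, for every pair of products, the number of customers who bought both, and its column sums give the LHS totals. The sketch assumes columns named ID and ProductBought, as in the question, and recreates the sample data inline. The output column names (Category, total_lhs, both, pct_cross_sell) are chosen here for illustration.

```r
# Base-R alternative (hypothetical, not the answerer's code):
# customer x product 0/1 matrix, then crossprod() to count co-purchases.
df1 <- data.frame(
  ID = c(1, 1, 1, 2, 1, 2, 3, 3, 2, 3, 4, 4, 4),
  ProductBought = c("A", "B", "C", "A", "B", "C", "B", "C", "D", "D",
                    "A", "B", "C"),
  stringsAsFactors = FALSE
)

# 1 if the customer bought the product at least once; duplicate rows
# (e.g. ID 1 buying B twice) collapse to a single 1.
m <- 1 * (unclass(table(df1$ID, df1$ProductBought)) > 0)

# co[i, j] = number of customers who bought both product i and product j
co <- crossprod(m)

# Number of distinct customers who bought each product (the LHS count)
n_lhs <- colSums(m)

prods <- colnames(m)
# All ordered (lhs, rhs) pairs; rhs varies fastest, so rows are grouped by lhs
out <- expand.grid(rhs = prods, lhs = prods, stringsAsFactors = FALSE)
out <- out[out$lhs != out$rhs, c("lhs", "rhs")]

out$Category <- paste(out$lhs, "to", out$rhs)
out$total_lhs <- unname(n_lhs[out$lhs])
out$both <- co[cbind(out$lhs, out$rhs)]  # character-matrix indexing by dimnames
out$pct_cross_sell <- round(100 * out$both / out$total_lhs, 1)
rownames(out) <- NULL

print(out[, c("Category", "total_lhs", "both", "pct_cross_sell")])
```

For this sample it reproduces the counts in the dplyr output above (for example, A to C is 100% and C to D is 50%), laid out in the question's Category / total / % format. The matrix has one column per product, so this approach is best suited to a modest number of products.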