Question

I have a table like this:

ID     ProductBought
1       A
1       B
1       C
2       A
1       B
2       C
3       B
3       C
2       D
3       D
4       A
4       B 
4       C

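I want to compute: of the customers who bought A, how many also bought B. Here that is 2 cases (IDs 1 and 4), so overall 2/3 of IDs (3 IDs bought A, and 2 of them also bought B).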
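
In association-rule terms, this ratio is usually called the confidence of the rule A → B. As a sketch of the arithmetic for this one pair, writing N(X) for the number of distinct IDs that bought X (notation introduced here for illustration, not part of the original post):

```latex
\[
\text{cross-sell}(A \to B)
  = \frac{N(A \text{ and } B)}{N(A)}
  = \frac{|\{1, 4\}|}{|\{1, 2, 4\}|}
  = \frac{2}{3} \approx 66.7\%
\]
```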

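I know this is related to association rules / apriori, but I want the overall aggregate numbers for every possible product combination. Below is an illustration of the output table: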

Category   Total distinct customers (in LHS)   % cross sell
A to B     3                                   66%
A to C     3                                   66%
B to C     3                                   100%

Answer 1

There must be a better/cleaner way, but here is one using dplyr:

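library(dplyr)

# df1 is the question's table, with columns ID and ProductBought
df1 %>%
  # number of distinct customers per product (the LHS total)
  group_by(ProductBought) %>%
  mutate(distinctCustomerN = n_distinct(ID)) %>%
  ungroup() %>%
  # self-join on ID to pair each product with the other products bought by the same customer
  left_join(df1, by = "ID") %>%
  filter(ProductBought.x != ProductBought.y) %>%
  # distinct customers per (LHS, RHS) pair, then as a percentage of the LHS customers
  group_by(ProductBought.x, ProductBought.y, distinctCustomerN) %>%
  summarise(n = n_distinct(ID)) %>%
  mutate(n_pc = n/distinctCustomerN * 100)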

#    ProductBought.x ProductBought.y distinctCustomerN     n      n_pc
#             <fctr>          <fctr>             <int> <int>     <dbl>
# 1                A               B                 3     2  66.66667
# 2                A               C                 3     3 100.00000
# 3                A               D                 3     1  33.33333
# 4                B               A                 3     2  66.66667
# 5                B               C                 3     3 100.00000
# 6                B               D                 3     1  33.33333
# 7                C               A                 4     3  75.00000
# 8                C               B                 4     3  75.00000
# 9                C               D                 4     2  50.00000
# 10               D               A                 2     1  50.00000
# 11               D               B                 2     1  50.00000
# 12               D               C                 2     2 100.00000
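
As a cross-check, the same percentages can be sketched in base R with a customer-by-product indicator matrix. This is not part of the original answer: the `df1` construction below is re-typed from the question's table, and the names `bought`, `co` and `pct` are made up for illustration.

```r
# Sample data re-typed from the question (note the duplicate row: ID 1 bought B twice)
df1 <- data.frame(
  ID = c(1, 1, 1, 2, 1, 2, 3, 3, 2, 3, 4, 4, 4),
  ProductBought = c("A", "B", "C", "A", "B", "C", "B", "C", "D", "D", "A", "B", "C")
)

# Customer-by-product indicator: 1 if the customer bought the product at least once
bought <- (table(df1$ID, df1$ProductBought) > 0) * 1

# co[i, j] = number of distinct customers who bought both product i and product j;
# the diagonal holds the number of distinct customers per product
co <- crossprod(bought)

# Divide each row by its diagonal entry: % of row-product buyers who also bought the column product
pct <- co / diag(co) * 100
round(pct, 1)
```

Rows are the LHS product and columns the RHS product, so `pct["A", "B"]` is about 66.7 and `pct["C", "A"]` is 75, in line with the dplyr output above. The diagonal is always 100 and can be ignored.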

Association or cross-sell % between products
