How to analyze Market Basket output?

Date: 2017-03-09 16:00:40

Tags: r apriori

I have the following sales data:

+------------+------+-------+
| Receipt ID | Item | Value |
+------------+------+-------+
|          1 | a    |     2 |
|          1 | b    |     3 |
|          1 | c    |     2 |
|          1 | k    |     4 |
|          2 | a    |     2 |
|          2 | b    |     5 |
|          2 | d    |     6 |
|          2 | k    |     7 |
|          3 | a    |     8 |
|          3 | k    |     1 |
|          3 | c    |     2 |
|          3 | q    |     3 |
|          4 | k    |     4 |
|          4 | a    |     5 |
|          5 | b    |     6 |
|          5 | a    |     7 |
|          6 | a    |     8 |
|          6 | b    |     3 |
|          6 | c    |     4 |
+------------+------+-------+

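Using the Apriori algorithm, I generated association rules and split each rule into separate columns.

For example, after pruning on support, confidence, and lift, I get the output below. I only consider rules that map onto the columns Target Item, Item1, and Item2, that is, rules of the form {Item1, Item2} -> {Target Item}.

The output looks like this: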

+-------------+-------+-------+
| Target Item | Item1 | Item2 |
+-------------+-------+-------+
| a           | b     |       |
| a           | b     | c     |
| a           | k     |       |
+-------------+-------+-------+
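As a rough sketch of how a rule's left-hand side could be spread across Item1/Item2 columns: the snippet below assumes each rule is already available as a target item plus a comma-separated string of left-hand-side items. The input data frame and its column names (`target_item`, `item_combination`) are illustrative stand-ins, not output copied from arules.

```r
# Hypothetical input: one row per rule, left-hand side as "item1,item2,...".
rules_long <- data.frame(
  target_item = c("a", "a", "a"),
  item_combination = c("b", "b,c", "k"),
  stringsAsFactors = FALSE
)

# Split each left-hand side into its individual items.
parts <- strsplit(rules_long$item_combination, ",", fixed = TRUE)
width <- max(lengths(parts))

# Pad shorter rules with NA so every rule fills the same number of columns.
padded <- do.call(rbind, lapply(parts, function(p) {
  length(p) <- width
  p
}))
colnames(padded) <- paste0("Item", seq_len(width))

rules_wide <- data.frame(
  Target_Item = rules_long$target_item,
  padded,
  stringsAsFactors = FALSE
)
print(rules_wide)
```

Unused slots come out as `NA` rather than empty strings, which makes them easy to drop later with `is.na()`.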

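For each rule, I want to find all receipts that contain the rule's item combination. Within only those receipts, I want the sales value of the target item and the combined sales value of Item1 and Item2:

The output should look like this (I do not need the Receipt IDs column shown below):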

+-------------+-------+-------+--------------+----------------------+------------------------------+
| Target Item | Item1 | Item2 | Receipt IDs  | Value of Target Item | Remaining value(Item1+Item2) |
+-------------+-------+-------+--------------+----------------------+------------------------------+
| a           | b     |       | 1,2,5,6      | 2+2+7+8              | 3+5+6+3                      |
| a           | b     | c     | 1,6          | 2+8                  | (3+3) + (2+4)                |
| a           | k     |       | 1,2,3,4      | 2+2+8+5              | 4+7+1+4                      |
+-------------+-------+-------+--------------+----------------------+------------------------------+
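One possible way to compute this table in base R is sketched below. It assumes a receipt matches a rule only when it contains the target item and every left-hand-side item, and it reports the sums rather than the individual addends. The names `sales`, `rules`, `baskets`, and `summarise_rule` are made up for this illustration.

```r
# Sales data from the question.
sales <- data.frame(
  Receipt_ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,5,5,6,6,6),
  item  = c("a","b","c","k","a","b","d","k","a","k","c","q","k",
            "a","b","a","a","b","c"),
  value = c(2,3,2,4,2,5,6,7,8,1,2,3,4,5,6,7,8,3,4),
  stringsAsFactors = FALSE
)

# Rules in the Target/Item1/Item2 layout; NA marks an unused Item2 slot.
rules <- data.frame(
  target = c("a", "a", "a"),
  item1  = c("b", "b", "k"),
  item2  = c(NA, "c", NA),
  stringsAsFactors = FALSE
)

# Items bought on each receipt, keyed by receipt ID.
baskets <- split(sales$item, sales$Receipt_ID)

summarise_rule <- function(target, item1, item2) {
  lhs <- c(item1, item2)
  lhs <- lhs[!is.na(lhs)]
  needed <- c(target, lhs)
  # Assumption: a receipt qualifies only if it holds every item of the rule.
  matches <- vapply(baskets, function(b) all(needed %in% b), logical(1))
  hit <- names(baskets)[matches]
  rows <- sales[as.character(sales$Receipt_ID) %in% hit, ]
  data.frame(
    target = target, item1 = item1, item2 = item2,
    receipts = paste(hit, collapse = ","),
    target_value = sum(rows$value[rows$item == target]),
    lhs_value = sum(rows$value[rows$item %in% lhs]),
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(seq_len(nrow(rules)), function(i) {
  summarise_rule(rules$target[i], rules$item1[i], rules$item2[i])
}))
print(result)
```

For the three example rules this yields target values of 19, 10, and 17 and left-hand-side values of 17, 12, and 16. The `receipts` column is only there for checking and can be dropped.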

Code to reproduce the Apriori step:

library(arules)

# Sales data: one row per (receipt, item) pair
Data <- data.frame(
  Receipt_ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,5,5,6,6,6),
  item = c('a','b','c','k','a','b','d','k','a','k','c','q','k',
           'a','b','a','a','b','c'),
  value = c(2,3,2,4,2,5,6,7,8,1,2,3,4,5,6,7,8,3,4)
)


write.table(Data, "item.csv", sep = ",", row.names = FALSE)

data_frame = read.transactions(
  file = "item.csv",
  format = "single",
  sep = ",",
  cols = c("Receipt_ID","item"),
  rm.duplicates = TRUE
)

rules_apriori <- apriori(data_frame)


rules_apriori


rules_tab <- as(rules_apriori, "data.frame")


rules_tab

# Split each "{lhs} => {rhs}" label, then strip braces and surrounding spaces
out <- do.call(rbind, strsplit(as.character(rules_tab$rules), "=>", fixed = TRUE))
rules_tab$lhs <- trimws(gsub("[{}]", "", out[, 1]))
rules_tab$rhs <- trimws(gsub("[{}]", "", out[, 2]))

# Collect the target (rhs) and item combination (lhs) as character columns
rules_final <- data.frame(
  target_item = rules_tab$rhs,
  item_combination = rules_tab$lhs,
  stringsAsFactors = FALSE
)

rules_final
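An alternative to parsing the rule strings is sketched here, assuming the arules package is installed. It builds the transactions in memory instead of going through a CSV file, and reads both sides of each rule with the `lhs()`, `rhs()`, and `labels()` accessors. The helper `strip_braces` is a made-up name, and `apriori()` runs with its default thresholds, as in the code above.

```r
library(arules)

Data <- data.frame(
  Receipt_ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,5,5,6,6,6),
  item = c("a","b","c","k","a","b","d","k","a","k","c","q","k",
           "a","b","a","a","b","c"),
  value = c(2,3,2,4,2,5,6,7,8,1,2,3,4,5,6,7,8,3,4),
  stringsAsFactors = FALSE
)

# One character vector of items per receipt, coerced to transactions.
trans <- as(split(Data$item, Data$Receipt_ID), "transactions")

# Default support and confidence thresholds; progress output silenced.
rules_apriori <- apriori(trans, control = list(verbose = FALSE))

# labels() returns strings such as "{b,c}"; remove the braces.
strip_braces <- function(x) gsub("[{}]", "", x)

rules_final <- data.frame(
  target_item = strip_braces(labels(rhs(rules_apriori))),
  item_combination = strip_braces(labels(lhs(rules_apriori))),
  stringsAsFactors = FALSE
)
head(rules_final)
```

Rules with an empty left-hand side appear with an empty `item_combination` string and may need filtering, for example by passing `parameter = list(minlen = 2)` to `apriori()`.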

0 answers:

No answers yet