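I followed the instructions at https://www.r-bloggers.com/implementing-apriori-algorithm-in-r/ to generate association rules, but none of the resulting rules has any item on the lhs. I suspect this is because my transactions are not being split into individual items.
Here is a sample of my raw CSV data: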
itemList
1 ContentManagement
2 Migration,Explorer
3 Explorer,Migration
4 Explorer,ContentManagement
5 Migration,Explorer
Then I run the following:
#load package required
library(arules)
#convert csv file to basket format
txn = read.transactions(
file = "ItemList.csv",
rm.duplicates = TRUE,
format = "basket",
sep = ",",
col = 1
);
inspect(txn)
#remove quotes from transactions
txn@itemInfo$labels <- gsub("\"", "", txn@itemInfo$labels)
The transactions look like this:
[1] {ContentManagement} 1
[2] {Migration,Explorer} 2
[3] {Explorer,Migration} 3
[4] {Explorer,ContentManagement} 4
[5] {Migration,Explorer} 5
When I then run the following:
#run apriori algorithm
basket_rules <-
apriori(txn,
parameter = list(
minlen = 1,
sup = 0.01,
conf = 0.01,
target = "rules",
maxtime=10
))
#basket_rules <- apriori(txn, parameter = list(sup = 0.00001, conf = 0.01, target = "rules"), appearance = list(lhs = "Migration"))
#view rules
inspect(basket_rules)
it gives disappointing results like these:
lhs rhs support confidence lift
[1] {} => {ContentManagement} 0.01175068 0.01175068 1
[2] {} => {Migration, Explorer} 0.01226158 0.01226158 1
[3] {} => {Explorer,Migration} 0.02145777 0.02145777 1
Can you help?
Answer 0 (score: 1)
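The problem is the structure of the file. It is not a proper comma-separated file: the row number (row label) is separated from the comma-separated items by a space rather than a comma. Remove the row numbers so that only the items remain, and set cols = NULL (written col in the question) in read.transactions.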
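A minimal sketch of that cleanup in base R, assuming each row label is a run of digits followed by whitespace at the start of the line and that the file begins with the single header line itemList. The sample lines mirror the question's data, and the output file name ItemList_clean.csv is made up for illustration. The arules call is guarded so the snippet still runs when the package is not installed.

```r
# Sample lines mirroring the question's file (row label, space, items).
raw_lines <- c(
  "itemList",
  "1 ContentManagement",
  "2 Migration,Explorer",
  "3 Explorer,Migration",
  "4 Explorer,ContentManagement",
  "5 Migration,Explorer"
)

# Drop the header line, then strip the leading row number and the space after it.
item_lines <- raw_lines[-1]
item_lines <- sub("^\\s*[0-9]+\\s+", "", item_lines)

# Write the cleaned baskets to a temporary file (hypothetical file name).
clean_path <- file.path(tempdir(), "ItemList_clean.csv")
writeLines(item_lines, clean_path)

# Read the file back as transactions; cols = NULL means there is no
# transaction-ID column, so every comma-separated field is an item.
if (requireNamespace("arules", quietly = TRUE)) {
  txn <- arules::read.transactions(
    file = clean_path,
    format = "basket",
    sep = ",",
    cols = NULL,
    rm.duplicates = TRUE
  )
  arules::inspect(txn)
}
```

With the file cleaned this way, itemLabels(txn) should list ContentManagement, Explorer and Migration as separate items rather than combined strings.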
If you write the file from R, make sure to use row.names = FALSE in write.csv.
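As a rough illustration of writing such a file from R, the sketch below uses a made-up data frame named baskets with one comma-separated basket per row. The quote = FALSE argument is an addition that the answer does not mention; it is included here on the assumption that the baskets should not be wrapped in double quotes.

```r
# Hypothetical data frame holding one comma-separated basket per row.
baskets <- data.frame(
  itemList = c(
    "ContentManagement",
    "Migration,Explorer",
    "Explorer,Migration",
    "Explorer,ContentManagement",
    "Migration,Explorer"
  ),
  stringsAsFactors = FALSE
)

out_path <- file.path(tempdir(), "ItemList.csv")

# row.names = FALSE keeps the row labels out of the file;
# quote = FALSE (an assumption, see above) writes the baskets unquoted.
write.csv(baskets, out_path, row.names = FALSE, quote = FALSE)

# Show what was written: a header line followed by one basket per line.
written <- readLines(out_path)
print(written)
```

write.csv still writes the column name as the first line, so when reading this file with read.transactions the header would need to be skipped, for example with skip = 1.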