How to get the frequencies of frequent itemsets from an apriori call in R?

Posted: 2012-01-13 21:44:35

Tags: r associations apriori

Question:

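The apriori function of the arules package infers association rules from the input transactions and reports the support, confidence, and lift of each rule. Association rules are derived from frequent itemsets. I would like to get the most frequent itemsets in the input transactions. Specifically, I want all itemsets that have at least a given minimum support. The support of an itemset is the ratio of the number of transactions that contain the itemset to the total number of transactions.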
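The support definition above can be written as a formula. Here T (the multiset of transactions) and X (an itemset) are notation introduced for illustration; the worked values use the example input given further down.

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad T = \{\{a,b\},\ \{a,b,c\}\}

\operatorname{supp}(\{a,b\}) = \tfrac{2}{2} = 1,
\qquad
\operatorname{supp}(\{a,c\}) = \tfrac{1}{2} = 0.5
```

These two values agree with the supports listed in the desired output.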

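Requirements:

  1. I would strongly prefer to get the frequent itemsets from the intermediate results of the apriori function. That is, I'd rather not write a program from scratch to compute the frequent itemsets, because apriori already computes them as an intermediate step. That said, if there really is no reasonable way to access apriori's intermediate results, I'm open to other solutions.
  2. I'd rather not perform string manipulation on the output of apriori, because that approach depends too heavily on the string representation of its results. Again, if it turns out there is no better option, I can accept this approach.
  3. I know about the itemFrequency function in the arules package. Unfortunately, it only reports itemsets that consist of a single item. I am interested in itemsets of all lengths that meet a minimum support.
  4. I would like the output sorted numerically by support (highest first), then lexicographically by itemset.
  5. Example input: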

    a,b
    a,b,c
    

    Code:

    # The following is how I'm using apriori to infer the association rules.
    library(package = "arules")
    transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
    rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001))
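    # Print the rules with their support, confidence and lift as CSV.
    # WRITE() is the export function in the arules version used here; newer arules releases provide write() instead.
    WRITE(rules, file = "", sep = ",", quote = TRUE, col.names = NA)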
    

    Current output:

    "","rules","support","confidence","lift"
    "1","{} => {c}",0.5,0.5,1
    "2","{} => {b}",1,1,1
    "3","{} => {a}",1,1,1
    "4","{c} => {b}",0.5,1,1
    "5","{b} => {c}",0.5,0.5,1
    "6","{c} => {a}",0.5,1,1
    "7","{a} => {c}",0.5,0.5,1
    "8","{b} => {a}",1,1,1
    "9","{a} => {b}",1,1,1
    "10","{b,c} => {a}",0.5,1,1
    "11","{a,c} => {b}",0.5,1,1
    "12","{a,b} => {c}",0.5,0.5,1
    

    Desired output:

    "itemset","support"
    "{a}",1
    "{a,b}",1
    "{b}",1
    "{a,b,c}",0.5
    "{a,c}",0.5
    "{b,c}",0.5
    "{c}",0.5
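As a cross-check of the desired output (not a replacement for reusing apriori's own work, which requirement 1 asks for), here is a minimal base-R sketch that enumerates every itemset of the example input by brute force. The helpers `itemset_support` and `frequent_itemsets_bruteforce` are hypothetical names introduced for illustration; the approach tests all 2^n - 1 itemsets, so it is only practical for a handful of distinct items.

```r
# Example baskets from the question, one character vector per transaction.
transactions <- list(c("a", "b"), c("a", "b", "c"))

# Hypothetical helper: fraction of transactions that contain every item of the itemset.
itemset_support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Hypothetical helper: enumerate all non-empty itemsets, keep those meeting
# min_support, and sort by support (descending), then by items.
frequent_itemsets_bruteforce <- function(transactions, min_support) {
  items <- sort(unique(unlist(transactions)), method = "radix")
  candidates <- unlist(
    lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
    recursive = FALSE
  )
  support <- vapply(candidates, itemset_support, numeric(1),
                    transactions = transactions)
  keep <- support >= min_support
  candidates <- candidates[keep]
  support <- support[keep]
  # Items joined by a space; in the C locale (method = "radix") a space sorts
  # before any letter, so {a} comes before {a,b}, as in the desired output.
  key <- vapply(candidates, paste, character(1), collapse = " ")
  ord <- order(-support, key, method = "radix")
  data.frame(
    itemset = vapply(candidates[ord],
                     function(s) paste0("{", paste(s, collapse = ","), "}"),
                     character(1)),
    support = support[ord],
    stringsAsFactors = FALSE
  )
}

result <- frequent_itemsets_bruteforce(transactions, min_support = 0.001)
write.table(result, file = "", sep = ",", row.names = FALSE)
```

The explicit C-locale sort key is a design choice: sorting the `"{a,b}"` strings directly would depend on the session's collation, and in the C locale `","` sorts before `"}"`, which would place `{a,b}` ahead of `{a}`.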
    

1 Answer:

Answer (score: 7)

I found the generatingItemsets function in the arules reference manual.

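    library(package = "arules")
    # Read one basket per line from standard input, items separated by commas.
    transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
    # Very low thresholds, so every frequent itemset yields at least one rule.
    rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001))
    # Recover the itemset each rule was generated from (LHS plus RHS) and drop duplicates.
    itemsets <- unique(generatingItemsets(rules))
    itemsets.df <- as(itemsets, "data.frame")
    # Sort by support (descending), then by itemset label.
    frequentItemsets <- itemsets.df[with(itemsets.df, order(-support,items)),]
    names(frequentItemsets)[1] <- "itemset"
    write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)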
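A possibly more direct route, sketched here under the assumption of a reasonably recent arules version: apriori's `target` parameter can be set to `"frequent itemsets"`, so that apriori returns the itemsets themselves instead of rules, with no detour through `generatingItemsets`. The in-memory transactions and the variable names below are illustrative only. The output table is built explicitly because the columns of `as(itemsets, "data.frame")` have varied between arules versions (newer ones add a `count` column).

```r
library(arules)

# The two example baskets, built in memory instead of read from stdin.
transactions <- as(list(c("a", "b"), c("a", "b", "c")), "transactions")

# target = "frequent itemsets" asks apriori for itemsets rather than rules.
itemsets <- apriori(
  transactions,
  parameter = list(supp = 0.001, minlen = 1, target = "frequent itemsets"),
  control = list(verbose = FALSE)
)

support <- quality(itemsets)$support
item_lists <- LIST(items(itemsets), decode = TRUE)

# Support descending, then items in C-locale order (a space sorts before letters,
# so {a} comes before {a,b}).
key <- vapply(item_lists, paste, character(1), collapse = " ")
ord <- order(-support, key, method = "radix")

frequentItemsets <- data.frame(
  itemset = vapply(item_lists[ord],
                   function(s) paste0("{", paste(s, collapse = ","), "}"),
                   character(1)),
  support = support[ord],
  stringsAsFactors = FALSE
)
write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
```

To read the baskets from standard input as in the question, replace the `as(list(...), "transactions")` line with the original `read.transactions(file = file("stdin"), format = "basket", sep = ",")` call.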