How can we find suitable support and confidence values for apriori rules?

Asked: 2017-04-24 12:31:14

Tags: r apriori arules

I am doing item association analysis on transaction data, using the arules package in R to build the rules. My sample data is shared at this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0

library(arules)
library(arulesViz)
df = read.csv("trans.csv")
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions")
inspect(trans[1:20])
summary(trans)
rules1 = apriori(trans,parameter = list(support = 0.6, confidence = 0.6, 
target = "rules"))
summary(rules1) ##Output is "Set of 0 rules"

My output is:

summary(rules1)

set of 0 rules

Before posting, I consulted https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules. I also tried arbitrary values for support and confidence, but nothing worked.

1 Answer:

Answer 0: (score: 4)

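Struggling to find the right minimum support and minimum confidence values, and ending up with 0 frequent itemsets or 0 association rules, is a very common problem. If you need a refresher on exactly what support and confidence mean, please read this

Let's first look at your transaction data: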

summary(trans)
transactions as itemMatrix in sparse format with
 2531 rows (elements/itemsets/transactions) and
 6632 columns (items) and a density of 0.0005951533 

most frequent items:
AR845311 AR800369 AR828249 AR839869 AR831167  (Other) 
      84       35       31       29       24     9787 

element (itemset/transaction) length distribution:
sizes
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
 767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   3.947   5.000  48.000 

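The first problem to address is the minimum support. The summary shows that your most frequent item (AR845311) occurs only 84 times in the data set. Your items generally have very low support: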

summary(itemFrequency(trans))

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900 

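You use a minimum support of 0.6, but the most frequent single item only has a support of 0.033! You need to reduce the support. If you want to find itemsets/rules that occur at least 10 times in your data, you can set the minimum support to: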

 10/length(trans)

 [1] 0.003951008
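
The conversion from an absolute count to a relative support threshold can be sketched in base R without loading the data. The numbers 2531 and 84 are taken from the summary output above; the variable names are illustrative only and are not part of arules.

```r
# Numbers taken from summary(trans) above; variable names are illustrative.
n_transactions <- 2531   # rows in the transaction matrix
min_count      <- 10     # desired absolute minimum count

# Relative minimum support that corresponds to the absolute count
min_support <- min_count / n_transactions
print(min_support)

# Support of the most frequent item (AR845311, 84 occurrences)
max_item_support <- 84 / n_transactions
print(max_item_support)

# A threshold of 0.6 is above the support of every single item,
# so no itemset can be frequent and no rule can be generated.
print(0.6 > max_item_support)
```

Because the support of an itemset can never exceed the support of its least frequent item, any threshold above the largest single-item support is guaranteed to return nothing.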

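The second problem is that your data is very sparse (the summary shows a density of about 0.0006). This means that your transactions are rather short (i.e., they contain only a few items).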

table(size(trans))

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1 
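
As a rough check, the density and the mean transaction length can be reproduced from the figures in the summary output. This is only a sketch: the total number of item occurrences is obtained here by adding up the "most frequent items" row, and the variable names are illustrative.

```r
# Numbers taken from summary(trans) above; variable names are illustrative.
n_transactions <- 2531   # rows
n_items        <- 6632   # columns

# Total item occurrences: sum of the "most frequent items" row,
# including the "(Other)" bucket
total_occurrences <- 84 + 35 + 31 + 29 + 24 + 9787

# Density is the fraction of non-empty cells in the transaction matrix
density <- total_occurrences / (n_transactions * n_items)
print(density)

# Mean transaction length (items per transaction)
mean_size <- total_occurrences / n_transactions
print(mean_size)
```

Both values agree with the density and the mean reported by summary(trans).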

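Short transactions mean that the confidence of the rules is likely to be low. For your data it turns out to be very low, so I start with a minimum confidence of 0.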

> rules <- apriori(trans, 
+   parameter = list(support = 0.004, confidence = 0, target = "rules"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
          0    0.1    1 none FALSE            TRUE       5   0.004      1     10
 target   ext
  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 10 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s].
sorting and recoding items ... [40 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [46 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> summary(rules)
set of 46 rules

rule length distribution (lhs + rhs):sizes
 1  2 
40  6 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.00    1.00    1.13    1.00    2.00 

summary of quality measures:
    support           confidence            lift            count      
 Min.   :0.004346   Min.   :0.004346   Min.   : 1.000   Min.   :11.00  
 1st Qu.:0.004741   1st Qu.:0.004840   1st Qu.: 1.000   1st Qu.:12.00  
 Median :0.005531   Median :0.005729   Median : 1.000   Median :14.00  
 Mean   :0.006803   Mean   :0.057301   Mean   : 3.316   Mean   :17.22  
 3rd Qu.:0.007112   3rd Qu.:0.008890   3rd Qu.: 1.000   3rd Qu.:18.00  
 Max.   :0.033188   Max.   :0.705882   Max.   :21.269   Max.   :84.00  

mining info:
  data ntransactions support confidence
 trans          2531   0.004          0

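The result shows that at least one rule has a confidence of about 0.7. You can now run APRIORI again with a higher minimum confidence. Here are the rules with the highest confidence: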

inspect(head(rules, by = "confidence"))
    lhs           rhs        support     confidence lift     count
[1] {AR835501} => {AR845311} 0.004741209 0.7058824  21.26891 12   
[2] {AR743988} => {AR845311} 0.004346108 0.6470588  19.49650 11   
[3] {AR800369} => {AR845311} 0.007111814 0.5142857  15.49592 18   
[4] {AR845311} => {AR800369} 0.007111814 0.2142857  15.49592 18   
[5] {AR845311} => {AR835501} 0.004741209 0.1428571  21.26891 12   
[6] {AR845311} => {AR743988} 0.004346108 0.1309524  19.49650 11 
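
To see how the quality measures relate to each other, rule [1] can be recomputed by hand in base R. This is a sketch under one assumption: the number of transactions containing AR835501 (17) is not shown in the output and is inferred here from the reported confidence (12/17 = 0.7058824). The variable names are illustrative.

```r
# Counts for rule [1]: {AR835501} => {AR845311}
n_transactions <- 2531
count_both <- 12   # transactions containing both items (count column)
count_rhs  <- 84   # transactions containing AR845311 (from summary(trans))
count_lhs  <- 17   # ASSUMPTION: inferred from confidence 12/17, not in the output

# support: share of all transactions that contain both items
support <- count_both / n_transactions

# confidence: share of lhs transactions that also contain the rhs
confidence <- count_both / count_lhs

# lift: confidence relative to the overall support of the rhs
lift <- confidence / (count_rhs / n_transactions)

print(c(support = support, confidence = confidence, lift = lift))
```

A lift far above 1 indicates that the two items occur together much more often than would be expected if they were independent.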

A complete example of how to use association rule mining can be found here

Hope this helps!