How do I find frequent itemsets with apriori in R?

Date: 2017-11-23 21:38:47

Tags: r apriori

I have a text file, MyData.txt, that contains the following itemsets:

     items                              
[1]  {542931}                           
[2]  {542380}                           
[3]  {81387,542448,360015,542613,542931}
[4]  {546845,542614}                    
[5]  {1123614}                          
[6]  {542931}                          
[7]  {1014660}                          
[8]  {1088953}                          
[9]  {1138035}                           

I want to find the frequent itemsets. Here is my code:

library(arules)  # provides read.transactions() and apriori()
tr <- read.transactions("MyData.txt", format = "basket", cols = NULL)
freq_is <- apriori(tr, parameter = list(target = "frequent itemsets", support = 0.00001))

But when I inspect freq_is, the count of {542931} is two, which is wrong: three itemsets contain 542931. Apriori only counts items [1] and [6] and ignores items [3]. How can I fix this?

1 Answer:

Answer 0 (score: 0)

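The problem is that MyData.txt uses ',' as the item separator, but you did not specify it in read.transactions(), which by default splits at whitespace. As a result, the third line is read as one single item, "81387,542448,360015,542613,542931", instead of five separate items, so the 542931 inside it is never counted.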
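To see why the separator matters, here is a minimal base-R sketch (no arules needed). It assumes, hypothetically, that MyData.txt holds one transaction per line with comma-separated items, as the listing in the question suggests; `lines` and `count_item()` are illustrative names, not part of arules.

```r
# Hypothetical contents of MyData.txt: one transaction per line,
# items separated by commas (assumed from the listing in the question)
lines <- c("542931", "542380", "81387,542448,360015,542613,542931",
           "546845,542614", "1123614", "542931",
           "1014660", "1088953", "1138035")

# Splitting at whitespace, which is what the default sep = "" does
by_space <- strsplit(lines, "[[:space:]]+")
# Splitting at commas, which is what sep = "," does
by_comma <- strsplit(lines, ",")

# Hypothetical helper: number of transactions that contain `item`
count_item <- function(trans, item) {
  sum(vapply(trans, function(t) item %in% t, logical(1)))
}

by_space[[3]]                    # a single item: "81387,542448,360015,542613,542931"
count_item(by_space, "542931")   # 2
count_item(by_comma, "542931")   # 3
```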

So if you change your code to:

tr <- read.transactions("MyData.txt",
                        format = "basket",
                        sep = ",", 
                        cols = NULL)

you will see that the count of 542931 is actually 3:

summary(tr)
transactions as itemMatrix in sparse format with
 9 rows (elements/itemsets/transactions) and
 12 columns (items) and a density of 0.1296296 

most frequent items:
 542931 1014660 1088953 1123614 1138035 (Other) 
      3       1       1       1       1       7  
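The figures in this summary can be cross-checked by hand. The base-R sketch below, again assuming the hypothetical comma-separated file contents, counts transactions, distinct items and item occurrences; the density is the share of filled cells in the 9 x 12 transaction-by-item matrix, i.e. 14 / 108.

```r
# Hypothetical contents of MyData.txt (comma-separated items, one transaction per line)
lines <- c("542931", "542380", "81387,542448,360015,542613,542931",
           "546845,542614", "1123614", "542931",
           "1014660", "1088953", "1138035")
trans <- strsplit(lines, ",")

n_trans     <- length(trans)          # 9 rows (transactions)
items       <- unlist(trans)          # every item occurrence (14 in total)
item_counts <- sort(table(items), decreasing = TRUE)
n_items     <- length(item_counts)    # 12 columns (distinct items)

# Density = filled cells / (transactions * items); this assumes no item
# is repeated within a single transaction
matrix_density <- length(items) / (n_trans * n_items)

item_counts["542931"]   # 3
matrix_density          # 0.1296296
```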

Create the frequent itemsets with apriori():

freq_is <- apriori(tr, 
                   parameter = list(target = "frequent itemsets",
                                    support = 0.00001))

If you inspect freq_is again, you can see that the itemset {542931} now has a count of 3:

inspect(freq_is)
     items                               support   count
[1]  {542380}                            0.1111111 1    
[2]  {1123614}                           0.1111111 1    
[3]  {1014660}                           0.1111111 1    
[4]  {1088953}                           0.1111111 1    
[5]  {1138035}                           0.1111111 1    
[6]  {542614}                            0.1111111 1    
[7]  {546845}                            0.1111111 1    
[8]  {81387}                             0.1111111 1    
[9]  {360015}                            0.1111111 1    
[10] {542448}                            0.1111111 1    
[11] {542613}                            0.1111111 1    
[12] {542931}                            0.3333333 3 
.
.
.
[39] {81387,360015,542448,542613,542931} 0.1111111 1
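The support column is simply count / number of transactions. With support = 0.00001 and 9 transactions, any itemset that occurs even once (support 1/9) clears the threshold, which is why every subset of every transaction appears in the output. A brute-force base-R sketch, under the same hypothetical file contents; `support()` and `all_subsets()` are illustrative helpers, not arules functions:

```r
# Hypothetical contents of MyData.txt (comma-separated items, one transaction per line)
lines <- c("542931", "542380", "81387,542448,360015,542613,542931",
           "546845,542614", "1123614", "542931",
           "1014660", "1088953", "1138035")
trans <- strsplit(lines, ",")

# Support of an itemset: share of transactions that contain all of its items
support <- function(trans, itemset) {
  mean(vapply(trans, function(t) all(itemset %in% t), logical(1)))
}

# All non-empty subsets of one transaction, each encoded as a sorted,
# comma-joined key so identical itemsets from different transactions match
all_subsets <- function(t) {
  t <- sort(t)
  unlist(lapply(seq_along(t), function(k) combn(t, k, paste, collapse = ",")))
}

support(trans, "542931")                                            # 0.3333333 (count 3)
support(trans, c("81387", "360015", "542448", "542613", "542931"))  # 0.1111111 (count 1)

# Every itemset that occurs at least once is frequent at support = 0.00001
frequent <- unique(unlist(lapply(trans, all_subsets)))
length(frequent)   # 39
```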