I have a text file, MyData.txt, which contains the following itemsets:
items
[1] {542931}
[2] {542380}
[3] {81387,542448,360015,542613,542931}
[4] {546845,542614}
[5] {1123614}
[6] {542931}
[7] {1014660}
[8] {1088953}
[9] {1138035}
I want to find the frequent itemsets. This is my code:
tr <- read.transactions("MyData.txt",format = "basket", cols = NULL)
freq_is <- apriori(tr, parameter = list(target = "frequent itemsets", support = 0.00001))
But when I inspect freq_is, the count for {542931} is two, which is incorrect (three itemsets contain 542931). In fact, apriori only counts items [1] and items [6] and ignores items [3]. How can I fix this?
Answer 0 (score: 0)
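Your problem is that MyData.txt uses ',' as the separator, but you do not specify it in read.transactions(), which splits on whitespace by default. As a result, the line 81387,542448,360015,542613,542931 is read as one single item with that whole string as its label, so 542931 is never counted for that transaction.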
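The effect of the separator can be reproduced without arules. The following is only a base-R sketch of the tokenization step, not the package's actual implementation. It assumes that MyData.txt holds one transaction per line with items separated by commas; `raw_lines` and `count_item` are hypothetical names introduced for illustration.

```r
# Assumed raw contents of MyData.txt: one transaction per line,
# items separated by commas (an assumption, the file itself is not shown).
raw_lines <- c(
  "542931",
  "542380",
  "81387,542448,360015,542613,542931",
  "546845,542614",
  "1123614",
  "542931",
  "1014660",
  "1088953",
  "1138035"
)

# Count the transactions that contain `item` after splitting
# every line on the regular expression `pattern`.
count_item <- function(lines, pattern, item) {
  tokens <- strsplit(lines, pattern)
  sum(vapply(tokens, function(x) item %in% x, logical(1)))
}

# Splitting on whitespace leaves the third line as one long token.
count_whitespace <- count_item(raw_lines, "[[:space:]]+", "542931")

# Splitting on commas separates the five items of the third line.
count_comma <- count_item(raw_lines, ",", "542931")

print(count_whitespace)  # 2
print(count_comma)       # 3
```

With whitespace splitting the third line never matches "542931" exactly, which is why the count comes out as two instead of three.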
So if you change your code to:
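library(arules)  # provides read.transactions() and apriori()

tr <- read.transactions("MyData.txt",
                        format = "basket",
                        sep = ",",      # items in the file are comma-separated
                        cols = NULL)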
you will see that the count for 542931 is actually 3:
summary(tr)
transactions as itemMatrix in sparse format with
9 rows (elements/itemsets/transactions) and
12 columns (items) and a density of 0.1296296
most frequent items:
 542931 1014660 1088953 1123614 1138035 (Other)
      3       1       1       1       1       7
Create the frequent itemsets with apriori():
freq_is <- apriori(tr,
                   parameter = list(target = "frequent itemsets",
                                    support = 0.00001))
If you then inspect freq_is, you can see that the itemset {542931} has a count of 3:
inspect(freq_is)
     items                                support   count
[1]  {542380}                             0.1111111 1
[2]  {1123614}                            0.1111111 1
[3]  {1014660}                            0.1111111 1
[4]  {1088953}                            0.1111111 1
[5]  {1138035}                            0.1111111 1
[6]  {542614}                             0.1111111 1
[7]  {546845}                             0.1111111 1
[8]  {81387}                              0.1111111 1
[9]  {360015}                             0.1111111 1
[10] {542448}                             0.1111111 1
[11] {542613}                             0.1111111 1
[12] {542931}                             0.3333333 3
.
.
.
[39] {81387,360015,542448,542613,542931}  0.1111111 1
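The support and count columns can be cross-checked by hand. The sketch below is a brute-force enumeration in base R, not the Apriori algorithm and not arules code. It again assumes the comma-separated file contents described in this answer, and `transactions` and `all_subsets` are hypothetical names. Support is the count divided by the number of transactions (9 here), and because the support threshold is so low, every non-empty subset of every transaction counts as a frequent itemset.

```r
# Assumed transactions, already split on commas (one vector per line of the file).
transactions <- list(
  "542931",
  "542380",
  c("81387", "542448", "360015", "542613", "542931"),
  c("546845", "542614"),
  "1123614",
  "542931",
  "1014660",
  "1088953",
  "1138035"
)
n <- length(transactions)

# Absolute count per single item; support = count / number of transactions.
item_count <- table(unlist(transactions))
item_support <- item_count / n

# All non-empty subsets of one transaction, each written as a
# sorted, comma-joined key so that equal itemsets get equal keys.
all_subsets <- function(items) {
  items <- sort(items)
  unlist(lapply(seq_along(items), function(k) {
    combn(items, k, FUN = paste, collapse = ",")
  }))
}

# Count in how many transactions each distinct itemset occurs.
itemset_count <- table(unlist(lapply(transactions, all_subsets)))

print(item_count[["542931"]])    # 3
print(item_support[["542931"]])  # 3 / 9
print(length(itemset_count))     # 39
```

The 39 distinct itemsets match the last row number of the inspect() output: 12 single items, 26 larger subsets of the five-item transaction, and the pair {546845,542614}.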