I am doing item association on transaction data, using the arules package in R to build the rules. I am sharing my sample data via this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0
library(arules)
library(arulesViz)
df = read.csv("trans.csv")
# one transaction per bill: group the items by Billno
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions")
inspect(trans[1:20])
summary(trans)
rules1 = apriori(trans, parameter = list(support = 0.6, confidence = 0.6,
                                         target = "rules"))
summary(rules1) ## Output is "set of 0 rules"
My output is:
summary(rules1)
set of 0 rules
Before posting, I read https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules. I also tried various arbitrary values for support and confidence, but none of them worked.
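For readers unfamiliar with the data preparation step in the question: `split()` turns the long bill/item table into a list with one item vector per bill, which `as(..., "transactions")` from arules then converts into a sparse item matrix. A minimal base R sketch on a hypothetical three-bill data frame (only the column names `Billno` and `Item` come from the question; the rows are made up):

```r
# Hypothetical toy data in the same long format as trans.csv:
# one row per (bill number, item) pair. Values are made up.
df <- data.frame(
  Billno = c(1, 1, 2, 3, 3, 3),
  Item   = c("A", "B", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# split() groups the Item column by Billno: one character vector of
# items per bill, i.e. one list element per transaction.
baskets <- split(df[, "Item"], df[, "Billno"])
str(baskets)
```

Each list element is named after its bill number, so the bill IDs survive as transaction IDs.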
Answer
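Finding the right minimum support and minimum confidence, and ending up with 0 frequent itemsets or 0 association rules when they are set wrong, is a very common problem. If you need a refresher on what support and confidence mean exactly, read this.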
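As a quick refresher, here is a minimal base R sketch of the three measures on a handful of hypothetical transactions. The helper names `support()`, `confidence()` and `lift()` are my own, not arules functions; arules computes these values for you and stores them in `quality(rules)`.

```r
# Hypothetical toy transactions (not the asker's data).
baskets <- list(
  c("A", "B"),
  c("A"),
  c("A", "B", "C"),
  c("B", "C"),
  c("C")
)

# support(X): fraction of transactions that contain every item in X
support <- function(x, baskets) {
  mean(vapply(baskets, function(b) all(x %in% b), logical(1)))
}

# confidence(X => Y) = support(X and Y together) / support(X)
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

# lift(X => Y) = confidence(X => Y) / support(Y)
lift <- function(lhs, rhs, baskets) {
  confidence(lhs, rhs, baskets) / support(rhs, baskets)
}

support("A", baskets)           # 3 of the 5 baskets contain A
confidence("A", "B", baskets)   # 2 of the 3 A-baskets also contain B
lift("A", "B", baskets)
```

A minimum support of 0.6 therefore means an itemset must appear in at least 60% of all transactions.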
Let's first look at your transaction data:
summary(trans)
transactions as itemMatrix in sparse format with
 2531 rows (elements/itemsets/transactions) and
 6632 columns (items) and a density of 0.0005951533
most frequent items:
AR845311 AR800369 AR828249 AR839869 AR831167  (Other)
      84       35       31       29       24     9787
element (itemset/transaction) length distribution:
sizes
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4
 23  24  25  27  28  32  34  36  48
  3   4   2   3   1   1   1   1   1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.000   2.000   3.947   5.000  48.000
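The first problem to address is the minimum support. The summary shows that your most frequent item (AR845311) appears only 84 times in the dataset, so your items generally have very low support: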
summary(itemFrequency(trans))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900
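You used a minimum support of 0.6, but even the most frequent single item only has a support of 0.033! You need to lower the minimum support considerably. If you want to find itemsets/rules that appear at least 10 times in the data, you can set the minimum support to: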
10/length(trans)
[1] 0.003951008
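To see how far off the original setting is, a few lines of base R reproduce these numbers from the counts printed by `summary(trans)` (2531 transactions and the five most frequent items). This is only arithmetic on the reported counts, not a call into arules:

```r
n_trans    <- 2531                                  # from summary(trans)
top_counts <- c(AR845311 = 84, AR800369 = 35, AR828249 = 31,
                AR839869 = 29, AR831167 = 24)       # most frequent items

# relative support = absolute count / number of transactions
round(top_counts / n_trans, 4)

# number of transactions an itemset needs to reach support = 0.6
needed_count <- ceiling(0.6 * n_trans)
needed_count                                        # far more than the 84 of the top item

# converting a desired minimum count (here 10) into a relative support
min_support <- 10 / n_trans
min_support
```

With support = 0.6 an itemset would have to occur in 1519 of the 2531 bills, while no single item occurs in more than 84, so apriori cannot find anything.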
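The second problem is that your data is very sparse (the summary reports a density of about 0.0006). This means your transactions are rather short, i.e., each one contains only a few of the 6632 items.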
table(size(trans))
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4
 23  24  25  27  28  32  34  36  48
  3   4   2   3   1   1   1   1   1
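The sparsity and the short transactions are two views of the same numbers. A small base R sketch, using only the length distribution printed above (the variable names are mine), recovers the mean length and the density reported by `summary(trans)`, and shows that about 30% of the bills contain a single item. Such bills count toward the support of that item but can never support a rule with items on both sides.

```r
# Transaction length distribution, copied from table(size(trans))
len <- c(1:21, 23, 24, 25, 27, 28, 32, 34, 36, 48)
n   <- c(767, 509, 306, 238, 160, 112, 100, 52, 69, 50, 31, 27, 18, 12,
         13, 15, 9, 10, 7, 5, 4, 3, 4, 2, 3, 1, 1, 1, 1, 1)

n_trans     <- sum(n)              # 2531 transactions
total_items <- sum(len * n)        # 9990 item occurrences in total
total_items / n_trans              # mean length, about 3.947
total_items / (n_trans * 6632)     # density over 6632 items, about 0.000595
n[1] / n_trans                     # share of single-item transactions, about 0.30
```

The total also matches the "most frequent items" row of the summary: 84 + 35 + 31 + 29 + 24 + 9787 = 9990.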
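Short transactions mean that rule confidence is likely to be low. For your data it turns out to be very low, so I start with a minimum confidence of 0: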
rules <- apriori(trans,
                 parameter = list(support = 0.004, confidence = 0, target = "rules"))
Apriori
Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
          0    0.1    1 none FALSE            TRUE       5   0.004      1     10
 target   ext
  rules FALSE
Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
Absolute minimum support count: 10
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s].
sorting and recoding items ... [40 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [46 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
summary(rules)
set of 46 rules
rule length distribution (lhs + rhs):sizes
 1  2
40  6
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    1.00    1.00    1.13    1.00    2.00
summary of quality measures:
support confidence lift count
Min. :0.004346 Min. :0.004346 Min. : 1.000 Min. :11.00
1st Qu.:0.004741 1st Qu.:0.004840 1st Qu.: 1.000 1st Qu.:12.00
Median :0.005531 Median :0.005729 Median : 1.000 Median :14.00
Mean :0.006803 Mean :0.057301 Mean : 3.316 Mean :17.22
3rd Qu.:0.007112 3rd Qu.:0.008890 3rd Qu.: 1.000 3rd Qu.:18.00
Max. :0.033188 Max. :0.705882 Max. :21.269 Max. :84.00
mining info:
data ntransactions support confidence
trans 2531 0.004 0
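The summary shows that at least one rule reaches a confidence of about 0.7 (maximum 0.706), so you can rerun apriori with a higher minimum confidence. Here are the rules with the highest confidence: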
inspect(head(rules, by = "confidence"))
lhs rhs support confidence lift count
[1] {AR835501} => {AR845311} 0.004741209 0.7058824 21.26891 12
[2] {AR743988} => {AR845311} 0.004346108 0.6470588 19.49650 11
[3] {AR800369} => {AR845311} 0.007111814 0.5142857 15.49592 18
[4] {AR845311} => {AR800369} 0.007111814 0.2142857 15.49592 18
[5] {AR845311} => {AR835501} 0.004741209 0.1428571 21.26891 12
[6] {AR845311} => {AR743988} 0.004346108 0.1309524 19.49650 11
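The numbers in this table follow directly from the counts: AR845311 occurs in 84 bills and AR800369 in 35 (both from the "most frequent items" row of `summary(trans)`), and the `count` column shows that they occur together in 18. A short base R check of rules [3] and [4], which also shows why confidence depends on the direction of a rule while lift does not:

```r
n_trans      <- 2531  # transactions, from summary(trans)
count_845311 <- 84    # bills containing AR845311
count_800369 <- 35    # bills containing AR800369
count_both   <- 18    # bills containing both (count column of rules [3] and [4])

# confidence(X => Y) = count(X and Y) / count(X)
conf_3 <- count_both / count_800369   # {AR800369} => {AR845311}
conf_4 <- count_both / count_845311   # {AR845311} => {AR800369}

# lift(X => Y) = confidence(X => Y) / support(Y)
lift_3 <- conf_3 / (count_845311 / n_trans)
lift_4 <- conf_4 / (count_800369 / n_trans)

round(c(conf_3 = conf_3, conf_4 = conf_4, lift_3 = lift_3, lift_4 = lift_4), 5)
```

This is also why the reverse rules [4] to [6] have low confidence: AR845311 is by far the most frequent item, so only a small fraction of its 84 bills contain any particular partner item.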
A complete example of how to use association rule mining can be found here.
Hope this helps!