I have loaded a file into R as transactions:
path = "my_file.csv"
t = read.transactions(path,format="single", sep=';',cols=c("ID","Products"))
#get the rules:
rules = apriori(t,parameter = list(supp=0.01, conf=0.33, minlen=2, maxlen=4))
#sort by confidence:
rules = sort(rules, by="confidence", decreasing=TRUE)
#inspect the first 10 rules:
inspect(rules[1:10])
The output is:
lhs rhs support confidence lift
[1] {e,b} => {a} 0.01 0.97 some_value
[2] {a} => {f} 0.04 0.92 some_value
[3] {t,f} => {a} 0.12 0.91 some_value
[4] {b,j} => {a} 0.09 0.82 some_value
[5] {e} => {a} 0.25 0.77 some_value
[6] {g,h} => {a} 0.05 0.56 some_value
[7] {p} => {a} 0.31 0.54 some_value
[8] {q,n} => {h} 0.18 0.49 some_value
[9] {s} => {a} 0.07 0.46 some_value
[10] {s,d} => {a} 0.20 0.42 some_value
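My problem is that item {a} is too frequent, and I would like to configure the apriori rule generator so that {a}, or any other item I don't want to consider, never appears in the generated rules. I know a simple way would be to remove item {a} from the transaction files before loading them. Simple as that is, it is neither smart nor elegant, and it is also slow, because I am working with hundreds of different transaction files.
Searching the web, I found this appearance setting for specifying which items go in the lhs and rhs: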
rules = apriori(t,parameter = list(supp=0.01, conf=0.33, minlen=2, maxlen=4), appearance=list(default="lhs", rhs="b"))
Now the output of inspect is:
lhs rhs support confidence lift
[1] {a,b} => {b} other_value other_value other_value
[2] {a} => {b} other_value other_value other_value
[3] {a,f} => {b} other_value other_value other_value
[4] {b,j} => {b} other_value other_value other_value
[5] {a} => {b} other_value other_value other_value
[6] {a,h} => {b} other_value other_value other_value
[7] {a} => {b} other_value other_value other_value
[8] {q,a} => {b} other_value other_value other_value
[9] {a} => {b} other_value other_value other_value
[10] {a,d} => {b} other_value other_value other_value
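So it is possible to tell apriori which items we want in the rhs (or lhs), but there seems to be no way to tell it which items we don't want. At least, I can't express it like this (meaning "I don't want {a}"):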
rules = apriori(t,parameter = list(supp=0.01, conf=0.33, minlen=2, maxlen=4), appearance=list(default="lhs", rhs!="a"))
This produces an error.
Any suggestions? Thanks.
Answer 0 (score: 3)
Have a look at ?APappearance.
The first example there shows how to exclude individual items from itemsets. You can do the same when mining rules:
data("Adult")
## find only frequent itemsets which do not contain small or large income
is <- apriori(Adult, parameter = list(support= 0.1, target="frequent"),
appearance = list(none = c("income=small", "income=large"),
default="both"))
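The example above mines frequent itemsets. For rules, the same `none` entry can be passed with `target = "rules"`. The following is only a sketch on the Adult data: the thresholds (`supp = 0.1`, `conf = 0.8`) and the names `excluded` and `n_bad` are chosen here for illustration and are not from the question. With the asker's data, the analogous setting would presumably be `appearance = list(none = "a", default = "both")`.

```r
library(arules)

data("Adult")

# Items that must not appear on either side of a rule.
# These labels exist in Adult; replace them with your own (e.g. "a").
excluded <- c("income=small", "income=large")

rules <- apriori(
  Adult,
  parameter  = list(supp = 0.1, conf = 0.8, minlen = 2, maxlen = 4,
                    target = "rules"),
  appearance = list(none = excluded, default = "both"),
  control    = list(verbose = FALSE)
)

# Sort by confidence and look at the top rules, as in the question.
rules <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(head(rules, 10))

# Sanity check: count rules that still mention an excluded item
# in the lhs or rhs (expected to be zero).
n_bad <- length(subset(rules, lhs %in% excluded | rhs %in% excluded))
```

Because the exclusion is part of the `apriori` call, the transaction files stay untouched, and the same `excluded` vector can be reused across many files.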