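I created a new experiment in Azure Machine Learning Studio that uses the Execute R Script module to mine association rules from a starting dataset. For this experiment I used the R version Microsoft R Open 3.2.2.
I first wrote and tested the function used in the Azure ML experiment in RStudio, where it ran without any problems. This is the structure of my experiment:
[image: experiment structure]
This is part of the code inserted into the module on Azure ML; it works correctly in RStudio: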
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
library("arules")
library("sqldf")
x <- sqldf('select ID_Ordine, AnnoOrdine, ZonaCommerciale, Modello, SUM(Qta) as Qta
from dataset1 group by ID_Ordine, Modello order by ID_Ordine')
a_list1 <- transform(x, Modello = as.factor(Modello),
ID_Ordine = as.factor(ID_Ordine))
transactions <- as(split(x[,"Modello"], x[,"ID_Ordine"]), "transactions")
rules <- sort(apriori(transactions,
parameter = list(supp = 0.1, conf = 0.1, target = "rules",
maxlen = 5)), by="lift")
gi <- generatingItemsets(rules) # itemset (lhs + rhs) behind each rule
# remove inverse duplicated rules; logical indexing stays correct when there are
# no duplicates, whereas rules[-which(...)] would then return zero rules
rules <- rules[!duplicated(gi)]
#create a dataframe to be used as output
result <- data.frame(label_lhs = labels(lhs(rules)),
label_rhs = labels(rhs(rules)),
count = quality(rules)["count"])
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("result");
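If I remove the line count = quality(rules)["count"] from the code (the statement that adds the count column to the output data frame), the experiment runs fine. When I include the count column, however, the experiment fails with an error (the message is not reproduced here).
Does anyone know how to fix this error, or another way to select the count column from the arules object that Azure ML will accept?
Thanks for any suggestions.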
Answer 0 (score: 0)
In this version of the arules package, apriori() does not compute the count column, so I derived it myself by inverting the support formula:
#create a dataframe to be used as output
result <- data.frame(label_lhs = labels(lhs(rules)),
label_rhs = labels(rhs(rules)),
count = quality(rules)$support*length(transactions))
This works because support is calculated with the following formula:
support = (number of transactions with A&B)/(number of total transactions)
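The inversion can be checked by hand on a small example. The sketch below uses only base R and a made-up list of five transactions; the names transactions_toy and rule_counts are hypothetical and not part of arules. It also rounds the product, on the assumption that support * N can come out fractionally off a whole number in floating-point arithmetic.

```r
# Toy data: five transactions (hypothetical, for illustration only).
transactions_toy <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A", "C"),
  c("B"),
  c("A", "B")
)
n_trans <- length(transactions_toy)

# Count the transactions containing both A and B, then derive the support.
count_ab <- sum(vapply(transactions_toy,
                       function(t) all(c("A", "B") %in% t),
                       logical(1)))
support_ab <- count_ab / n_trans

# Invert the formula; round() guards against floating-point error.
recovered_count <- as.integer(round(support_ab * n_trans))

# Hypothetical helper: use a count column if the quality data frame has one,
# otherwise fall back to support * number of transactions.
rule_counts <- function(q, n_trans) {
  if ("count" %in% names(q)) {
    as.integer(q$count)
  } else {
    as.integer(round(q$support * n_trans))
  }
}
```

With arules loaded, the helper would presumably be called as rule_counts(quality(rules), length(transactions)). The fallback branch is the one that matters for the package version in the question; the first branch only helps if a given arules release happens to report a count column itself.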