Hi,
I'm trying to build a market-basket recommendation analysis in Spark using the FP-Growth algorithm.
I have these transactions:
val transactions = sc.parallelize(Seq(
  Array("Tuna", "Banana", "Strawberry"),
  Array("Melon", "Milk", "Bread", "Strawberry"),
  Array("Melon", "Kiwi", "Bread"),
  Array("Bread", "Banana", "Strawberry"),
  Array("Milk", "Tuna", "Tomato"),
  Array("Pepper", "Melon", "Tomato"),
  Array("Milk", "Strawberry", "Kiwi"),
  Array("Kiwi", "Banana", "Tuna"),
  Array("Pepper", "Melon")
))
Now I want the "frequent itemsets":
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val freqItemsets = transactions
  .flatMap { xs =>
    // Sort each transaction first so that e.g. ("Bread", "Melon") and
    // ("Melon", "Bread") are counted as the same itemset.
    val sorted = xs.sorted
    (sorted.combinations(1) ++ sorted.combinations(2)).map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }
val ar = new AssociationRules()
  .setMinConfidence(0.4)
val results = ar.run(freqItemsets)
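As a side note, Spark's MLlib can compute the frequent itemsets for you instead of enumerating the 1- and 2-item combinations by hand. A minimal sketch, assuming spark-mllib is on the classpath and `transactions` is the RDD above; the `minSupport` value 0.2 and the partition count are illustrative choices, not recommendations:

```scala
import org.apache.spark.mllib.fpm.FPGrowth

// Let FP-Growth mine the frequent itemsets directly from the transactions.
val fpg = new FPGrowth()
  .setMinSupport(0.2)   // illustrative threshold
  .setNumPartitions(2)
val model = fpg.run(transactions)

model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}

// The fitted model can also generate the association rules directly:
val rules = model.generateAssociationRules(0.4)
```

This also gives you itemsets larger than two items, which the manual `combinations(1) ++ combinations(2)` approach cannot produce.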
Finally, I use AssociationRules to get the "rules":
results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",")
    + "=>"
    + rule.consequent.mkString(",") + "]," + rule.confidence)
}
Everything is fine up to this point, but next I'd like to get recommendations for each transaction... Is there an easy way to do that? My Scala is pretty bad.
In R I did something like this:
baskets = function(x) {
  rulesMatchLHS = is.subset(rules@lhs, x)
  suitableRules = rulesMatchLHS & !(is.subset(rules@rhs, x))
  order.rules = sort(rules[suitableRules], by = "lift")
}
results = sapply(1:length(trans), function(x) baskets(trans[x]))
Thanks for your time.
Answer 0 (score: 0)
Well, once the rules are generated, they look like this: lhs => rhs (confidence), or in more detail, for example:
("Tuna", "Banana") => ("Strawberry") (confidence)
Now you have a list of these rules, filtered by your minimum confidence. Next, you want to use that rule list to make predictions for particular baskets, i.e. new baskets.
You will need to find the rules that best match the items in the new basket, together with some score for that match. Say a new basket ("Tuna", "Banana") would fully match the rule above (all items on the rule's left-hand side are present); if fewer of the antecedent items match, the score should be lower. You can set a minimum score that triggers a recommendation, and once a rule matches, recommend the items on its right-hand side.
I hope that's clear; you have everything you need in the code you provided.
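The matching logic described above can be sketched in plain Scala, without Spark. `Rule` and the sample rules below are hypothetical stand-ins for the `AssociationRules` output; this version only fires rules whose whole antecedent is in the basket, ranked by confidence:

```scala
// A rule fires for a basket when its whole antecedent is contained in the
// basket and its consequent would add something new; fired rules are ranked
// by confidence and their consequent items become the recommendations.
case class Rule(antecedent: Set[String], consequent: Set[String], confidence: Double)

def recommend(basket: Set[String], rules: Seq[Rule]): Seq[(String, Double)] =
  rules
    .filter(r => r.antecedent.subsetOf(basket) && !r.consequent.subsetOf(basket))
    .sortBy(-_.confidence)                                  // strongest rules first
    .flatMap(r => r.consequent.diff(basket).map(_ -> r.confidence))
    .distinct

// Hypothetical rules for illustration only.
val sampleRules = Seq(
  Rule(Set("Tuna", "Banana"), Set("Strawberry"), 0.5),
  Rule(Set("Melon"), Set("Bread"), 0.5),
  Rule(Set("Pepper"), Set("Melon"), 1.0)
)

println(recommend(Set("Tuna", "Banana"), sampleRules))
// → List((Strawberry,0.5))
```

To score partial matches as described above, replace the `subsetOf` filter with something like `r.antecedent.intersect(basket).size.toDouble / r.antecedent.size` and keep rules above a minimum score.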