The pyspark ml recommendation package includes an ALS implementation for implicit feedback datasets, based on the paper by Hu, Koren and Volinsky: http://yifanhu.net/PUB/cf.pdf
https://spark.apache.org/docs/2.3.0/ml-collaborative-filtering.html
https://spark.apache.org/docs/2.3.1/api/python/_modules/pyspark/mllib/recommendation.html
What exact form does Spark use to build the confidence matrix from the observation matrix and the parameter alpha that controls the strength of the confidence? Does Spark use c_ui = 1 + alpha * r_ui for all values of r_ui (zero and non-zero alike), or something else?
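For reference, the formulation in the Hu, Koren and Volinsky paper minimizes a confidence-weighted squared error over all (u, i) pairs, with the confidence defined from the raw observations r_ui:

\min_{x_*,\, y_*} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}

In the paper, then, c_ui = 1 + alpha * r_ui applies to every (u, i) pair, reducing to 1 where r_ui = 0; the question is whether Spark does the same.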
The implementation appears to be given at line 1683 of https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala, where we have:
if (implicitPrefs) {
  // Extension to the original paper to handle rating < 0. confidence is a function
  // of |rating| instead so that it is never negative. c1 is confidence - 1.
  val c1 = alpha * math.abs(rating)
  // For rating <= 0, the corresponding preference is 0. So the second argument of add
  // is only there for rating > 0.
  if (rating > 0.0) {
    numExplicits += 1
  }
  ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
} else {
  ls.add(srcFactor, rating)
  numExplicits += 1
}
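To make the branch concrete, here is a standalone sketch (not Spark source; alpha = 1.0 is an assumed value, matching Spark's documented default) of the (b, c) pair that would be handed to ls.add(srcFactor, b, c) for a few sample ratings:

// Standalone sketch, not Spark source: prints the (b, c) arguments that the
// implicit-feedback branch above would pass to ls.add for sample ratings.
object ConfidenceSketch extends App {
  val alpha = 1.0  // assumed value; matches Spark's default alpha
  for (rating <- Seq(-2.0, 0.0, 3.0)) {
    val c1 = alpha * math.abs(rating)             // "confidence - 1"
    val b  = if (rating > 0.0) 1.0 + c1 else 0.0  // preference-weighted target
    println(f"rating = $rating%5.1f  ->  b = $b%4.1f, c = $c1%4.1f")
  }
}

So a positive rating contributes (1 + c1, c1), while a zero or negative rating contributes (0, c1).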
Why is the third argument of ls.add() passed as c1 rather than 1 + c1? To see what ls.add() does, here is the NormalEquation class at line 850:
def add(a: Array[Float], b: Double, c: Double = 1.0): this.type = {
  require(c >= 0.0)
  require(a.length == k)
  copyToDouble(a)
  blas.dspr(upper, k, c, da, 1, ata)
  if (b != 0.0) {
    blas.daxpy(k, b, da, 1, atb, 1)
  }
  this
}
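Unless I'm misreading the BLAS calls, each add(a, b, c) performs a rank-one update of the accumulated normal equations:

\mathrm{ata} \leftarrow \mathrm{ata} + c \, a a^\top \quad \text{(dspr)},
\qquad
\mathrm{atb} \leftarrow \mathrm{atb} + b \, a \quad \text{(daxpy)}

So for an observed implicit rating, the Gramian only accumulates c1 * (srcFactor)(srcFactor)^T rather than (1 + c1) * (srcFactor)(srcFactor)^T.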
Why does it make sense to use c1 = alpha * |rating| in the blas.dspr() call, rather than 1 + c1?
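My working guess, which I'd like confirmed, is that this relates to the speedup identity from the paper, in which only the second term depends on the user's observed ratings:

Y^\top C^u Y = Y^\top Y + Y^\top (C^u - I) \, Y

Since C^u - I has alpha * r_ui = c1 on the diagonal for observed items and 0 everywhere else, accumulating only c1 per rating would make sense if the shared Y^T Y term were added separately somewhere else. But I can't tell from the code above whether that is what actually happens.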