The pyspark ml recommendation package includes an ALS implementation for implicit feedback datasets, based on the paper by Hu, Koren and Volinsky: http://yifanhu.net/PUB/cf.pdf
https://spark.apache.org/docs/2.3.0/ml-collaborative-filtering.html
https://spark.apache.org/docs/2.3.1/api/python/_modules/pyspark/mllib/recommendation.html
What exact form does Spark use to build the confidence matrix from the observation matrix and the parameter alpha that controls the strength of the confidence? Does Spark use c_ui = 1 + alpha * r_ui for all values of r_ui (zero and non-zero alike), or something else?
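For reference, the formulation in the Hu, Koren and Volinsky paper minimizes a confidence-weighted squared error over all (u, i) pairs, with the confidence defined from the raw observations r_ui:

\min_{x_*,\, y_*} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}

In the paper, then, c_ui = 1 + alpha * r_ui applies to every (u, i) pair, reducing to 1 where r_ui = 0; the question is whether Spark does the same.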
The implementation appears to be given at line 1683 of https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala, where we have:
if (implicitPrefs) {
  // Extension to the original paper to handle rating < 0. confidence is a function
  // of |rating| instead so that it is never negative. c1 is confidence - 1.
  val c1 = alpha * math.abs(rating)
  // For rating <= 0, the corresponding preference is 0. So the second argument of add
  // is only there for rating > 0.
  if (rating > 0.0) {
    numExplicits += 1
  }
  ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
} else {
  ls.add(srcFactor, rating)
  numExplicits += 1
}
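To make the branch concrete, here is a standalone sketch (not Spark source; alpha = 1.0 is an assumed value, matching Spark's documented default) of the (b, c) pair that would be handed to ls.add(srcFactor, b, c) for a few sample ratings:

// Standalone sketch, not Spark source: prints the (b, c) arguments that the
// implicit-feedback branch above would pass to ls.add for sample ratings.
object ConfidenceSketch extends App {
  val alpha = 1.0  // assumed value; matches Spark's default alpha
  for (rating <- Seq(-2.0, 0.0, 3.0)) {
    val c1 = alpha * math.abs(rating)             // "confidence - 1"
    val b  = if (rating > 0.0) 1.0 + c1 else 0.0  // preference-weighted target
    println(f"rating = $rating%5.1f  ->  b = $b%4.1f, c = $c1%4.1f")
  }
}

So a positive rating contributes (1 + c1, c1), while a zero or negative rating contributes (0, c1).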
Why is the third argument of ls.add() passed as c1 rather than 1 + c1? To see what ls.add() does, here is the NormalEquation class at line 850:
def add(a: Array[Float], b: Double, c: Double = 1.0): this.type = {
  require(c >= 0.0)
  require(a.length == k)
  copyToDouble(a)
  blas.dspr(upper, k, c, da, 1, ata)
  if (b != 0.0) {
    blas.daxpy(k, b, da, 1, atb, 1)
  }
  this
}
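Unless I'm misreading the BLAS calls, each add(a, b, c) performs a rank-one update of the accumulated normal equations:

\mathrm{ata} \leftarrow \mathrm{ata} + c \, a a^\top \quad \text{(dspr)},
\qquad
\mathrm{atb} \leftarrow \mathrm{atb} + b \, a \quad \text{(daxpy)}

So for an observed implicit rating, the Gramian only accumulates c1 * (srcFactor)(srcFactor)^T rather than (1 + c1) * (srcFactor)(srcFactor)^T.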
Why does it make sense to use c1 = alpha * |rating| in the blas.dspr() call, rather than 1 + c1?
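My working guess, which I'd like confirmed, is that this relates to the speedup identity from the paper, in which only the second term depends on the user's observed ratings:

Y^\top C^u Y = Y^\top Y + Y^\top (C^u - I) \, Y

Since C^u - I has alpha * r_ui = c1 on the diagonal for observed items and 0 everywhere else, accumulating only c1 per rating would make sense if the shared Y^T Y term were added separately somewhere else. But I can't tell from the code above whether that is what actually happens.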