What exact form does the Spark MLlib ALS recommendation routine use for the confidence matrix?

Time: 2018-12-23 15:35:59

Tags: apache-spark apache-spark-mllib recommendation-engine

The pyspark ml recommendation package includes an ALS implementation for implicit feedback datasets, based on the paper by Hu, Koren and Volinsky: http://yifanhu.net/PUB/cf.pdf

https://spark.apache.org/docs/2.3.0/ml-collaborative-filtering.html https://spark.apache.org/docs/2.3.1/api/python/_modules/pyspark/mllib/recommendation.html

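What exact form does Spark use to build the confidence matrix from the observation matrix and the parameter alpha, which controls the strength of the confidence? Does Spark use 1 + alpha * r_ui for all values of r_ui (zero and non-zero), or something else?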
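
For reference, this is the formulation in the paper as I read it, where r_ui is the observed value, p_ui the binary preference, c_ui the confidence, C^u the diagonal matrix of the c_ui for user u, Y the matrix of item factors and lambda the regularization weight:

```latex
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui},
\qquad
x_u = \left( Y^{T} C^{u} Y + \lambda I \right)^{-1} Y^{T} C^{u}\, p(u)
```

The question is whether Spark's implementation uses the same c_ui, including for the entries where r_ui = 0.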

I assume the implementation is the one at line 1683 of: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala


        if (implicitPrefs) {
          // Extension to the original paper to handle rating < 0. confidence is a function
          // of |rating| instead so that it is never negative. c1 is confidence - 1.
          val c1 = alpha * math.abs(rating)
          // For rating <= 0, the corresponding preference is 0. So the second argument of add
          // is only there for rating > 0.
          if (rating > 0.0) {
            numExplicits += 1
          }
          ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
        } else {
          ls.add(srcFactor, rating)
          numExplicits += 1
        }
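
To make the implicit branch concrete, here is a minimal Python sketch of the two values it passes to ls.add for a single rating. The function name `add_arguments` and the value alpha = 40.0 are my own choices for illustration, not anything taken from Spark.

```python
def add_arguments(rating, alpha):
    """Return (b, c) as the implicit-feedback branch of the snippet passes them to ls.add."""
    c1 = alpha * abs(rating)               # confidence - 1, never negative
    b = 1.0 + c1 if rating > 0.0 else 0.0  # full confidence only when preference is 1
    return b, c1


# Hypothetical ratings: positive, zero and negative.
for r in (3.0, 0.0, -2.0):
    print(r, add_arguments(r, alpha=40.0))
```

So the second argument carries the full confidence 1 + c1 (and only for positive ratings), while the third argument is always c1 without the leading 1.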

Why is the third argument of ls.add() passed as c1 rather than 1 + c1? Line 850 shows how ls.add() works in the NormalEquation class:

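def add(a: Array[Float], b: Double, c: Double = 1.0): this.type = {
  require(c >= 0.0)
  require(a.length == k)
  // ata += c * da * da^T (packed triangular storage); da presumably holds a as doubles
  blas.dspr(upper, k, c, da, 1, ata)
  if (b != 0.0) {
    // atb += b * da
    blas.daxpy(k, b, da, 1, atb, 1)
  }
  this
}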
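
As a reading aid, this is a small pure-Python sketch of what the two BLAS calls do, following the standard BLAS definitions of dspr (symmetric packed rank-1 update) and daxpy. It assumes the upper triangle is stored packed in column-major order; the helper name `dspr_upper` and the sample values are mine.

```python
def dspr_upper(alpha, x, ap):
    """In place: A := alpha * x * x^T + A, with A held as a packed upper triangle."""
    idx = 0
    for j in range(len(x)):
        for i in range(j + 1):
            ap[idx] += alpha * x[i] * x[j]
            idx += 1


def daxpy(alpha, x, y):
    """In place: y := alpha * x + y."""
    for i in range(len(x)):
        y[i] += alpha * x[i]


k = 2
ata = [0.0] * (k * (k + 1) // 2)  # packed upper triangle of a k x k matrix
atb = [0.0] * k
a = [1.0, 2.0]

dspr_upper(3.0, a, ata)  # plays the role of c = 3.0
daxpy(4.0, a, atb)       # plays the role of b = 4.0

print(ata)  # [3.0, 6.0, 12.0]
print(atb)  # [4.0, 8.0]
```

In other words, each call to add(a, b, c) accumulates c * a a^T into the left-hand side of the normal equation and b * a into the right-hand side.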

Why does it make sense to pass c1 = alpha * |rating| to the blas.dspr() call, rather than 1 + c1?
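
One reading that would reconcile c1 with the paper: the paper rewrites Y^T C^u Y as Y^T Y + Y^T (C^u - I) Y, so if the Gram matrix Y^T Y over all items is accumulated separately and merged into the normal equation, only the c1 part would need to be added per observed rating. I have not confirmed that this is what happens around the quoted lines. The sketch below only checks the algebra, with made-up factors and ratings.

```python
def outer_add(m, scale, v):
    """In place: m += scale * v v^T for a dense square matrix stored as nested lists."""
    for i in range(len(v)):
        for j in range(len(v)):
            m[i][j] += scale * v[i] * v[j]


def zeros(k):
    return [[0.0] * k for _ in range(k)]


alpha = 2.0
item_factors = [[1.0, 0.0], [0.5, 1.5], [2.0, -1.0]]  # made-up y_i
ratings = [3.0, 0.0, 1.0]  # r_ui for one user; 0.0 means unobserved

# Y^T C^u Y with c_ui = 1 + alpha * |r_ui| applied to every item.
full = zeros(2)
for y, r in zip(item_factors, ratings):
    outer_add(full, 1.0 + alpha * abs(r), y)

# Y^T Y over all items, plus c1 * y y^T for the observed items only.
split = zeros(2)
for y in item_factors:
    outer_add(split, 1.0, y)
for y, r in zip(item_factors, ratings):
    if r != 0.0:
        outer_add(split, alpha * abs(r), y)

matches = all(
    abs(full[i][j] - split[i][j]) < 1e-12 for i in range(2) for j in range(2)
)
print(matches)  # True
```

If that reading is right, unobserved entries would effectively get confidence 1 and observed ones 1 + alpha * |r_ui|, but I would like confirmation from someone who knows the code.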

0 answers:
