Spark MLlib - Collaborative Filtering with Implicit Feedback

Time: 2014-09-03 16:34:39

Tags: apache-spark recommendation-engine

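I'm using Spark 1.0.0 to build an implicit-feedback recommendation model, and I'm trying to follow the example on their collaborative filtering page: http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback

I even loaded the test dataset they reference in the example: http://codesearch.ruethschilling.info/xref/apache-foundation/spark/mllib/data/als/test.data

However, when I try to run the implicit feedback model:

    val alpha = 0.01
    val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)

(ratings comes straight from their dataset, rank = 10, numIterations = 20), I get the following error: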

scala> val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
<console>:26: error: overloaded method value trainImplicit with alternatives:
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double,seed: Long)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], Int, Int, Double)
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)

Interestingly, the model runs fine when I use ALS.train instead of trainImplicit.

1 Answer:

Answer 0 (score: 4)

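The example appears to be out of sync with the implementation: trainImplicit has no overload that takes four arguments, which is what the error message is telling you. However, if you look at the Scala source code for ALS, you'll see that the three-argument overload is implemented by calling the six-argument overload with some "magic numbers":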

def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int)
    : MatrixFactorizationModel = {
    trainImplicit(ratings, rank, iterations, 0.01, -1, 1.0)
}

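Matching these against the six-argument signature in your error message (lambda, blocks, alpha), 0.01 is lambda, -1 is blocks, and 1.0 is alpha. This suggests that 0.01 is a reasonable default for lambda. (It may be worth checking that with someone who knows more about ML.) That should give you enough information to call the five- or six-argument overload sensibly. (Of course, if you know enough to pick better values, that's great!)

For example, either of: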

val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, alpha)

val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, -1, alpha)
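
For a fuller picture, here is a minimal sketch of a standalone program built around the five-argument overload. It assumes Spark 1.0.x MLlib on the classpath, a local master, and that the comma-separated test data sits at mllib/data/als/test.data relative to the working directory. The object name ImplicitAlsSketch, the path, the local[2] master, and the user/product ids passed to predict are my assumptions, and lambda is just the 0.01 taken from the source above, not a tuned value.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical driver object; the name is an assumption for illustration.
object ImplicitAlsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads.
    val conf = new SparkConf().setAppName("ImplicitAlsSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumption: each line looks like "user,product,rating".
    val data = sc.textFile("mllib/data/als/test.data")
    val ratings = data.map(_.split(',') match {
      case Array(user, product, rate) =>
        Rating(user.toInt, product.toInt, rate.toDouble)
    })

    val rank = 10
    val numIterations = 20
    val lambda = 0.01 // regularization; value borrowed from the three-argument overload
    val alpha = 0.01  // confidence scaling for implicit feedback, as in the question

    // Five-argument overload: (ratings, rank, iterations, lambda, alpha).
    val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)

    // Assumption: user 1 and product 1 both occur in the data set.
    println("Predicted preference for (1, 1): " + model.predict(1, 1))

    sc.stop()
  }
}
```

In spark-shell a SparkContext named sc already exists, so only the two import lines and the statements from `val data` through the `println` would be needed there.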

Finally, in case you weren't aware of it, there is fairly good API documentation for ALS.