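I'm using Spark 1.0.0 to build an implicit feedback recommendation model, and I'm trying to follow the example on their collaborative filtering page: http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback
I even loaded the test dataset they reference in the example: http://codesearch.ruethschilling.info/xref/apache-foundation/spark/mllib/data/als/test.data
However, when I try to run the implicit feedback model:
val alpha = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
(ratings comes straight from their dataset, rank = 10, numIterations = 20), I get the following error: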
scala> val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
<console>:26: error: overloaded method value trainImplicit with alternatives:
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double,seed: Long)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], Int, Int, Double)
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
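Interestingly, the model runs fine when I don't use trainImplicit (i.e., with ALS.train).
Answer 0 (score: 4)
The example appears to be out of sync with the implementation: trainImplicit has no overload that takes four parameters, which is what the error message is telling you. However, if you look at the Scala source code for ALS, you'll see that the three-parameter overload is implemented by calling the six-parameter overload with some "magic numbers":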
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int)
  : MatrixFactorizationModel = {
  trainImplicit(ratings, rank, iterations, 0.01, -1, 1.0)
}
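This suggests that 0.01 is a reasonable default for lambda. (It may be worth checking that with someone who has a deeper understanding of ML.) That should give you enough information to call the five- or six-parameter overload sensibly. (Of course, if you know enough to pick better values, so much the better!)
For example: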
val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, alpha)
or
val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, -1, alpha)
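To put the five-parameter call in context, here is a minimal end-to-end sketch. It assumes a local Spark 1.0.x setup and that test.data sits at mllib/data/als/test.data relative to the working directory; the object name, app name, master setting, and the parseRating helper are made up for illustration. The argument order (lambda, then alpha) follows the signatures listed in the error message, and lambda = 0.01 is simply the value the three-parameter overload passes, not a tuned choice.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical object name; any name works.
object ImplicitAlsSketch {

  // Illustrative helper (not part of MLlib): parses one "user,product,rating" line.
  def parseRating(line: String): Rating = line.split(',') match {
    case Array(user, product, rate) =>
      Rating(user.toInt, product.toInt, rate.toDouble)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally with two threads.
    val conf = new SparkConf().setAppName("ImplicitAlsSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumption: path to the sample dataset in a Spark 1.0.x checkout.
    val ratings = sc.textFile("mllib/data/als/test.data").map(parseRating)

    val rank = 10
    val numIterations = 20
    val lambda = 0.01 // value used by the three-parameter overload
    val alpha = 0.01  // value from the question

    // Five-parameter overload: (ratings, rank, iterations, lambda, alpha).
    val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)

    // Score a single (user, product) pair to confirm the model was built.
    println("Predicted preference for (1, 1): " + model.predict(1, 1))

    sc.stop()
  }
}
```

If you also want to control the number of blocks or the random seed, the six- and seven-parameter overloads shown in the error message take those as extra arguments.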
Finally, in case you weren't aware, there is fairly good API documentation for ALS.