How to generate personal recommendations for a user, excluding the movies he has already rated, with the Spark MLlib ALS algorithm in Scala?

Asked: 2017-07-14 17:55:20

Tags: scala apache-spark rdd apache-spark-mllib recommendation-engine

I am currently using the ALS algorithm on the MovieLens dataset to generate movie recommendations for a user. Everything works, but the ALS model sometimes returns movies the user has already rated, and I want to exclude those from the recommendations. My current attempt looks like this:

```scala
val moviesRatedbyUser = ratings.keyBy(_._2.user).lookup(206547)
println("rated movies are " + moviesRatedbyUser)
// Problem: moviesRatedbyUser is a Seq[(Long, Rating)], not a predicate,
// so it cannot be applied directly to filter out already-rated movie ids.
val candidates =
  sc.parallelize(movies.keys.filter(!moviesRatedbyUser(_)).toSeq)
val recommendations = bestModel.get
  .predict(candidates.map((206547, _)))
  .collect()
  .sortBy(-_.rating)
  .take(10)

var i = 1
println("Movies recommended for you:")
recommendations.foreach { r =>
  println("%2d".format(i) + ": " + movies(r.product))
  i += 1
}
```

Here I look up the user id in the RDD, and the print statement returns: `moviesRatedbyUser: Seq[(Long, org.apache.spark.mllib.recommendation.Rating)] = WrappedArray((3,Rating(206547,80,1.0)))`. I would like to know how to extract the movie id (80 in this case) so that I can exclude it from the generated recommendations.

1 Answer:

Answer 0: (score: 0)

The code can be written as follows:

```scala
val moviesForUser = ratings.keyBy(_._2.user).lookup(206547)
// Extract the product (movie) ids of the already-rated movies -- the line I wanted
val ratingsformovies = moviesForUser.toMap.values.map(elem => elem.product).toSeq
val candidates =
  sc.parallelize(movies.keys.filter(!ratingsformovies.contains(_)).toSeq)
val recommendations = bestModel.get
  .predict(candidates.map((206547, _)))
  .collect()
  .sortBy(-_.rating)
  .take(10)

var i = 1
println("Movies recommended for you:")
recommendations.foreach { r =>
  println("%2d".format(i) + ": " + movies(r.product))
  i += 1
}
```
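The core of the fix is collecting the user's already-rated movie ids and removing them from the candidate set before calling `predict`. Here is a minimal, Spark-free sketch of just that filtering step, with a stand-in `Rating` case class and hypothetical sample data; it uses a `Set` for O(1) membership tests instead of `Seq.contains`, which scans the whole sequence on every check:

```scala
// Minimal stand-in for org.apache.spark.mllib.recommendation.Rating
case class Rating(user: Int, product: Int, rating: Double)

object ExcludeRated {
  // Return candidate movie ids with the user's already-rated ids removed.
  // Building a Set first makes each membership test O(1).
  def unratedCandidates(allMovieIds: Seq[Int], ratedByUser: Seq[Rating]): Seq[Int] = {
    val ratedIds: Set[Int] = ratedByUser.map(_.product).toSet
    allMovieIds.filterNot(ratedIds.contains)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: user 206547 has rated movies 80 and 12
    val ratings = Seq(Rating(206547, 80, 1.0), Rating(206547, 12, 4.0))
    val allMovies = Seq(12, 45, 80, 99)
    println(unratedCandidates(allMovies, ratings).mkString(","))  // 45,99
  }
}
```

The same shape applies in the Spark version: materialize the rated ids on the driver (the lookup already returns a local `Seq`), convert them to a `Set`, and filter the candidate movie ids before parallelizing them.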