Correctly saving/loading a MatrixFactorizationModel

Time: 2015-07-17 15:22:59

Tags: apache-spark apache-spark-mllib

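I have a MatrixFactorizationModel object. If I recommend products to a single user right after building the model with ALS.train(...), it takes 300 ms (on my data and hardware). But if I save the model to disk and load it back, the same recommendation takes about 2000 ms. Spark also prints these warnings: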

15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor is not cached. Prediction could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor is not cached. Prediction could be slow.

How can I create/set a partitioner and cache the user and product factors after loading the model? The following approach did not help:

model.userFeatures().cache();
model.productFeatures().cache();

I also tried repartitioning these RDDs and creating a new model from the repartitioned versions, but that did not help either.

1 Answer:

Answer 0 (score: 2)

In Scala you don't need the parentheses: userFeatures is an RDD of (Int, Array[Double]) and takes no arguments.

This should help:

model.userFeatures.cache
model.productFeatures.cache
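
Note that cache() is lazy, and it does not give the RDD a partitioner, so on its own it would not be expected to silence the "does not have a partitioner" warnings. Below is a minimal sketch of one way to address both warnings after loading: hash-partition the factor RDDs, cache them, force materialization with an action, and build a new model from the result. The object name ModelLoading, the helper loadPrepared, and the numPartitions parameter are hypothetical names used for illustration; whether this closes the latency gap on a given dataset is not confirmed here, so compare timings after a warm-up call.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

object ModelLoading {
  // Hypothetical helper (not part of Spark): reload a saved model and
  // give its factor RDDs a partitioner and a storage level.
  def loadPrepared(sc: SparkContext,
                   path: String,
                   numPartitions: Int): MatrixFactorizationModel = {
    val loaded = MatrixFactorizationModel.load(sc, path)
    val partitioner = new HashPartitioner(numPartitions)

    // partitionBy attaches the partitioner; cache() only marks the RDD
    // for caching and computes nothing yet.
    val users = loaded.userFeatures.partitionBy(partitioner).cache()
    val products = loaded.productFeatures.partitionBy(partitioner).cache()

    // Run an action so the factors are actually materialized in memory
    // before the first recommendation request.
    users.count()
    products.count()

    // The model's factor RDDs are immutable fields, so build a new model
    // around the partitioned, cached RDDs.
    new MatrixFactorizationModel(loaded.rank, users, products)
  }
}
```

Usage would look like val model = ModelLoading.loadPrepared(sc, path, 8), followed by model.recommendProducts(userId, 10); the partition count 8 is an arbitrary example value, not a recommendation.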