Update

Question

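I have been trying to generate recommendations for a selected set of users in Spark. This is done by taking the dot product of a user factor (a vector of n floats) with each product factor (also a vector of n floats), then sorting the resulting scores in descending order.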

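So, suppose my customer factors are (customerId, Array[Float]) and my product factors are (productId, Array[Float]). For each customer I have to score every product and produce (customerId, productId, score), keeping only the top N results per customer. So I do this: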

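val customers = ... // RDD of (customerId, Array[Float])
val products = ... // RDD of (productId, Array[Float])
// Pair every customer with every product.
val combination = customers.cartesian(products)
// Score each (customer, product) pair with the dot product of their factors.
val result = combination.map { case ((customerId, customerFactor), (productId, productFactor)) =>
  (customerId, productId, dotProd(customerFactor, productFactor))
}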

... then keep only the top N rows for each customer using a DataFrame
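
For reference, here is a minimal sketch of the scoring and top-N step on plain Scala collections, without Spark. The question does not show how dotProd is defined, so the version below is an assumption, as are the topN helper, the Int ids, and the toy factor values.

```scala
object TopNSketch {
  // Hypothetical helper: the question calls dotProd but does not show its definition.
  def dotProd(a: Array[Float], b: Array[Float]): Float = {
    require(a.length == b.length, "factor vectors must have the same length")
    var sum = 0.0f
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // Hypothetical helper: keep the n highest-scoring (productId, score) pairs.
  def topN(scores: Seq[(Int, Float)], n: Int): Seq[(Int, Float)] =
    scores.sortWith(_._2 > _._2).take(n)

  // Toy data, for illustration only.
  val customerFactor: Array[Float] = Array(1.0f, 2.0f)
  val productFactors: Seq[(Int, Array[Float])] = Seq(
    10 -> Array(1.0f, 0.0f),
    20 -> Array(0.0f, 1.0f),
    30 -> Array(1.0f, 1.0f)
  )

  // Score every product for this one customer, then keep the best two.
  val scored: Seq[(Int, Float)] =
    productFactors.map { case (productId, factor) => (productId, dotProd(customerFactor, factor)) }
  val best: Seq[(Int, Float)] = topN(scored, 2)
}
```

Sorting the full score list is the simplest way to express "top N"; with many products per customer, a bounded priority queue of size N would avoid the full sort.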

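But this takes a very long time. One reason is that the Cartesian product makes the data volume huge, since the same product factors are repeated for every customer.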
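
To put a rough number on that growth, here is a back-of-the-envelope sketch. The customer, product, and factor counts are made up for illustration and are not taken from my actual job; the estimate counts only raw Float data and ignores ids, object headers, and serialization overhead.

```scala
object CartesianSizeSketch {
  // Raw bytes of factor data carried by all the pairs that cartesian produces.
  def factorBytes(numCustomers: Long, numProducts: Long, numFactors: Long): Long = {
    val pairs = numCustomers * numProducts
    val floatsPerPair = 2 * numFactors // one customer vector plus one product vector
    pairs * floatsPerPair * 4L         // a Float takes 4 bytes
  }

  // Hypothetical sizes, chosen only for illustration.
  val bytes: Long = factorBytes(numCustomers = 1000L, numProducts = 100000L, numFactors = 50L)
}
```

With these assumed sizes the pairs carry 40,000,000,000 bytes (about 40 GB) of factor data, even though the two inputs together hold only about 20 MB of floats. Spark evaluates cartesian lazily, so this is the volume flowing through the job rather than a figure for memory held at one time.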