Question

Given a pair RDD, how can I produce another RDD with the same set of keys, where the new values are the Cartesian product of the values for each key?

Here is what I mean:

//Given
(K1, V1)
(K1, V2)
(K2, W1)
(K2, W2)

//Want 
(K1, (V1, V1))
(K1, (V1, V2))
(K1, (V2, V2))
(K2, (W1, W1))
(K2, (W1, W2))
(K2, (W2, W2))
//Note (V2, V1) and (W2, W1) are not required, but having them in the result is not a big deal either.

As a newcomer to Scala and Spark, I don't see a simple solution using built-in transformations such as mapValues. Am I missing some magic function? Many thanks.

Answer 1

Just join the RDD with itself:

rdd.join(rdd)
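To make the self-join concrete, here is a minimal sketch using the sample data from the question. It assumes a local SparkContext created just for illustration (in spark-shell, `sc` already exists), and the names `allPairs`, `unordered` and `result` are made up for this example. A plain self-join returns every ordered pair per key, including (V2, V1) and (W2, W1); the optional filter keeps one ordering of each pair, assuming the values can be compared and the (key, value) pairs are distinct.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only; in spark-shell, `sc` is predefined.
val conf = new SparkConf().setAppName("per-key-cartesian").setMaster("local[*]")
val sc = new SparkContext(conf)

// Sample data from the question.
val rdd = sc.parallelize(Seq(("K1", "V1"), ("K1", "V2"), ("K2", "W1"), ("K2", "W2")))

// Self-join: for each key, pairs every value with every value (including itself).
val allPairs = rdd.join(rdd) // RDD[(String, (String, String))]
val allPairsCount = allPairs.count()

// Optional: keep a single ordering of each pair, e.g. (V1, V2) but not (V2, V1).
val unordered = allPairs.filter { case (_, (a, b)) => a <= b }

// Collect to the driver; fine for a toy data set, not for large RDDs.
val result = unordered.collect().toSet
result.toSeq.sorted.foreach(println)

sc.stop()
```

If the input may contain repeated (key, value) pairs, the join will repeat the corresponding output pairs as well; calling `rdd.distinct()` before the join is one way to avoid that.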
