Link and flatten a PairRDD based on common elements in keys or values

Time: 2015-07-09 13:25:18

Tags: scala apache-spark

I have an RDD of pairs like the following:

(a,b)
(b,c)
(e,d)
(f,d)
(g,f)

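I want to find the items that can be linked together through a shared key or value, flatten them, and create a sorted RDD. For example, after the transformation the new RDD would be:

(a,b,c) - because (a,b) and (b,c) are linked by the common element "b"
(d,e,f,g) - because (e,d) and (f,d) are linked by "d", and (f,d) and (g,f) are linked by "f"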

As another example, given the following input:

(a,b),(b,c),(c,d),(a,d),(c,e)

The output should be:

(a,b,c,d,e) - because every pair is connected to some other pair through either its key or its value
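
To pin down the expected result, here is a minimal local sketch in plain Scala (no Spark) of the grouping described above. It treats each pair as an edge and merges elements with a simple union-find. The names `LinkPairs` and `linkPairs` are hypothetical, chosen only for illustration, and this is a reference for the intended semantics on small inputs, not the scalable solution being asked for.

```scala
import scala.collection.mutable

object LinkPairs {
  // Groups elements that are transitively connected through any pair.
  // Each group is sorted, and groups are ordered by their first element.
  def linkPairs(pairs: Seq[(String, String)]): Seq[Seq[String]] = {
    // parent(x) points toward the representative of x's group (union-find).
    val parent = mutable.Map.empty[String, String]

    // Follows parent pointers up to the representative of x's group,
    // registering x as its own representative the first time it is seen.
    def find(x: String): String = {
      var root = x
      while (parent.getOrElseUpdate(root, root) != root) root = parent(root)
      root
    }

    // Merge the groups of the key and the value of every pair.
    for ((a, b) <- pairs) {
      val (ra, rb) = (find(a), find(b))
      if (ra != rb) parent(ra) = rb
    }

    parent.keys.toSeq
      .groupBy(find)      // bucket elements by group representative
      .values
      .map(_.sorted)      // sort the elements inside each group
      .toSeq
      .sortBy(_.head)     // order the groups themselves
  }
}
```

Calling `LinkPairs.linkPairs` on the first input yields the groups (a,b,c) and (d,e,f,g), and on the second input the single group (a,b,c,d,e).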

Any ideas on how to implement a scalable solution for this with Apache Spark?

1 answer:

Answer 0: (score: 2)