I have an RDD of pairs as follows:
(a,b)
(b,c)
(e,d)
(f,d)
(g,f)
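I want to find the items that can be linked together through either their key or their value, flatten them, and create a sorted RDD. For example, after the transformation the new RDD would be:
(a,b,c) - as (a,b) and (b,c) can be linked by the common element "b"
(d,e,f,g) - as (e,d) and (f,d) are linked by "d", and (f,d) and (g,f) are linked by "f"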
As another example, if the input is:
(a,b),(b,c),(c,d),(a,d),(c,e)
the output should be:
(a,b,c,d,e) - as every pair is connected to some other pair by either its key or its value
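To make the expected grouping concrete, here is a minimal local sketch in plain Python (not Spark). It treats each pair as an undirected edge and groups elements with union-find, which is only my assumption about one way to express the linking. The name `link_pairs` is hypothetical, and the sketch runs on a single machine, so it shows the result I am after rather than a scalable solution.

```python
from collections import defaultdict


def link_pairs(pairs):
    """Group elements that are connected through a shared key or value.

    Hypothetical helper: a local reference for the expected output only.
    """
    parent = {}

    def find(x):
        # Register unseen elements as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Each (key, value) pair links its two elements into one group.
    for key, value in pairs:
        union(key, value)

    # Collect the members of each group under its root.
    groups = defaultdict(set)
    for element in list(parent):
        groups[find(element)].add(element)

    # Sort the members within each group, then sort the groups themselves.
    return sorted(tuple(sorted(group)) for group in groups.values())


first = link_pairs([("a", "b"), ("b", "c"), ("e", "d"), ("f", "d"), ("g", "f")])
second = link_pairs([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("c", "e")])
print(first)
print(second)
```

On the two inputs above this yields the groups (a,b,c) and (d,e,f,g) for the first, and the single group (a,b,c,d,e) for the second.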
Any ideas on how to implement a scalable solution for this with Apache Spark would be appreciated.